Regular Expressions to Eliminate Rich Text Formatting

Status
Not open for further replies.

AHJ1

Programmer
Oct 30, 2007
69
US
This is a follow-up to another thread.
I am trying to eliminate rich text formatting. I tried copying and pasting via the clipboard to simulate Notepad's stripping capabilities, without success.

Another approach is to use regular expressions to eliminate the extra coding. I'm not sure how to do this in Access.

One source advises:
Code:
Then, construct a regexp like this:

\{[^\}]+\} and replace every match with empty string.

Another source advises, "A co-worker of mine, Scott Jennings, a.k.a. the “RegEx-Man”, came up with the following expression to do the job:"
Code:
Regex.Replace(Regex.Match(rtf, @"\x5cviewkind4[^ ]*(.+)\x5cpar").Groups[1].Value, @"[\n\r\f]|(\x5cpar)|(\x5c[a-zA-Z0-9]+)", "");

I'm not sure how to implement this in Access 2003. Suggestions will be most appreciated.

Alan
 
Hi Alan,

I'll admit that I don't know the ins and outs of RTF formatting. But using the first suggestion quoted above, this will find those matches and replace them with an empty string.

Code:
Function StripRTFCodesFromText(ByVal RTFString As String) As String
 Dim RegEx As Object
 Set RegEx = CreateObject("vbscript.regexp")
 With RegEx
  .Global = True
  .IgnoreCase = True
  .MultiLine = True
  .Pattern = "\{[^\}]+\}"
 End With
 StripRTFCodesFromText = RegEx.Replace(RTFString, "")
 Set RegEx = Nothing
End Function

Matt
 
I can't believe I'm pushing a non regex solution but here goes...

You could just copy the text into a RichTextBox and then use the RichTextBox's .Text property to return the text without the RTF formatting. You can even make the control tiny and hidden, so the user will never see it.

Cheers

HarleyQuinn
---------------------------------
The most overlooked advantage to owning a computer is that if they foul up there's no law against wacking them around a little. - Joe Martin

Get the most out of Tek-Tips, read FAQ222-2244 before posting.
 
>>I can't believe I'm pushing a non regex solution

LOL, I hear you. I thought the same thing, but in his previous question he is retrieving "Me!txtcusRTF.Text" and not TextRTF.

Alan, for what it's worth, the pattern I use in the function above simply looks for a { followed by the next } and removes both of those and all text in between them.
Matt
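
To see why this pattern can leave stray braces and control words behind, here is a minimal sketch in Python (used only for illustration; the pattern is the same one passed to vbscript.regexp in the VBA function above). The sample string is a hypothetical, shortened RTF fragment, not the data from this thread. Because [^\}]+ also accepts "{", each match runs from an opening brace to the first closing brace after it, so nested groups leave their outer "}" behind and nothing outside the braces is touched.

```python
import re

# Same pattern as in the VBA function above:
# a "{", one or more characters that are not "}", then a "}".
BRACE_GROUP = re.compile(r"\{[^\}]+\}")

# Hypothetical, shortened RTF sample (an assumption for illustration only).
sample = (
    r"{\rtf1\ansi{\fonttbl{\f0\fswiss Arial;}}" "\n"
    r"\viewkind4\uc1\pard\f0\fs22 Hello world\par" "\n"
    "}"
)

# Replace every match with an empty string, as the quoted advice suggests.
stripped = BRACE_GROUP.sub("", sample)

print(repr(stripped))
```

In this sample the single match covers the header up to the first "}", so one "}" from the nested font table, the whole \viewkind4 line, and the final "}" all survive.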
 
Matt, that confused me a bit when I saw that, but .Text doesn't return RTF tags...?

Hmmm [ponder]

HarleyQuinn
 
Matt and Harley, thank you.

Harley, the ActiveX Rich Text Box control is not available on my machine. I registered richtx32.ocx, but it is not compatible with Access. I guess it is a VB control. Any suggestions?

Matt: Your function works well, but I probably need to adjust the pattern.

I must confess that I don't understand what I am doing. This is the result of running the Regular Expression function:
Code:
}
}
\viewkind4\uc1\pard\keepn\widctlpar\s1\qc\b\f0\fs24 DECISION\par
\pard\widctlpar\b0\f1\par
\pard\widctlpar\fi720\li720\qj\f0\fs22 The Petition of the Assessor is remanded for findings to be developed pursuant to this notice.  The prior action of the County Board in this matter is stayed.\par
\pard\b\i\fs24\par
}

I would also like to strip out everything except "The Petition of the Assessor is remanded for findings to be developed pursuant to this notice. The prior action of the County Board in this matter is stayed." The rest seems to me to be RTF instructions.

I tried to implement Scott Jennings's expression (shown above), which looks like it will filter out the rest of the instructions, but I'm not sure how to do this.

Many thanks.
 
I know what you mean about richtx32.ocx in VBA; you get that silly Access issue. If you have VB, you can put that control in a container and register that library, though you may want to Google for a "richedit" control from one of the MVPs (I seem to recall using that at one point). I've heard of other workarounds but never followed through.

As for the pattern, try using "^\{(.+)|^\\(.+)|(\}*)" (per
I am going to go look for an RTF spec guide to see if I can write a better one.
Matt
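
A quick way to check what this pattern does before wiring it into Access is to run it against a small sample. The sketch below uses Python purely for illustration, with re.MULTILINE standing in for .MultiLine = True, and a hypothetical shortened RTF string (an assumption, not the real data). With multiline matching, the first two alternatives consume any whole line that starts with "{" or "\", and RTF body text usually sits on a line that starts with a control word.

```python
import re

# The pattern suggested above; re.MULTILINE mirrors .MultiLine = True in VBA.
LINE_PATTERN = re.compile(r"^\{(.+)|^\\(.+)|(\}*)", re.MULTILINE)

# Hypothetical, shortened RTF sample (an assumption for illustration only).
sample = (
    r"{\rtf1\ansi{\fonttbl{\f0\fswiss Arial;}}" "\n"
    r"\viewkind4\uc1\pard\f0\fs22 Hello world\par" "\n"
    "}"
)

# Every line here starts with "{", "\" or "}", so every line is matched in full.
result = LINE_PATTERN.sub("", sample)

print(repr(result))
```

On this sample only the two line breaks remain, so the visible text is removed along with the formatting.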
 
Hello Matt,
Thanks for your continued help.

I tried the following code, and got the results shown below:
Code:
Function StripRTFCodesFromText(ByVal RTFString As String) As String
 Dim RegEx As Object
 Set RegEx = CreateObject("vbscript.regexp")
 With RegEx
  .Global = True
  .IgnoreCase = True
  .MultiLine = True
  .Pattern = "\x5cviewkind4[^ ]*(.+)\x5cpar"
  '.Pattern = "\x5cviewkind4[^ ]*(.+)\x5cpar"
  '.Pattern = "^\{(.+)|^\\(.+)|(\}*)" ' Removes everything
  '.Pattern = "\{[^\}]+\}" 'Leaves many RTF instructions
 End With
 StripRTFCodesFromText = RegEx.Replace(RTFString, "")
  Debug.Print "Phase 1 - " & StripRTFCodesFromText

 With RegEx
  .Global = True
  .IgnoreCase = True
  .MultiLine = True
  .Pattern = "[\n\r\f]|(\x5cpar)|(\x5c[a-zA-Z0-9]+)"
 End With
 StripRTFCodesFromText = RegEx.Replace(RTFString, "")
 Debug.Print vbCrLf & "Phase 2 - " & StripRTFCodesFromText
 Set RegEx = Nothing
End Function

Results from the debug window:
Code:
Phase 1 - {\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fprq2\fcharset0 Arial;}{\f1\froman\fprq2\fcharset0 Times New Roman;}}
{\stylesheet{ Normal;}{\s1 heading 1;}}
{\*\generator Riched20 5.50.99.2010;}
\pard\widctlpar\b0\f1\par
\pard\widctlpar\fi720\li720\qj\f0\fs22 The Petition of the Assessor is remanded for findings to be developed pursuant to this notice.  The prior action of the County Board in this matter is stayed.\par
\pard\b\i\fs24\par
}


Phase 2 - {{{ Arial;}{ Times New Roman;}}{{ Normal;}{ heading 1;}}{\* Riched20 5.50.99.2010;}d DECISIONdd The Petition of the Assessor is remanded for findings to be developed pursuant to this notice.  The prior action of the County Board in this matter is stayed.d}

I will also look for an RTF Spec Guide. I'd like to understand what's happening here.
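
Two things stand out when comparing the code above with the C# one-liner. The C# takes capture group 1 of the first pattern (Groups[1].Value) and runs the second replace on that, whereas the code above removes the first match with Replace and then runs Phase 2 against the original RTFString. Also, the second pattern tries "\par" before the generic control-word branch, so "\pard" loses only "\par" and leaves the stray "d" seen in the Phase 2 output. The sketch below illustrates both points in Python (chosen only for illustration). The sample is hypothetical, re.DOTALL is an assumption added so that "." can cross line breaks, and the "variant" pattern is not Scott Jennings's expression, just one possible adjustment that handles plain control words only (no escaped characters such as \'e9 or \{, and no nested groups).

```python
import re

# Hypothetical, shortened sample modeled on the RTF in this thread.
rtf = (
    r"{\rtf1\ansi{\fonttbl{\f0\fswiss Arial;}}" "\n"
    r"\viewkind4\uc1\pard\qc\b\f0\fs24 DECISION\par" "\n"
    r"\pard\b0\fs22 The Petition is remanded.\par" "\n"
    "}"
)

# Step 1: keep capture group 1 (the body), rather than deleting the match.
# re.DOTALL is an assumption so that "." also matches line breaks.
body_match = re.search(r"\\viewkind4[^ ]*(.+)\\par", rtf, re.DOTALL)
body = body_match.group(1) if body_match else rtf

# Step 2a: the quoted second pattern, applied to the body.
# "\par" is tried first, so "\pard" is only partly removed.
quoted = re.sub(r"[\n\r\f]|(\\par)|(\\[a-zA-Z0-9]+)", "", body)

# Step 2b: a variant (assumption): turn whole "\par" words into newlines,
# then drop any remaining control word plus one optional trailing space.
variant = re.sub(r"\\par\b ?", "\n", body.replace("\n", ""))
variant = re.sub(r"\\[a-zA-Z]+-?[0-9]* ?", "", variant).strip()

print(repr(quoted))   # ' DECISIONd The Petition is remanded.'
print(repr(variant))  # 'DECISION\nThe Petition is remanded.'
```

In VBScript regular expressions the equivalent of Groups[1] is the SubMatches collection on a match returned by Execute, and "." does not match line breaks there, so [\s\S] is a common substitute.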

 
Alan,

Nothing like trying to create a good complex regexp pattern to boggle the mind.

The pattern you're trying to use is actually .NET code, and converting the way he did it to VBA will take a little more work. I also don't think it will clean all RTF codes, likely just the ones their application generates.

Similar to my Outlook message idea, you could always create a Word application and use a temporary file. It might take a couple of seconds of runtime, but it would work, and work well. I'd recommend doing this; I don't think it would take much longer than a solid RTF parser that cleans everything.

Code:
Function StripRTFCodesFromText(ByVal RTFString As String) As String
 Dim vFF As Long, TempFile As String, TextString As String, WordApp As Object
 ' Write the RTF string to a temporary file
 vFF = FreeFile
 TempFile = "C:\temp file 23985724958243756.rtf"
 Open TempFile For Output As #vFF
 Print #vFF, RTFString;
 Close #vFF
 ' Have Word open the file and save it back over itself as plain text
 Set WordApp = CreateObject("word.application")
 With WordApp.Documents.Open(TempFile)
  .SaveAs TempFile, 2 '2=wdFormatText
  .Close False
 End With
 WordApp.Quit False
 Set WordApp = Nothing
 ' Read the plain-text result back into a string
 vFF = FreeFile
 Open TempFile For Binary Access Read As #vFF
 TextString = Space(LOF(vFF))
 Get #vFF, , TextString
 Close #vFF
 ' Delete the temporary file and return the text
 Kill TempFile
 StripRTFCodesFromText = TextString
End Function

For my own enjoyment I am working on a regular expression to do this, though I am not entirely optimistic. Maybe HarleyQuinn is having better luck.

Matt
 
Hi Matt,

Thanks for this code. It works. There is a minor issue: sometimes there is a "permission denied" error, probably due to the timing of OLE Automation.

Also, while this solution works, it takes a very long time. The result is that the user is likely to be unhappy.

I do believe that using OLE Automation through Word is faster than manipulating Outlook.

Until I figure out how to create the right regular expression, I am leaning towards making the user paste the information twice: first into a standard text box, which automatically strips out the RTF, and then into the Rich Text Box. I wonder if I can disguise it as a feature?

Alan
 
Treat it like a new password validation. You are just doing it to ensure the accuracy of the data. :)

[red]"... isn't sanity really just a one trick pony anyway?! I mean, all you get is one trick, rational thinking, but when you are good and crazy, oooh, oooh, oooh, the sky is the limit!" - The Tick[/red]
 
Hello,

I just thought I'd let you know that I was able to pull the plain text using this code:
Code:
Me.txtPhraseStripped.Value = Me.SubfrmRTFeditor.Form.Contents(SF_TEXT)

I am impressed with this non-ActiveX control. Details may be found at should you ever need one.

Before I did this, I changed the user interface as part of a redesign to accommodate 2x pasting as a feature. I like the new interface better, and it is no longer necessary to require 2x pasting as I've worked out the issue.

Matt, I was super impressed with your assistance. I plan to research the Regular Expressions solution later too, and I'll post the solution here after I find it.

In the meantime, please accept a star. Also, as an additional thank you for your efforts, I've made a donation to Tek-Tips. Receipt ID: 7XP01360RL409592E.
 