Strip Out Characters From Column Information 2

dwg23

Technical User
Oct 21, 2002
151
US
Hello all,
The results in my column are text, but the text has odd content both before and after it.
I believe it is the formatting and font information.
For example:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}{\f1\fswiss\fprq2\fcharset0 Calibri;}{\f2\fnil Microsoft Sans Serif;}}
{\colortbl ;\red31\green73\blue125;}
\viewkind4\uc1\pard\f0\fs16
Actual information is here
\par
\par
}
Is there a way to strip out all of the extraneous stuff and only leave the actual text?
Thanks,
DWG23
 
These look like RTF tags, so put this text as is, with all the "gibberish", into an RTF text control and you can read it with its formatting. Then you can also simply extract the pure text.

AFAIK the pure text is not necessarily on its own line(s), so you can't do this by picking out every line that has no backslash or curly bracket.

Bye, Olaf.
 
Olaf,
Thanks for the info. What I was hoping for was a way to remove all of that as the query executes, so the results display correctly.
I gather from your answer that this cannot be done. Correct?

Thanks,
DWG23
 
Not in a simple way, at least.

You'd need code that parses RTF to extract just the text in it. Unlike HTML, where tags sit between < and > and the inner text is reliably between > and <, RTF control words like \par start with \ but have no closing bracket. That makes it harder to remove all the RTF code, especially as, if I remember correctly, the RTF could also look like "\viewkind4\uc1\pard\f0\fs16Actual information is here\par" on a single line.
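To give an idea of what such parsing involves, here is a minimal Python sketch, not a full RTF parser. It assumes simple RTF like the example in the question: Windows-1252 text, a few metadata groups (font table, colour table and similar) and \par for line breaks. It does not handle \u Unicode escapes, \bin data, embedded objects or tables. The names rtf_to_text, TOKEN and SKIP_DESTINATIONS are made up for this illustration.

```python
import re

# Groups whose contents are metadata rather than document text.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte such as \'e9
    r"|\\([^a-zA-Z])"            # control symbol such as \{ \} \\ \~ \*
    r"|([{}])"                   # group open / close
    r"|[\r\n]+"                  # raw line breaks carry no meaning in RTF
    r"|([^\\{}\r\n]+)"           # plain text
)


def rtf_to_text(rtf: str) -> str:
    """Return the visible text of a simple RTF string."""
    out = []
    skip_stack = []      # skip state of the enclosing groups
    skipping = False     # True while inside a metadata group
    for m in TOKEN.finditer(rtf):
        word, _param, hexbyte, symbol, brace, text = m.groups()
        if brace == "{":
            skip_stack.append(skipping)
        elif brace == "}":
            skipping = skip_stack.pop() if skip_stack else False
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif symbol is not None:
            if symbol == "*":
                skipping = True          # \* marks an ignorable destination
            elif not skipping and symbol in "\\{}":
                out.append(symbol)       # escaped literal character
            elif not skipping and symbol == "~":
                out.append("\u00a0")     # non-breaking space
        elif hexbyte is not None:
            if not skipping:
                out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif text is not None and not skipping:
            out.append(text)
    return "".join(out)


if __name__ == "__main__":
    sample = r"{\rtf1\ansi{\fonttbl{\f0\fnil Calibri;}}\pard\f0\fs16 Actual information is here\par}"
    print(rtf_to_text(sample).strip())   # prints: Actual information is here
```

Even this small version has to track nested groups and several kinds of escapes, which is why a plain search-and-replace inside the query is not enough.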

You can check whether your actual information is always in line 4; that would make it easy again, of course. But that can't be said from just one example, can it?

Bye, Olaf.

 
Olaf,
No, I can't make that determination with only one example, and I do have some entries where the plain text alone is 4 lines long.
After you told me that the gibberish was RTF, I searched the forum and found this.


I think maybe it's not worth all the work.

Thanks,
DWG23
 
Well, how many records do you need to convert?
You could query them and put each value into an RTF file; Word would then display the texts.

Bye, Olaf.
 
I have over 1500 records... too many to do that with. I think I will just let them decipher it on their own.

Thanks,
DWG23
 
With a bit of PowerShell and MS Word you can convert those to plain text, assuming you have a fully valid RTF string in those fields.

This PS script reads from a file. You could easily add a function that reads each record from your db, writes it to a file and then passes it to the code in this .ps1.

It saves to a file in text format, which you could then upload to replace the original text.

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
> I think I will just let them decipher it on their own.
All you need is an RTF/Rich text control in your user interface and it'll display RTF correctly.

Bye, Olaf.
 
This PS script is better than the one I originally posted.

Condensed version:
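# Load the WinForms assembly that provides RichTextBox
Add-Type -AssemblyName System.Windows.Forms

# Read the whole file as a single string (-Raw needs PowerShell 3.0 or later)
$myContent = Get-Content -Raw "C:\temp\test.rtf"
$rtBox = New-Object System.Windows.Forms.RichTextBox
$rtBox.Rtf = $myContent

# Get plain text version
$plainText = $rtBox.Text

# Write the plain text out to the destination file
[System.IO.File]::WriteAllText("C:\temp\test1.txt", $plainText)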

To make it dynamic, you could set $myContent to the string from your SQL recordset and then use $plainText to update the field in the table directly.

(There are plenty of examples on the net of how to use SqlClient objects in PowerShell, so I'm not putting any here.)
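For illustration only, the read-convert-update loop could look roughly like the Python sketch below. Everything named here is an assumption: sqlite3 merely stands in for whatever database driver you really use, convert_column and its parameters are hypothetical, and to_plain is whichever RTF-to-text converter you choose. Since some fields apparently hold plain text already, the sketch only touches values that start with {\rtf.

```python
import sqlite3
from typing import Callable


def convert_column(
    conn: sqlite3.Connection,
    table: str,
    key_col: str,
    text_col: str,
    to_plain: Callable[[str], str],
) -> int:
    """Replace RTF values in text_col with plain text; return the number of rows changed.

    table, key_col and text_col are trusted identifiers supplied by the caller,
    not user input, so they are formatted straight into the SQL.
    """
    rows = conn.execute(f"SELECT {key_col}, {text_col} FROM {table}").fetchall()
    changed = 0
    for key, value in rows:
        # Leave NULLs and values that are already plain text untouched.
        if value is None or not value.lstrip().startswith("{\\rtf"):
            continue
        conn.execute(
            f"UPDATE {table} SET {text_col} = ? WHERE {key_col} = ?",
            (to_plain(value), key),
        )
        changed += 1
    conn.commit()
    return changed
```

Run it against a copy of the table first, so the original RTF is still available if the conversion loses something you need.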

Regards

Frederico Fonseca
 
Many thanks to both of you for your input.
How can I not give it a shot with all the help you have given?
I will let you know how it goes!
 
The rich text control has two properties that matter here.

In C# you can use System.Windows.Forms.RichTextBox.
Set its Rtf property to your data, and its Text property will hold the plain text.

If you are working on a legacy application, there is also an ActiveX RichTextBox control with the two properties TextRTF and Text; otherwise the procedure is the same.

If you do this, there is no need to change your data at all.

Bye, Olaf.
 