
Regex question (or any other idea on how to do this)

Status
Not open for further replies.

hedgracer

Programmer
Mar 21, 2001
186
US
I receive a file from an outside party that I have to load into a SQL Server table. Let me stress that I have no control over the formatting of the file's content, and I cannot have the file corrected if it contains characters that make it impossible to import (I need to say that, since it seems to be the suggestion in at least two of the responses to these sorts of posts). The file has the following line:

TC20090616177 020 177 ?FS?BTY70647447407 47447407 ETZKATT075YR01 06 20090700 00000000003528002009051300000002BA G UDGZUSD0000075000000000000150 ?FS?BTY70647447407 G

The problem character is the ?. I have tried using a regex replace on this character and am getting an illegal escape character error from the C# compiler. All I want to do is replace this character with a ?. Can someone give me some suggestions on this item? Any help is appreciated. Thanks.

Dave
 
The "character" represents an unprintable or undefined character for the font. This varies by the actual font, so you will need some additional information to be able to filter or replace such characters.



MichaelRed


 
[1] First you need to determine the exact bytes of the "character". I suspect it is what .NET leaves behind when it does not recognize a character in the original text stream (because of an encoding mismatch) and translates it into the 3-byte UTF-8 sequence EF BF BD, which is U+FFFD, the Unicode replacement character. Once the text has reached that stage, the original information is lost.

[2] Once the character is known, taking EF BF BD as an example, you can do this.
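
One way to carry out step [1] is to list the code point of every non-ASCII character in the string after it has been read. The following is only a sketch: the class name InspectChars, the method Describe, and the three sample bytes are made up for illustration and are not taken from the file in question. It assumes the text was decoded as UTF-8, in which case an invalid byte shows up as U+FFFD.

```csharp
using System;
using System.Text;

static class InspectChars
{
    // Lists each non-ASCII character in s with its index and code point.
    public static string Describe(string s)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] > 127)
            {
                sb.AppendFormat("index {0}: U+{1:X4}; ", i, (int)s[i]);
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hypothetical sample: 0x92 on its own is not valid UTF-8,
        // so the decoder substitutes U+FFFD for it.
        byte[] raw = { 0x41, 0x92, 0x42 };
        string decoded = Encoding.UTF8.GetString(raw);
        Console.WriteLine(Describe(decoded));
    }
}
```

If the output reports U+FFFD, the substitution has already happened during decoding, and the original byte can only be recovered by going back to the raw file.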
[tt]
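//requires: using System.Text; using System.Text.RegularExpressions;
[green]//string s; //s is given: it holds the line read from the file[/green]
string t; //t will hold the cleaned-up string
byte[] abytes={239,191,189}; //assuming the bytes are EF BF BD
//or, if you can copy and paste the problem character between the quotes:
//byte[] abytes=Encoding.UTF8.GetBytes("?");
UTF8Encoding encoder = new UTF8Encoding();
string sutf8=encoder.GetString(abytes); //the problem character as a string
t=Regex.Replace(s,Regex.Escape(sutf8),"?"); //result; Escape guards against regex metacharacters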
[/tt]
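
Since only a single known character is being swapped, a plain string replace may be enough, and it avoids regex escaping altogether. The sketch below also shows a possible way to keep the original byte instead of losing it: decoding the file as ISO-8859-1 (code page 28591), where every byte maps to exactly one character. That the file is in a single-byte encoding is an assumption, and the names CleanLine, ReplaceUnknown and DecodeLatin1 are hypothetical.

```csharp
using System;
using System.Text;

static class CleanLine
{
    // Replaces every U+FFFD replacement character with a plain question mark.
    public static string ReplaceUnknown(string s)
    {
        return s.Replace('\uFFFD', '?');
    }

    // Decodes bytes as ISO-8859-1, which maps each byte to one character,
    // so no byte is turned into U+FFFD.
    public static string DecodeLatin1(byte[] raw)
    {
        return Encoding.GetEncoding(28591).GetString(raw);
    }

    static void Main()
    {
        // Hypothetical sample bytes, not taken from the real file.
        byte[] raw = { 0x41, 0x92, 0x42 };

        string lossy = Encoding.UTF8.GetString(raw); // contains U+FFFD
        Console.WriteLine(ReplaceUnknown(lossy));

        string kept = DecodeLatin1(raw);             // original byte value preserved
        Console.WriteLine((int)kept[1]);
    }
}
```

For a whole file, File.ReadAllText(path, Encoding.GetEncoding(28591)) applies the same decoding, after which the unwanted character can be replaced by its actual code point.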
 