Parsing acute accents

Status
Not open for further replies.

jojo11

Programmer
Feb 2, 2003
189
US
I have a parser application written in .NET that loads files into a DOM object. The header of the files we receive says they are encoded in UTF-8. Whenever the parser encounters an accented character (é, ç, â, è, ñ, ü), it blows up.
How can we resolve this?

-------------------------------------------
Ummm, we have a bit of a problem here....
 
You probably have ANSI-encoded data. You can't just label it UTF-8 and expect it to magically become Unicode.

Try this experiment: create a new document in Notepad, copy your data into it, and save the document changing the encoding to UTF-8 (bottom drop-down list in the Save as... dialog.) Then run it through your parser.

Works now?
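The Notepad experiment amounts to decoding the bytes with the code page they were really written in and re-encoding them as UTF-8. A minimal sketch of that step, written in Python only for brevity: the function name `transcode_to_utf8` and the default of Windows-1252 ("ANSI" on a Western-European Windows install) are assumptions for illustration, not something stated in the thread.

```python
def transcode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from an assumed single-byte code page to UTF-8.

    source_encoding is a guess; the sender has to confirm which code
    page was really used to write the file.
    """
    # Decode with the encoding the data was actually written in ...
    text = raw.decode(source_encoding)
    # ... then encode as UTF-8 so the bytes match the declared header.
    return text.encode("utf-8")


if __name__ == "__main__":
    ansi_bytes = b"caf\xe9"                  # "café" in Windows-1252
    print(transcode_to_utf8(ansi_bytes))     # é becomes a two-byte sequence
```

If the guessed code page is wrong, the accented characters come out as different letters rather than failing, so it is worth checking the result by eye. Python's `cp1252` codec also raises an error on the few byte values that code page leaves undefined.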
 
Inspect the file in a hex editor. If those characters are represented by 2 or more bytes, then you most likely have a UTF-8 encoded file. If they're represented as single bytes, then you have a single-byte (code-page) encoded file. If you have a code-page file, you will need to contact the sender to find out which code page they used to write it (those characters are shared by several code pages, not just a single one).
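If no hex editor is at hand, the same check can be scripted. The following is a rough sketch in Python (used only for brevity), and the helper names `classify_encoding` and `hex_dump_non_ascii` are made up for this example. It relies on the fact that a strict UTF-8 decode rejects a lone high byte such as 0xE9, which is how é is stored in Windows-1252 and Latin-1.

```python
def classify_encoding(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'single-byte' for a raw byte string."""
    if all(b < 0x80 for b in data):
        return "ascii"            # no accented characters present at all
    try:
        data.decode("utf-8")      # strict decode fails on lone high bytes
        return "utf-8"
    except UnicodeDecodeError:
        return "single-byte"      # probably a code page, e.g. Windows-1252


def hex_dump_non_ascii(data: bytes) -> list[str]:
    """List the offset and hex value of every byte above 0x7F."""
    return [f"{i:04x}: {b:02x}" for i, b in enumerate(data) if b >= 0x80]


if __name__ == "__main__":
    for sample in ("é".encode("utf-8"), "é".encode("cp1252")):
        print(classify_encoding(sample), hex_dump_non_ascii(sample))
```

In UTF-8, é shows up as the two bytes c3 a9; in a Western code page it is the single byte e9. The classification is a heuristic: a code-page file can occasionally happen to form valid UTF-8 sequences, so the answer from the sender is still the authoritative one.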

It would also be helpful if you posted the contents of the XmlException you got as a result of calling LoadXml (assuming you're using an XmlDocument). Don't forget to check the InnerException property, in case it isn't null.

Chip H.


____________________________________________________________________
If you want to get the best response to a question, please read FAQ222-2244 first
 