InternetReadFile: Greek or Bulgarian characters

Status
Not open for further replies.

elac

Technical User
Jul 9, 2002
Hello,

I'm using InternetReadFile to get the source code of some HTML pages. The problem is that some of the pages contain Bulgarian or Greek characters. InternetReadFile returns garbled characters, and I'd like to convert them back to the correct ones.

Any experience on that?

Thanks

elac
 
> Any experience on that?

No.

Now that I've answered your question <g> perhaps you could provide a little more information - 'funny characters' isn't a very useful description.

My reading of the description of InternetReadFile suggests it returns raw HTML; if so, it may contain a character set declaration, which you may need in order to interpret the text, and you will have to act like a browser and do an awful lot of work. Is it really the best way to get the text?
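
To give an idea of what "interpreting the character set declaration" involves, here is a minimal Python sketch (not the WinINet API itself). It assumes the raw page bytes are already in hand, and the helper names `sniff_charset` and `decode_html` are made up for illustration. It only looks at a `<meta>` declaration near the top of the page; a real browser also honours the HTTP `Content-Type` header and byte order marks.

```python
import re

# Matches charset=... inside a <meta> tag, for example
#   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
#   <meta charset="utf-8">
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_charset(raw: bytes, default: str = "cp1252") -> str:
    """Return the charset named in the first <meta> declaration, else default."""
    match = _META_CHARSET.search(raw[:4096])  # declarations sit near the top
    return match.group(1).decode("ascii").lower() if match else default


def decode_html(raw: bytes, default: str = "cp1252") -> str:
    """Decode raw page bytes using the declared charset (hypothetical helper)."""
    name = sniff_charset(raw, default)
    try:
        return raw.decode(name, errors="replace")
    except LookupError:
        # The page named a charset Python does not know; fall back.
        return raw.decode(default, errors="replace")
```

The `cp1252` default is only an assumption about a Western European Windows setup; pick whatever fallback suits the pages you fetch.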

If I have misunderstood and you actually get display text, exactly what text are you getting, exactly what are you doing with it, exactly what are you seeing, and exactly what do you expect to see?

Enjoy,
Tony

------------------------------------------------------------------------------------
We want to help you; help us to do it by reading this: Before you ask a question.

I'm working (slowly) on my own website
 
Here is what I should get ...

Οδηγία 91/439/ΕΟΚ του Συμβουλίου της 29ης Ιουλίου 1991 για την άδεια οδήγησης

and here is what I get ...

ÎŸÎ´Î·Î³Î¯Î± Î±Ï?Î¹Î¸. 96/47/Î•Îš Ï„Î¿Ï… Î£Ï…Î¼Î²Î¿Ï…Î»Î¯Î¿Ï… Ï„Î·Ï‚ 23Î·Ï‚ Î™Î¿Ï…Î»Î¯Î¿Ï… 1996 Ï€ÎµÏ?Î¯ Ï„Ï?Î¿Ï€Î¿Ï€Î¿Î¹Î®ÏƒÎµÏ‰Ï‚ Ï„Î·Ï‚ Î¿Î´Î·Î³Î¯Î±Ï‚ 91/439/Î•ÎŸÎš Î³Î¹Î± Ï„Î·Î½ Î¬Î´ÎµÎ¹Î± Î¿Î´Î®Î³Î·ÏƒÎ·Ï

For your information, the HTML page's encoding is Unicode (UTF-8).
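
Since the page is UTF-8, one thing worth watching is that a read loop hands back the page in fixed-size chunks, and a two-byte Greek character can be split across two chunks. The following Python sketch illustrates the idea only; it uses an in-memory stream in place of the real download, and `read_text` is a hypothetical helper name. It decodes incrementally, so a character split across a chunk boundary is still reassembled.

```python
import codecs
import io


def read_text(stream, chunk_size=1024, encoding="utf-8"):
    """Read a byte stream chunk by chunk and decode it without splitting characters."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # The decoder buffers an incomplete trailing byte sequence until
        # the next chunk supplies the rest of the character.
        parts.append(decoder.decode(chunk))
    parts.append(decoder.decode(b"", final=True))  # flush anything left over
    return "".join(parts)


# Simulated download: a tiny chunk size forces splits inside characters.
page = "Οδηγία 91/439/ΕΟΚ".encode("utf-8")
print(read_text(io.BytesIO(page), chunk_size=3))
```

Collecting all the bytes first and decoding once at the end works just as well; the point is to avoid decoding each buffer on its own.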
 
I don't have time to work through it at the moment but I suspect you have a UTF-8 sequence of bytes that you are interpreting as UTF-16.

Enjoy,
Tony

 
I've had a look at this and I'm quite sure, now, I'm right.

Οδηγία encoded in UTF-8 is (hex) CE9F CEB4 CEB7 CEB3 CEAF CEB1.
Treated as an ANSI string, and converted to Unicode, this becomes (hex) 00CE 009F 00CE 00B4 00CE 00B7 etc. (strictly, Windows-1252 maps byte 9F to 0178, Ÿ), or ÎŸÎ´Î·Î³Î¯Î±
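
The byte sequence above can be checked with a short Python sketch. It assumes the "ANSI" code page is Windows-1252 (`cp1252`), which may differ on your machine, and the function names are made up for illustration. It reproduces the garbling and then reverses it.

```python
def misread_as_ansi(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the misreading: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


word = "Οδηγία"
print(word.encode("utf-8").hex(" ").upper())  # CE 9F CE B4 CE B7 CE B3 CE AF CE B1

garbled = misread_as_ansi(word)
print(garbled)          # ÎŸÎ´Î·Î³Î¯Î±
print(repair(garbled))  # Οδηγία
```

The round trip is not always possible: a few bytes (81, 8D, 8F, 90, 9D) have no Windows-1252 character, so letters such as ρ (CF 81) cannot be recovered this way. Decoding the raw bytes as UTF-8 in the first place avoids the problem.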




Enjoy,
Tony

 