AnsiString to UTF8?

Status
Not open for further replies.

MoonchildHK

Programmer
Jun 5, 2001
22
HK
Does anyone know how I can convert an AnsiString into a UTF8-encoded string? The word UTF8 does not even seem to be in any of the Help files with Borland C++ Builder 5!!

Any help would be much appreciated!
 
Forgive my ignorance, but what is UTF8? How do you go from a C++ character array to UTF8?

James P. Cottingham

I am the Unknown lead by the Unknowing.
I have done so much with so little
for so long that they think I am now
qualified to do anything with nothing.
 
UTF8 (UCS Transformation Format, 8-bit) is a way of encoding Unicode text, including non-Latin characters, as a sequence of bytes. In the same way that WideStrings can contain Chinese, Korean, Japanese, etc. characters, so can UTF8.

UTF8 is used by many programs to transfer these non-Latin characters across networks (in my case, I need to communicate with another chat client that uses UTF8 for non-Latin characters).

Unfortunately, I can't find any simple way to convert from AnsiString or WideString to UTF8 :-(
 
Do you know how to convert standard character arrays to UTF8? For example:
Code:
char Test[12] = "Hi, there";
...
// How would you convert Test to UTF8?
James P. Cottingham

I am the Unknown lead by the Unknowing.
I have done so much with so little
for so long that they think I am now
qualified to do anything with nothing.
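
For what it's worth, a char array that holds only 7-bit ASCII, as Test does, is already valid UTF8 byte for byte, so nothing needs converting in that case. A minimal sketch of that check follows; IsPlainAscii is a made-up helper name, not something from the VCL or the standard library.

```cpp
#include <cstddef>

// Hypothetical helper: returns true when every byte is 7-bit ASCII (0x00-0x7F).
// Such a buffer is already valid UTF-8 and can be sent unchanged.
bool IsPlainAscii(const char* text, std::size_t length)
{
    for (std::size_t i = 0; i < length; ++i) {
        if (static_cast<unsigned char>(text[i]) > 0x7F) {
            return false;   // this byte needs a real code-page-to-UTF-8 conversion
        }
    }
    return true;
}
```

If the check fails, the bytes depend on the code page in use and have to go through a proper conversion first.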
 
Now I may be way off here, but if you can do the same thing with a WideString, why don't you, instead of going through all this hassle? Cyprus
 
WideString != UTF8 sadly.

I have solved the problem now by writing my own routine to convert from WideString to UTF8 and vice versa. UTF8 uses the first few bits of a character's first byte to identify how many bytes are in that character (a character may be 1 to 4 bytes long under the current definition; the original scheme allowed up to 6), whereas WideString uses 2-byte units, with characters outside the Basic Multilingual Plane taking two of them. AnsiString holds text in the system code page, which for East Asian code pages is a combination of 1 and 2 byte characters, and it can fairly easily be converted (and even cast) to a WideString.
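
The routine itself was not posted, so the following is only a rough sketch of what such a conversion might look like, not the code described above. It assumes the wide string holds UTF-16 code units and takes them as a plain unsigned short array so it builds without the VCL; Utf16ToUtf8 and Utf8SequenceLength are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes UTF-16 code units (2 bytes each) as UTF-8 bytes.
std::string Utf16ToUtf8(const unsigned short* src, std::size_t length)
{
    std::string out;
    for (std::size_t i = 0; i < length; ++i) {
        unsigned long cp = src[i];

        // A high surrogate followed by a low surrogate encodes one
        // character above U+FFFF.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < length &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;    // unpaired surrogate: substitute U+FFFD
        }

        if (cp < 0x80) {                    // 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {            // 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {          // 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                            // 11110xxx plus three 10xxxxxx bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper for the reverse direction: reads the length of a UTF-8
// sequence from the leading bits of its first byte. Returns 0 for a
// continuation byte (10xxxxxx) or an invalid lead byte.
int Utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;              // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;    // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;    // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;    // 11110xxx
    return 0;
}
```

A decoder would call Utf8SequenceLength on each lead byte and then collect the low 6 bits of each continuation byte that follows.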
 
Search for "icu" on the internet, possibly adding "internationalization" to the search. ICU (International Components for Unicode) has methods to handle UTF8 / UTF16, and it is open source. I believe it was developed by IBM.

Matt
 