UTF-8 Katakana

Status
Not open for further replies.

klamerus

Programmer
Jun 23, 2003
71
US
Help.

We have an application that generates PDF documents from input data. So far we have been sending it Latin-1 (which requires converting byte values 128 through 255 into their two-byte UTF-8 sequences), but we now need to send it katakana data encoded as UTF-8.
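For anyone following along, the Latin-1 step described above can be sketched in a few lines of Python. This is only an illustration, not the code our application uses; the function name `latin1_to_utf8` and the sample bytes are made up for the example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8.

    Bytes 0x00-0x7F pass through unchanged; bytes 0x80-0xFF each
    become a two-byte UTF-8 sequence.
    """
    return data.decode("latin-1").encode("utf-8")


# Hypothetical sample: "caf" followed by 0xE9 (e with acute accent in Latin-1).
sample = bytes([0x63, 0x61, 0x66, 0xE9])
converted = latin1_to_utf8(sample)
print(" ".join(f"{b:02X}" for b in converted))
```

The single Latin-1 byte E9 comes out as the pair C3 A9, while the ASCII bytes are untouched.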

We've looked around and see that the Unicode katakana code points run from U+30A0 through U+30FF, but we understand that katakana in UTF-8 takes 3 bytes per character, not 2.

Can anyone provide the specific byte values (in order) for a number of katakana characters in true UTF-8? I think the ideal format would be a snippet from an HTML file so that we can view these. What we really need is to understand how many bytes a UTF-8 katakana character should take, and to get some sample values (if not the entire character set) showing both the byte values and the character. We're going to have to create a replacement table to convert our source file into this.
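One way such a table could be generated is sketched below in Python. It assumes the Katakana block U+30A0 through U+30FF mentioned above and encodes each code point with the standard 3-byte UTF-8 layout (1110xxxx 10xxxxxx 10xxxxxx). The names `encode_3byte` and `katakana_table` are hypothetical, and the last column is an HTML numeric character reference so the rows can be pasted into an HTML file and viewed in a browser.

```python
def encode_3byte(cp: int) -> bytes:
    """Hand-encode a code point in U+0800..U+FFFF as 3 UTF-8 bytes."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a 3-byte UTF-8 code point")
    return bytes([
        0xE0 | (cp >> 12),          # lead byte: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # continuation: middle 6 bits
        0x80 | (cp & 0x3F),         # continuation: low 6 bits
    ])


def katakana_table():
    """Return (code point, UTF-8 bytes, HTML reference) for U+30A0..U+30FF."""
    rows = []
    for cp in range(0x30A0, 0x3100):
        encoded = chr(cp).encode("utf-8")
        # Cross-check the built-in codec against the hand encoding.
        assert encoded == encode_3byte(cp)
        rows.append((cp, encoded, f"&#x{cp:04X};"))
    return rows


# Print the first few rows as a sample.
for cp, encoded, ref in katakana_table()[:8]:
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{cp:04X}  {hex_bytes}  {ref}")
```

For example, U+30A2 (the katakana letter A) encodes as E3 82 A2. Across the block, U+30A0 through U+30BF map to E3 82 A0 through E3 82 BF, and U+30C0 through U+30FF map to E3 83 80 through E3 83 BF.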

Thanks,
 
True enough. It's UTF-8 that encodes ASCII in 1 byte; code points U+0080 through U+07FF take 2 bytes, and everything from U+0800 through U+FFFF, katakana included, takes 3 bytes.
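The number of bytes UTF-8 uses depends only on the code point range. A small Python sketch to check this follows; the function name `utf8_length` and the three sample code points are chosen just for illustration.

```python
def utf8_length(cp: int) -> int:
    """Return how many bytes UTF-8 needs for a given code point."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp < 0x80:
        return 1  # ASCII
    if cp < 0x800:
        return 2  # includes the upper half of Latin-1
    if cp < 0x10000:
        return 3  # rest of the Basic Multilingual Plane, katakana included
    if cp <= 0x10FFFF:
        return 4  # supplementary planes
    raise ValueError("code point is beyond U+10FFFF")


# Sample code points: ASCII 'A', Latin-1 e-acute, katakana KA.
for cp in (0x41, 0xE9, 0x30AB):
    # Compare against Python's built-in encoder as a sanity check.
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```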

Sadly, MS still continues to try to foist UTF-16 on people, and of course it's their byte order (or Intel's) that counts and no one else's. I wonder if/when they'll start to push UTF-32 :)
 