Unicode Text File

timesign · May 11, 2002

Hi,
I would like to read in a Unicode text file and parse it character by character to do some conversions. The file will contain Unicode text (such as Hebrew letters), punctuation and digits (periods, commas, 123...), as well as English text.
1) How do I "pop" one character at a time, and
2) how do I find out what it is? For example, the Hebrew letter aleph is Unicode U+05D0; how do I compare the character I popped to see whether it is an aleph?
Thank you,
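
One way to approach both questions is sketched below: pull characters one at a time with Reader.read() and compare each against a char literal written as a Unicode escape. The class and method names (AlephScan, countAlephs, isHebrew) are made up for illustration, and a StringReader stands in for the real file so the sketch runs on its own.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class AlephScan {
    // Hebrew letter aleph, code point U+05D0.
    static final char ALEPH = '\u05d0';

    // Pulls one char at a time from the reader and counts the alephs.
    static int countAlephs(Reader in) throws IOException {
        int count = 0;
        int c;
        while ((c = in.read()) != -1) {   // read() returns -1 at end of stream
            char ch = (char) c;
            if (ch == ALEPH) {
                count++;
            }
        }
        return count;
    }

    // True if the char belongs to the Unicode Hebrew block.
    static boolean isHebrew(char ch) {
        return Character.UnicodeBlock.of(ch) == Character.UnicodeBlock.HEBREW;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file contents: Hebrew, English, punctuation, digits.
        Reader in = new StringReader("\u05d0bc, 123. \u05d0");
        System.out.println(countAlephs(in));   // prints 2
    }
}
```

Character.isDigit and Character.isLetter can be used in the same loop to sort out digits and letters from punctuation.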

timesign · May 12, 2002

Here is a little sample code. I figured out the notation for Unicode characters. My problem now is that when I read the file I get garbage. I believe the reader is reading one BYTE at a time instead of one Unicode character (2 bytes) at a time. The result is that the string has approximately twice as many chars, and of course they are all gibberish.

...
BufferedReader BufferedInputFile = new BufferedReader(new FileReader(UserInputFile)); .......
...
for (int i = 0; NextLine.length() > i; i++) {
    if (NextLine.charAt(i) == '\u05d0') ..........

Thanks,
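
The doubling described above can be reproduced without a file. This is only a sketch under two assumptions that the post does not confirm: that the file is UTF-16, and that FileReader is decoding it with a single-byte platform default such as ISO-8859-1. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ByteVersusChar {
    // Encodes the text as UTF-16, then decodes the same bytes two ways.
    // Returns { length when decoded byte-by-byte, length when decoded as UTF-16 }.
    static int[] decodedLengths(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16BE);        // 2 bytes per char here
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);  // 1 char per byte
        String right = new String(bytes, StandardCharsets.UTF_16BE);    // 1 char per 2 bytes
        return new int[] { wrong.length(), right.length() };
    }

    public static void main(String[] args) {
        // Three Hebrew letters: aleph, bet, gimel.
        int[] n = decodedLengths("\u05d0\u05d1\u05d2");
        System.out.println(n[0] + " vs " + n[1]);   // prints "6 vs 3"
    }
}
```

If that is what is happening, the fix is to name the file's encoding when opening it rather than relying on the default that FileReader picks.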

timesign · May 13, 2002

Hi,
I figured out that I have to use an InputStreamReader to tell the code whether I am reading UTF-8 or UTF-16. I still don't know how to do that.
I would appreciate it if anyone could write the line or two of code that opens a file using InputStreamReader.
thanks
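
A minimal sketch of what is being asked for: wrap a FileInputStream in an InputStreamReader and pass the charset as the second argument. The file encoding is an assumption here (UTF-16 with a byte-order mark); for a UTF-8 file, StandardCharsets.UTF_8 would go in its place. The class name, method name and temporary sample file are made up so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadUnicodeFile {
    // Opens the file with an explicit charset and counts the alephs in it.
    static int countAlephs(String fileName, Charset charset) throws IOException {
        int count = 0;
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == '\u05d0') {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, written here only so the sketch is runnable.
        Path sample = Files.createTempFile("hebrew", ".txt");
        Files.write(sample, "\u05d0\u05d1 abc 123.\n\u05d0".getBytes(StandardCharsets.UTF_16));
        System.out.println(countAlephs(sample.toString(), StandardCharsets.UTF_16));   // prints 2
        Files.delete(sample);
    }
}
```

The InputStreamReader constructor also accepts the charset as a name, for example "UTF-16" or "UTF-8", which is the form available on older Java versions.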
