Unicode Text File

Status
Not open for further replies.

timesign

Programmer
May 7, 2002
53
US
Hi,
I would like to read in a Unicode text file and parse it character by character to do some conversions. The file will contain Unicode text (such as Hebrew letters), punctuation (such as periods and commas), numbers (123...), and English text.
1) How do I "pop" one character at a time, and
2) how do I find out what it is? For example, the Hebrew letter aleph is Unicode U+05D0; how do I compare the character I pop to find out whether it is an aleph?
Thank you,
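A minimal sketch of both steps, assuming the text has already been decoded correctly into a Java String. "Popping" a character is just walking the string with `charAt(i)`, and a Unicode escape such as `'\u05D0'` is an ordinary char literal that can be compared with `==`. The class name `CharClassifier`, its methods, and the category labels are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CharClassifier {
    // U+05D0 HEBREW LETTER ALEF as a Java char literal
    public static final char ALEPH = '\u05D0';

    // Returns an illustrative category label for a single char.
    public static String classify(char c) {
        if (c == ALEPH) {
            return "aleph";
        }
        if (Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HEBREW) {
            return "hebrew"; // any other character in the Hebrew block (U+0590..U+05FF)
        }
        if (Character.isDigit(c)) {
            return "digit";
        }
        if (Character.isLetter(c)) {
            return "other-letter"; // e.g. English letters
        }
        if (Character.isWhitespace(c)) {
            return "space";
        }
        return "punctuation-or-symbol";
    }

    // Walks ("pops") the string one char at a time and classifies each one.
    public static List<String> classifyAll(String text) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            labels.add(classify(text.charAt(i)));
        }
        return labels;
    }

    public static void main(String[] args) {
        String sample = "\u05D0\u05D1 Hi, 12.";
        List<String> labels = classifyAll(sample);
        for (int i = 0; i < sample.length(); i++) {
            System.out.printf("U+%04X %s%n", (int) sample.charAt(i), labels.get(i));
        }
    }
}
```

`charAt` returns one UTF-16 code unit. Hebrew letters lie in the Basic Multilingual Plane, so each is a single char; characters above U+FFFF would need `codePointAt` instead.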
 
Here is a little sample code. I figured out the notation for Unicode characters; my problem now is that when I read the file I get garbage. I believe the reader is reading one byte at a time instead of one Unicode character (2 bytes) at a time. As a result, the string has roughly twice as many characters as it should, and of course they are all gibberish.

...
BufferedReader BufferedInputFile = new BufferedReader(new FileReader(UserInputFile));
...
for (int i = 0; NextLine.length() > i; i++) {
    if (NextLine.charAt(i) == '\u05d0') ...

Thanks,
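The symptom described in this post is consistent with UTF-16 bytes being decoded by a single-byte charset. The one-argument `FileReader` constructor decodes with the JVM's default charset (platform-dependent on older JVMs, for example windows-1252 on a US-English Windows machine; UTF-8 from Java 18 on), not with whatever encoding the file was saved in. The sketch below assumes the file is UTF-16 and uses ISO-8859-1 as a stand-in for a single-byte default; `DecodeMismatch` and `charCounts` are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class DecodeMismatch {
    // Decodes the same bytes two ways and returns {correct count, wrong count}.
    public static int[] charCounts(String original) {
        // Roughly what a UTF-16 text file holds on disk: a byte-order mark, then 2 bytes per char
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_16);
        // Correct: decode with the charset the bytes were written in (the BOM is consumed)
        String right = new String(onDisk, StandardCharsets.UTF_16);
        // Wrong: a single-byte charset turns every byte into its own char
        String wrong = new String(onDisk, StandardCharsets.ISO_8859_1);
        return new int[] { right.length(), wrong.length() };
    }

    public static void main(String[] args) {
        int[] counts = charCounts("\u05D0\u05D1\u05D2");
        System.out.println("decoded as UTF-16:     " + counts[0] + " chars");
        System.out.println("decoded as ISO-8859-1: " + counts[1] + " chars");
    }
}
```

For three Hebrew letters this reports 3 chars versus 8: two bytes per letter plus a two-byte byte-order mark, each byte surfacing as a separate meaningless char.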
 
Hi,
I figured out that I have to use an InputStreamReader, which lets me specify whether the file I am reading is UTF-8 or UTF-16. I still don't know how to do that.
I would appreciate it if anyone could write the line or two of code needed to open a file and read it through an InputStreamReader.
Thanks
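One way to do this, sketched under the assumption of Java 7 or later (for try-with-resources and `StandardCharsets`): wrap a `FileInputStream` in an `InputStreamReader` that names the charset explicitly, then in a `BufferedReader` for `readLine()`. `ReadUnicodeFile`, `countAlephs`, and the temporary sample file are illustrative only.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadUnicodeFile {
    static final char ALEPH = '\u05D0';

    // Decodes the stream with an explicitly named charset and counts aleph characters.
    // Closing the BufferedReader also closes the underlying stream.
    public static int countAlephs(InputStream bytes, Charset charset) {
        int count = 0;
        // InputStreamReader converts bytes to chars using the given charset,
        // instead of the JVM default used by FileReader's one-argument constructors.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(bytes, charset))) {
            String nextLine;
            while ((nextLine = in.readLine()) != null) {
                for (int i = 0; i < nextLine.length(); i++) {
                    if (nextLine.charAt(i) == ALEPH) {
                        count++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Create a small UTF-16 sample file so the example runs on its own.
        File sample = File.createTempFile("hebrew", ".txt");
        sample.deleteOnExit();
        try (Writer out = new OutputStreamWriter(new FileOutputStream(sample), StandardCharsets.UTF_16)) {
            out.write("\u05D0\u05D1 abc, 123.\n\u05D0\u05D0\n");
        }

        // Opening the file: use StandardCharsets.UTF_8 instead for a UTF-8 file.
        int alephs = countAlephs(new FileInputStream(sample), StandardCharsets.UTF_16);
        System.out.println("alephs: " + alephs); // prints "alephs: 3"
    }
}
```

On older JVMs without `StandardCharsets`, the charset name can be passed as a string instead, e.g. `new InputStreamReader(stream, "UTF-16")`. The "UTF-16" charset reads a leading byte-order mark to pick the byte order, which typically covers files saved as "Unicode" by Windows Notepad (UTF-16 little-endian with a BOM). Java's UTF-8 decoder, by contrast, does not strip a leading UTF-8 BOM; it shows up as the char `'\uFEFF'` at the start of the first line.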
 