Hi friends,
Once again, I am stuck with a little problem:
I am trying to create formatted RTFs from UTF-8 encoded text files.
However, as I have to be able to process thousands of files and I want my app to be performant, I am not using the MS Word engine. Instead, I am writing a VB exe which will write an RTF header plus an RTF-encoded text stream directly to a text file.
My problem now lies with characters outside my locale's code page, e.g. Russian.
What I have found out so far is that Word RTFs do not use Unicode but rather an ANSI encoding based on the respective code page. So the Russian character with Unicode value U+041F will be encoded as "\'cf" in the RTF, plus an "\ansicpg1251" in the RTF header to specify the Russian code page.
Now that's all fine besides one small detail: how the heck can I find out this "cf" value?
I have tried using the ADO.Stream object to convert from UTF-8 to KOI-8 (Russian), but I still cannot get the correct values!
Is there a way to "look up" the respective value of a certain character with respect to a specific codepage?
I know it does not work this way, but I mean something comparable to:
Code:
myHex=Hex$(ASC$(myChar, codepage:="1251"))
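For illustration only, a lookup of this kind could perhaps be sketched with the Win32 WideCharToMultiByte function, which converts a UTF-16 string (what a VB String holds internally) into the bytes of a given code page. This is a minimal sketch, not a tested solution: the function name RtfEscape and its parameters are made up for this example, it assumes the UTF-8 file has already been decoded into a normal VB String, and it does not escape the RTF special characters "\", "{" and "}".

```vb
' Win32 API: converts UTF-16 text to the bytes of the given code page.
Private Declare Function WideCharToMultiByte Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long

' Hypothetical helper: returns sText with every byte >= 128
' (in code page lCodePage) written as an RTF \'hh escape.
Public Function RtfEscape(ByVal sText As String, ByVal lCodePage As Long) As String
    Dim abBytes() As Byte
    Dim n As Long, i As Long
    Dim sResult As String

    If Len(sText) = 0 Then Exit Function

    ' First call with a zero-length buffer only asks for the required size.
    n = WideCharToMultiByte(lCodePage, 0, StrPtr(sText), Len(sText), 0, 0, 0, 0)
    If n = 0 Then Exit Function

    ReDim abBytes(0 To n - 1)
    WideCharToMultiByte lCodePage, 0, StrPtr(sText), Len(sText), _
                        VarPtr(abBytes(0)), n, 0, 0

    For i = 0 To n - 1
        If abBytes(i) >= 128 Then
            ' Two lowercase hex digits, e.g. \'cf
            sResult = sResult & "\'" & LCase$(Right$("0" & Hex$(abBytes(i)), 2))
        Else
            sResult = sResult & Chr$(abBytes(i))
        End If
    Next i

    RtfEscape = sResult
End Function
```

Under these assumptions, RtfEscape(ChrW$(&H41F), 1251) should give "\'cf", since U+041F maps to byte CF in code page 1251. Characters that have no equivalent in the chosen code page come back as that code page's default character (usually "?"), so they would need separate handling.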
Thanks a lot for any hint!
Cheers,
Andy
[navy]"We had to turn off that service to comply with the CDA Bill."[/navy]
- The Bastard Operator From Hell