Read utf-8 file with VFP6

Status
Not open for further replies.

Eliott, Programmer (BA), Nov 8, 2009
Hi, I saw a post here (id=574513) describing a routine that converts any UTF-8 encoded text to ANSI. It is supposed to work in VFP6, but it doesn't work for me.
I know how to solve it in VFP9:
Code:
twobytstr=STRCONV(c_utf8, 11)	&& 11 = UTF-8 to double-byte/ANSI (not available in VFP6)
But I have an old app I wrote in VFP6, and now I desperately need a routine that reads UTF-8 encoded content and shows it properly in VFP6 as ANSI (Win1250). I also tried this code:
Code:
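file0= FILETOSTR("utf8code.txt")
*call f
utf8encode(file0)
*
* Encodes each byte 128-255 as a two-byte UTF-8 sequence (ANSI to UTF-8)
function utf8encode( lcString )
local lcUtf, i, c
    lcUtf = ""
    for i = 1 to len(lcString) 
        c = asc(substr(lcString,i,1))
        if  c < 128
            * ASCII characters stay a single byte
            lcUtf = lcUtf+chr(c)
        else
            * 110xxxxx 10xxxxxx: top 2 bits in the lead byte, low 6 bits in the second
            lcUtf = lcUtf+chr(bitor(192,bitrshift(c,6)))+chr(bitor(128,bitand(c,63)))
        endif
    next
? lcUtf
return lcUtf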
Any help would be appreciated. Thanks.

There is no good nor evil, just decisions and consequences.
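The utf8encode() routine above works in the opposite direction. It turns single-byte characters into UTF-8, so it cannot decode a UTF-8 file. The following is a minimal sketch of the decoding direction in plain VFP6 code, with no API calls. Utf8ToUnicode() is a hypothetical helper name. It handles only 1- to 3-byte sequences, which covers all Central European letters. Its output is UTF-16LE, which still has to be mapped to code page 1250, for example with WideCharToMultiByte() as discussed later in this thread. BITOR() is nested because VFP6 accepts only two arguments per call.

```foxpro
* Hypothetical helper: decodes UTF-8 bytes into a UTF-16LE string
* (two bytes per character) using only VFP6 functions.
* Handles 1- to 3-byte sequences; anything else becomes "?".
FUNCTION Utf8ToUnicode(tcUtf8)
    LOCAL lcOut, lnLen, lnI, lnB, lnCode
    lcOut = ""
    lnLen = LEN(tcUtf8)
    lnI = 1
    DO WHILE lnI <= lnLen
        lnB = ASC(SUBSTR(tcUtf8, lnI, 1))
        DO CASE
        CASE lnB < 0x80
            * 0xxxxxxx: plain ASCII, one byte
            lnCode = lnB
            lnI = lnI + 1
        CASE BITAND(lnB, 0xE0) = 0xC0 AND lnI + 1 <= lnLen
            * 110xxxxx 10xxxxxx: two bytes, e.g. C5 A0 = U+0160 (S with caron)
            lnCode = BITOR(BITLSHIFT(BITAND(lnB, 0x1F), 6), ;
                BITAND(ASC(SUBSTR(tcUtf8, lnI + 1, 1)), 0x3F))
            lnI = lnI + 2
        CASE BITAND(lnB, 0xF0) = 0xE0 AND lnI + 2 <= lnLen
            * 1110xxxx 10xxxxxx 10xxxxxx: three bytes
            lnCode = BITOR(BITOR(BITLSHIFT(BITAND(lnB, 0x0F), 12), ;
                BITLSHIFT(BITAND(ASC(SUBSTR(tcUtf8, lnI + 1, 1)), 0x3F), 6)), ;
                BITAND(ASC(SUBSTR(tcUtf8, lnI + 2, 1)), 0x3F))
            lnI = lnI + 3
        OTHERWISE
            * 4-byte sequences, stray continuation bytes, truncated input
            lnCode = 0x3F
            lnI = lnI + 1
        ENDCASE
        * UTF-16LE: low byte first, then high byte
        lcOut = lcOut + CHR(BITAND(lnCode, 0xFF)) + CHR(BITRSHIFT(lnCode, 8))
    ENDDO
    RETURN lcOut
ENDFUNC
```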
 
Hi Eliott, I use this for UTF-8:

lcGetfile = GETFILE("TXT","Open UTF-8 text","Open",1,"Import UTF-8 file")

lcPath = JUSTPATH((lcGetfile))
lcName = JUSTSTEM((lcGetfile))
lcTmpFile = CHRTRAN((lcName),")('#|/-\:*;:%&!$?+ ","") + "_2"

lcLoadFile = STRCONV(FILETOSTR((lcGetfile)),11,1252,1)  && 11 = UTF-8 to ANSI; 4th parameter 1 means 1252 is a code page (VFP9 syntax)

CD (lcPath)

STRTOFILE((lcLoadFile), FORCEEXT((lcTmpFile),"txt"))

This gives me an ANSI output text file (code page 1252).
 
@AGlobensky: nice code, but it's for VFP9, not VFP6.

 
What kind of UTF file is this? Is it HTML or XML? In that case you can work on it using automation, e.g. IE or MSXML.

In general, there are Windows API functions that convert strings from any encoding to any other, and STRCONV() only wraps some of them. I don't know off the top of my head, but search news2news and you will surely find a UTF-8 to ANSI conversion.

Bye, Olaf.
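Following up on the automation idea above, one option that works for any UTF-8 text file, including a CSV export, is ADODB.Stream. This is a swapped-in technique, not the IE or MSXML route mentioned in the post. The sketch assumes ADO 2.5 or later is installed, and ReadUtf8File() is a hypothetical helper name. The stream decodes the bytes as UTF-8. When ReadText() hands the Unicode string back, VFP should convert it to ANSI using its current code page (see CPCURRENT()), so the target code page is not chosen explicitly here.

```foxpro
* Hypothetical helper, assuming ADO 2.5+ (ADODB.Stream) is installed.
* tcFile should be a full path to the UTF-8 text/CSV file.
FUNCTION ReadUtf8File(tcFile)
    LOCAL loStream, lcText
    loStream = CREATEOBJECT("ADODB.Stream")
    loStream.Type = 2              && 2 = adTypeText
    loStream.Charset = "utf-8"     && interpret the file bytes as UTF-8
    loStream.Open()
    loStream.LoadFromFile(tcFile)
    lcText = loStream.ReadText()   && reads everything by default
    loStream.Close()
    RETURN lcText                  && converted by VFP to its current code page
ENDFUNC
```

For example, ? ReadUtf8File(GETFILE("txt")) would show the decoded text, provided every character exists in the current code page.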
 
I looked it up. There are MultiByteToWideChar and WideCharToMultiByte:



The first one converts anything to Unicode, the second one converts Unicode to anything.

Here's a snippet converting a UTF-8 text containing German umlauts to ANSI, going through Unicode as an intermediate step.

Code:
DECLARE INTEGER WideCharToMultiByte IN kernel32;
    INTEGER   CodePage,;
    INTEGER   dwFlags,;
    STRING    lpWideCharStr,;
    INTEGER   cchWideChar,;
    STRING  @ lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING    lpDefaultChar,;
    INTEGER   lpUsedDefaultChar
   
DECLARE INTEGER MultiByteToWideChar IN kernel32;
    INTEGER   CodePg,;
    LONG      dwFlags,;
    STRING    lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING  @ lpWideCharStr,;
    INTEGER   cchWideChar

Clear
* lcUTF8 = Strconv("äöü",9)

*---- here goes your initial UTF8 string data:
lcUTF8 = ''+0hc3a4c3b6c3bc
? lcUTF8

lcUnicode = Space(Len(lcUTF8)*4)
lnLenUnicode = MultiByteToWideChar (65001, 0, lcUTF8, Len(lcUTF8), @lcUnicode, Len(lcUnicode))
lcUnicode = Left(lcUnicode,lnLenUnicode*2)
lcAnsi = Space(lnLenUnicode)
lnLenAnsi = WideCharToMultiByte (1252, 0, lcUnicode, Len(lcAnsi), @lcAnsi,8,0,0)
lcAnsi = Left(lcAnsi,lnLenAnsi)
? lcAnsi
*---- here is your ANSI 1252 codepage string.

Using 1250 as the ANSI code page also works for me. That is Central European instead of Western European, but it also includes the German umlauts.

Bye, Olaf.
 
Side note: I used the commented-out STRCONV("äöü",9) to get the UTF-8 string of the umlauts, and CREATEBINARY() to get the binary representation 0h....

If VFP6 does not understand the 0h syntax, you need to put the bytes c3, a4, etc. together via chr(0xc3)+chr(0xa4)+ ...

Bye, Olaf.
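VFP6 has neither the 0h literal nor CREATEBINARY(). The following minimal sketch covers both workarounds. It builds the UTF-8 bytes for ä ö ü with CHR(), and it defines a small hypothetical HexDump() function as a stand-in for CREATEBINARY() when checking bytes. The hex digits are computed by hand, so nothing beyond VFP6 is assumed.

```foxpro
* UTF-8 bytes for ä ö ü (same as 0hc3a4c3b6c3bc in VFP9)
lcUTF8 = CHR(0xC3) + CHR(0xA4) + CHR(0xC3) + CHR(0xB6) + CHR(0xC3) + CHR(0xBC)
? HexDump(lcUTF8)

* Hypothetical helper: returns the bytes of a string as hex pairs
* separated by blanks, e.g. "C3 A4".
FUNCTION HexDump(tcString)
    LOCAL lcDigits, lcHex, lnI, lnB
    lcDigits = "0123456789ABCDEF"
    lcHex = ""
    FOR lnI = 1 TO LEN(tcString)
        lnB = ASC(SUBSTR(tcString, lnI, 1))
        * high nibble, then low nibble
        lcHex = lcHex + SUBSTR(lcDigits, INT(lnB / 16) + 1, 1) + ;
            SUBSTR(lcDigits, MOD(lnB, 16) + 1, 1) + " "
    ENDFOR
    RETURN RTRIM(lcHex)
ENDFUNC
```

Running this prints C3 A4 C3 B6 C3 BC.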
 
Oops, sorry about that, I have version 9.
Is CPCONVERT() available in version 6? If so, try this; it worked on my test file:

lcGetfile = GETFILE("TXT","Open UTF-8 text","Open",1,"Import UTF-8 file")

lcPath = JUSTPATH((lcGetfile))
lcName = JUSTSTEM((lcGetfile))
lcTmpFile = CHRTRAN((lcName),")('#|/-\:*;:%&!$?+ ","") + "_2"

lcLoadFile = CPCONVERT(437,1252,FILETOSTR((lcGetfile)))  && from code page 437 (DOS) to 1252 (Windows), byte by byte

CD (lcPath)

STRTOFILE((lcLoadFile), FORCEEXT((lcTmpFile),"txt"))
 
Dear fellows, thank you so much for your many answers. I'm really glad to have you in such a hard situation. :) I have a web app written in PHP (UTF-8 HTML pages) with MySQL (collation utf8_general_ci). When I exported some records as a CSV file, they came out UTF-8 encoded. With VFP9 I was able to solve the problem, but "ancient" VFP6 hit the wall with its head (and so did mine). I compared both encoded files in Hex Workshop, tried to work around the problem, and wrote a yard of hard-coded workarounds, but couldn't find a solution. Luckily these files are short, 1-10K, and most of them are up to 3K, but it's a real toothache. These days I sleep very little because I couldn't find a solution; it made me nervous and, at the same time, unable to think clearly.
Thank you, I'll try all the suggestions and use the best of them.
Again, Thank you!

 
Last night I tried this code:
Code:
DECLARE INTEGER WideCharToMultiByte IN kernel32;
    INTEGER   CodePage,;
    INTEGER   dwFlags,;
    STRING    lpWideCharStr,;
    INTEGER   cchWideChar,;
    STRING  @ lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING    lpDefaultChar,;
    INTEGER   lpUsedDefaultChar
   
DECLARE INTEGER MultiByteToWideChar IN kernel32;
    INTEGER   CodePg,;
    LONG      dwFlags,;
    STRING    lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING  @ lpWideCharStr,;
    INTEGER   cchWideChar

Clear
set defa to c:\download
lcGetfile = GETFILE("txt","Read UTF-8 file","Open",1,"Reading UTF8")
set defa to e:\processed
lcUTF8=filetostr((lcgetfile))

* lcUTF8 = Strconv("äöü",9)

*---- here goes your initial UTF8 string data:

lcUTF8=lcUTF8+chr(0xc4)+chr(0x8c)+chr(0xc4)+chr(0x8d)		&& diacritics: Č č
lcUTF8=lcUTF8+chr(0xc4)+chr(0x86)+chr(0xc4)+chr(0x87)		&& Ć ć
lcUTF8=lcUTF8+chr(0xc4)+chr(0x90)+chr(0xc4)+chr(0x91)		&& Đ đ
lcUTF8=lcUTF8+chr(0xc5)+chr(0xa0)+chr(0xc5)+chr(0xa1)		&& Š š
lcUTF8=lcUTF8+chr(0xc5)+chr(0xbd)+chr(0xC5)+chr(0xBE)		&& Ž ž

? 'UTF:',lcUTF8

lcUnicode = Space(Len(lcUTF8)*4)
lnLenUnicode = MultiByteToWideChar (65001, 0, lcUTF8, Len(lcUTF8), @lcUnicode, Len(lcUnicode))
lcUnicode = Left(lcUnicode,lnLenUnicode*2)
lcAnsi = Space(lnLenUnicode)
lnLenAnsi = WideCharToMultiByte (1250, 0, lcUnicode, Len(lcAnsi), @lcAnsi,8,0,0)	&& I need cp1250
lcAnsi = Left(lcAnsi,lnLenAnsi)
? 'ANSI:',lcAnsi
*---- here is your ANSI 1250 codepage string.

It converts all the diacritics correctly, except for the sequence chr(0xc5)+chr(0xa0)+chr(0xc5)+chr(0xa1) (C5A0 and C5A1), i.e. the letters Š and š. Strange...

 
What does CPCURRENT() return for you, Eliott?

I've checked the ANSI output of WideCharToMultiByte in lcAnsi, and here is the conversion (you can also check it with CREATEBINARY(lcAnsi)):

chr(0xc4)+chr(0x8c)+chr(0xc4)+chr(0x8d) && diacritics: Č č
is converted to
chr(0xc8)+chr(0xe8)

chr(0xc4)+chr(0x86)+chr(0xc4)+chr(0x87) && Ć ć
is converted to
chr(0xc6)+chr(0xe6)

chr(0xc4)+chr(0x90)+chr(0xc4)+chr(0x91) && Đ đ
is converted to
chr(0xd0)+chr(0xf0)

chr(0xc5)+chr(0xa0)+chr(0xc5)+chr(0xa1) && Š š
is converted to
chr(0x8a)+chr(0x9a)

chr(0xc5)+chr(0xbd)+chr(0xC5)+chr(0xBE) && Ž ž
is converted to
chr(0x8e)+chr(0x9e)

These numbers are correct according to the code page table on Wikipedia:

The German version of that table is a bit easier to read, in my opinion:
So the conversion itself is fine; you may have a problem with the code page you're using.

Another possible cause is a BOM (byte order mark) in the first few bytes of your .txt file. It is not part of the text (in a UTF-8 file it is the three bytes EF BB BF) and might cause confusion when converted together with the rest of the file. You need to cut it off before conversion.

Bye, Olaf.
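The following is a minimal sketch of cutting off the BOM right after reading the file. StripBom() is a hypothetical helper name, and it only checks for the UTF-8 BOM (EF BB BF).

```foxpro
* Hypothetical helper: removes a leading UTF-8 byte order mark (EF BB BF).
FUNCTION StripBom(tcText)
    IF LEFT(tcText, 3) == CHR(0xEF) + CHR(0xBB) + CHR(0xBF)
        RETURN SUBSTR(tcText, 4)
    ENDIF
    RETURN tcText
ENDFUNC
```

In Eliott's code it would be used as lcUTF8 = StripBom(FILETOSTR((lcgetfile))) before calling MultiByteToWideChar.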
 
Also, my sample call was not general enough; take care of parameters 4 and 6 in each call.

This works better for me:

Code:
lcUTF8=""
*---- here goes your initial UTF8 string data:
lcUTF8=lcUTF8+chr(0xc4)+chr(0x8c)+chr(0xc4)+chr(0x8d)        && diacritics: C( c(
lcUTF8=lcUTF8+chr(0xc4)+chr(0x86)+chr(0xc4)+chr(0x87)        && C' c'
lcUTF8=lcUTF8+chr(0xc4)+chr(0x90)+chr(0xc4)+chr(0x91)        && ? ?
lcUTF8=lcUTF8+chr(0xc5)+chr(0xa0)+chr(0xc5)+chr(0xa1)        && Š š
lcUTF8=lcUTF8+chr(0xc5)+chr(0xbd)+chr(0xC5)+chr(0xBE)        && Ž ž
lcUTF8=lcUTF8+chr(0)+chr(0)

lcUnicode = Space(Len(lcUTF8)*4+2)
lnLenUnicode = MultiByteToWideChar (65001, 0, lcUTF8, -1, @lcUnicode, Len(lcUnicode))
lcUnicode = Left(lcUnicode,lnLenUnicode*2)+chr(0)+chr(0)
lcAnsi = Space(lnLenUnicode+1)
lnLenAnsi = WideCharToMultiByte (1250, 0, lcUnicode, -1, @lcAnsi, Len(lcAnsi),0,0)    && I need cp1250
lcAnsi = Left(lcAnsi,lnLenAnsi)
? CreateBinary(lcAnsi)

I'm using -1 as parameter 4, as described in the MSDN function descritions to denote for a null terminated string as input. VFP does nullterminate strings when passing them on via DECLAREd calls, but you can also add double chr(0) to lcUTF8 and lcUnicode as I did, to make double sure. Parameter 6 simply needs to be the byte length of the output variable you initialize with SPACE(). Make sure there is enough space in this version including a final chr(0).

Bye, Olaf.
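The following minimal sketch combines the pieces of this thread into one reusable VFP6 function. Utf8ToAnsi() is a hypothetical name, and the two DECLAREs are repeated from the thread so the block runs on its own. It uses the null-terminated (-1) form, passes each buffer size in the unit its API expects, and strips a UTF-8 BOM first. It assumes a single-byte target code page such as 1250 or 1252; a double-byte code page would need a larger output buffer.

```foxpro
DECLARE INTEGER MultiByteToWideChar IN kernel32;
    INTEGER   CodePg,;
    LONG      dwFlags,;
    STRING    lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING  @ lpWideCharStr,;
    INTEGER   cchWideChar

DECLARE INTEGER WideCharToMultiByte IN kernel32;
    INTEGER   CodePage,;
    INTEGER   dwFlags,;
    STRING    lpWideCharStr,;
    INTEGER   cchWideChar,;
    STRING  @ lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING    lpDefaultChar,;
    INTEGER   lpUsedDefaultChar

* Hypothetical helper: converts a UTF-8 string to a single-byte ANSI code page.
FUNCTION Utf8ToAnsi(tcUtf8, tnCodePage)
    LOCAL lcUnicode, lnLenUnicode, lcAnsi, lnLenAnsi
    * A leading UTF-8 BOM (EF BB BF) is not part of the text
    IF LEFT(tcUtf8, 3) == CHR(0xEF) + CHR(0xBB) + CHR(0xBF)
        tcUtf8 = SUBSTR(tcUtf8, 4)
    ENDIF
    IF LEN(tcUtf8) = 0
        RETURN ""
    ENDIF
    * Each UTF-16 character needs at least one UTF-8 byte, so LEN() + 1
    * wide characters (2 bytes each) always suffice, including the final null
    lcUnicode = SPACE((LEN(tcUtf8) + 1) * 2)
    * -1 = null-terminated input; last parameter = buffer size in wide characters
    lnLenUnicode = MultiByteToWideChar(65001, 0, tcUtf8 + CHR(0), -1, ;
        @lcUnicode, LEN(lcUnicode) / 2)
    IF lnLenUnicode = 0
        RETURN ""   && conversion failed
    ENDIF
    * The returned count includes the terminating null character
    lcUnicode = LEFT(lcUnicode, lnLenUnicode * 2)
    * A single-byte code page needs one byte per UTF-16 character
    lcAnsi = SPACE(lnLenUnicode)
    * Last two parameters passed as 0, as in the thread's calls
    lnLenAnsi = WideCharToMultiByte(tnCodePage, 0, lcUnicode, -1, ;
        @lcAnsi, LEN(lcAnsi), 0, 0)
    * The returned byte count also includes the terminating null byte
    RETURN LEFT(lcAnsi, MAX(lnLenAnsi - 1, 0))
ENDFUNC
```

For Eliott's CSV files this would be called as lcAnsi = Utf8ToAnsi(FILETOSTR((lcgetfile)), 1250), and the result could be written back with STRTOFILE().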
 
Hi Olaf, and thank you. I live in a country where there was never any course for any version of FoxPro, not to mention the Visual versions. Years ago I started with genuine FoxPro 2.6 for Windows and learned from the books that came in the package... Later I tried many Visual versions, bought Visual Studio 6 just for VFP6 (also with books on CD), and later VFP9 for its connectivity with MySQL and other relational databases. I'm happy with how much I discovered by myself, since I only got Internet access a few years ago. So I feel how far behind you and some members here I am in knowledge, but I'm not envious, just curious to learn more. I spent a lot of money on genuine packages, and I can't quit now simply because there is something I don't understand, right? Thank the Lord, I have you and this nice place. Thank you.

 
Well Eliott,

It's always a pleasure to help, especially when it's thanked in such a way. It's also always helpful to see how a given example evolves, and how working through it helps make it work at all, or work better, like you did. Thanks for your collaboration and cooperation. Some others might never come back if a sample solution does not work for them.

Bye, Olaf.

 