Read utf-8 file with VFP6

Eliott · Jun 22, 2011

Hi, I saw a post here (id=574513) describing a routine that can convert any UTF-8 encoded text to ANSI and is supposed to work in VFP6, but it doesn't work for me.
I know how to solve it with VFP9:

Code:

twobytstr=STRCONV(c_utf8, 11)	&&convert

but I have an old app I wrote in VFP6 and now desperately need a routine that will read UTF-8 encoded content and show it properly in VFP6 as ANSI (Win1250). I also tried this code:

Code:

file0= FILETOSTR("utf8code.txt")
*call f
utf8encode(file0)
*
function utf8encode( lcString )
local lcUtf, i
    lcUtf = ""
    for i = 1 to len(lcString) 
        c = asc(substr(lcString,i,1))
        if  c < 128
            lcUtf = lcUtf+chr(c)
        else
            lcUtf = lcUtf+chr(bitor(192,bitrshift(c,6)))+chr(bitor(128,bitand(c,63)))
        endif
    next
? lcUtf
return lcUtf

Any help would be appreciated. Thanks.

There is no good nor evil, just decisions and consequences.

Andre Globensky · Jun 22, 2011

Hi Eliott, I use this for UTF-8:

lcGetfile = GETFILE("TXT","Ouvrir texte UTF-8","Ouvrir",1,"Importation du fichier UTF8")

lcPath = JUSTPATH((lcGetfile))
lcName = JUSTSTEM((lcGetfile))
lcTmpFile = CHRTRAN((lcName),")('#|/-\:*;:%&!$?+ ","") + "_2"

lcLoadFile = STRCONV(FILETOSTR((lcGetfile)),11,1252,1)

CD (lcPath)

STRTOFILE((lcLoadFile), FORCEEXT((lcTmpFile),"txt"))

This gives me an output text file in ANSI

Eliott · Jun 22, 2011

@AGlobensky: nice code, but this one isn't for VFP6 but for VFP9.

There is no good nor evil, just decisions and consequences.

Olaf Doschke · Jun 23, 2011

What is this UTF-8 file? Is it HTML or XML? Then you can work on it using automation, e.g. IE or MSXML.

In general, there are Win API string conversion functions from anything to anything else, and strconv() only wraps some of those. I don't know it off the top of my head, but search news2news and you will surely find a UTF-8 to ANSI conversion.

Bye, Olaf.

Olaf Doschke · Jun 23, 2011

Looked it up. There is MultiByteToWideChar and WideCharToMultibyte:

http://msdn.microsoft.com/en-us/library/dd374130(VS.85).aspx

http://www.news2news.com/vfp/?group=79&function=557

http://www.news2news.com/vfp/?group=79&function=488

The first one can convert anything to unicode, the second one unicode to anything.

Here's a snippet converting a UTF-8 text containing German umlauts to ANSI, going through Unicode as an intermediate step.

Code:

DECLARE INTEGER WideCharToMultiByte IN kernel32;
    INTEGER   CodePage,;
    INTEGER   dwFlags,;
    STRING    lpWideCharStr,;
    INTEGER   cchWideChar,;
    STRING  @ lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING    lpDefaultChar,;
    INTEGER   lpUsedDefaultChar
   
DECLARE INTEGER MultiByteToWideChar IN kernel32;
    INTEGER   CodePg,;
    LONG      dwFlags,;
    STRING    lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING  @ lpWideCharStr,;
    INTEGER   cchWideChar

Clear
* lcUTF8 = Strconv("äöü",9)

*---- here goes your initial UTF8 string data:
lcUTF8 = ''+0hc3a4c3b6c3bc
? lcUTF8

lcUnicode = Space(Len(lcUTF8)*4)
lnLenUnicode = MultiByteToWideChar (65001, 0, lcUTF8, Len(lcUTF8), @lcUnicode, Len(lcUnicode))
lcUnicode = Left(lcUnicode,lnLenUnicode*2)
lcAnsi = Space(lnLenUnicode)
lnLenAnsi = WideCharToMultiByte (1252, 0, lcUnicode, Len(lcAnsi), @lcAnsi,8,0,0) && parameter 6 (output size) is hard-coded to 8 bytes, enough only for this short sample
lcAnsi = Left(lcAnsi,lnLenAnsi)
? lcAnsi
*---- here is your ANSI 1252 codepage string.

1250 as the ANSI code page also works for me. That is Central European instead of Western European, but it also includes the German umlauts.

Bye, Olaf.

Olaf Doschke · Jun 23, 2011

Side note: I used the commented-out strconv("äöü",9) to get the UTF-8 string of the umlauts and createbinary() to get the binary representation 0h....

If VFP6 does not understand the 0h syntax, you need to put the bytes c3, a4, etc. together via chr(0xc3)+chr(0xa4)+ ...
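
Typing long chr() chains by hand gets tedious, so a small helper can assemble the bytes from a string of hex digits. This is only a sketch: the name HexToStr is made up for illustration, and it assumes the input holds an even number of valid hex digits and nothing else.

```foxpro
* Usage: the same six bytes as 0hc3a4c3b6c3bc
lcUTF8 = HexToStr("c3a4c3b6c3bc")
? LEN(lcUTF8)

* Hypothetical helper (not a VFP built-in): builds a byte string from hex digits.
* Assumes tcHex contains an even number of valid hex digits and nothing else.
FUNCTION HexToStr
LPARAMETERS tcHex
LOCAL lcDigits, lcResult, lnI, lnHigh, lnLow
lcDigits = "0123456789ABCDEF"
tcHex = UPPER(tcHex)
lcResult = ""
FOR lnI = 1 TO LEN(tcHex) - 1 STEP 2
    * AT() returns the 1-based position of the digit, so subtract 1 to get its value
    lnHigh = AT(SUBSTR(tcHex, lnI, 1), lcDigits) - 1
    lnLow  = AT(SUBSTR(tcHex, lnI + 1, 1), lcDigits) - 1
    lcResult = lcResult + CHR(lnHigh * 16 + lnLow)
ENDFOR
RETURN lcResult
ENDFUNC
```

It uses only CHR(), AT(), SUBSTR() and UPPER(), so it should not depend on anything newer than VFP6.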

Bye, Olaf.

Andre Globensky · Jun 23, 2011

Oops, sorry about that, I have version 9.
Is CPCONVERT() available in version 6? If so, try this; it worked on my test file:

lcGetfile = GETFILE("TXT","Ouvrir texte UTF-8","Ouvrir",1,"Importation du fichier UTF8")

lcPath = JUSTPATH((lcGetfile))
lcName = JUSTSTEM((lcGetfile))
lcTmpFile = CHRTRAN((lcName),")('#|/-\:*;:%&!$?+ ","") + "_2"

lcLoadFile = CPCONVERT(437,1252,FILETOSTR((lcGetfile)))

CD (lcPath)

STRTOFILE((lcLoadFile), FORCEEXT((lcTmpFile),"txt"))

Mike Lewis · Jun 23, 2011

is CPCONVERT() available in version 6

Yes. The code-page related functions go back to the early versions of VFP. It's only the double-byte character functions that came in later.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips, training, consultancy

Eliott · Jun 23, 2011

Dear fellows, thank you so much for your many answers. I'm really happy to have you in such a hard situation.

I have a web app written in PHP (UTF-8 HTML pages) with MySQL, collation UTF-8 general_ci. When I exported some records as a CSV file, they came out UTF-8 encoded. With VFP9 I was able to solve the problem, but "ancient" VFP6 hit the wall with its head (and mine too). I compared both encoded files in Hex Workshop, tried to work around the problem, and wrote a yard of hard code to jump over it, but couldn't find a solution. Luckily for me these files are short, 1-10K, most of them up to 3K, but it is a real toothache. These days I sleep very little because I couldn't find a solution; it made me nervous and at the same time unable to think clearly.
Thank you, I'll take all the suggestions and use the best of them.
Again, thank you!

There is no good nor evil, just decisions and consequences.

Eliott · Jun 24, 2011

Last night I tried this code:

Code:

DECLARE INTEGER WideCharToMultiByte IN kernel32;
    INTEGER   CodePage,;
    INTEGER   dwFlags,;
    STRING    lpWideCharStr,;
    INTEGER   cchWideChar,;
    STRING  @ lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING    lpDefaultChar,;
    INTEGER   lpUsedDefaultChar
   
DECLARE INTEGER MultiByteToWideChar IN kernel32;
    INTEGER   CodePg,;
    LONG      dwFlags,;
    STRING    lpMultiByteStr,;
    INTEGER   cbMultiByte,;
    STRING  @ lpWideCharStr,;
    INTEGER   cchWideChar

Clear
set defa to c:\download
lcGetfile = GETFILE("txt","Read UTF-8 file","Open",1,"Reading UTF8")
set defa to e:\processed
lcUTF8=filetostr((lcgetfile))

* lcUTF8 = Strconv("äöü",9)

*---- here goes your initial UTF8 string data:

lcUTF8=lcUTF8+chr(0xc4)+chr(0x8c)+chr(0xc4)+chr(0x8d)		&& diacritics: Č č
lcUTF8=lcUTF8+chr(0xc4)+chr(0x86)+chr(0xc4)+chr(0x87)		&& Ć ć
lcUTF8=lcUTF8+chr(0xc4)+chr(0x90)+chr(0xc4)+chr(0x91)		&& Đ đ
lcUTF8=lcUTF8+chr(0xc5)+chr(0xa0)+chr(0xc5)+chr(0xa1)		&& Š š
lcUTF8=lcUTF8+chr(0xc5)+chr(0xbd)+chr(0xC5)+chr(0xBE)		&& Ž ž

? 'UTF:',lcUTF8

lcUnicode = Space(Len(lcUTF8)*4)
lnLenUnicode = MultiByteToWideChar (65001, 0, lcUTF8, Len(lcUTF8), @lcUnicode, Len(lcUnicode))
lcUnicode = Left(lcUnicode,lnLenUnicode*2)
lcAnsi = Space(lnLenUnicode)
lnLenAnsi = WideCharToMultiByte (1250, 0, lcUnicode, Len(lcAnsi), @lcAnsi,8,0,0)	&& I need cp1250
lcAnsi = Left(lcAnsi,lnLenAnsi)
? 'ANSI:',lcAnsi
*---- here is your ANSI 1250 codepage string.

It converts all the diacritics correctly, except when lcUTF8 contains the sequence chr(0xc4)+chr(0x8c)+chr(0xc4)+chr(0x8d) together with the combination (C5A1 and C5A1) for the letters Š and š. Strange...

There is no good nor evil, just decisions and consequences.

Olaf Doschke · Jun 24, 2011

What is CPCURRENT() for you, Eliott?

I've checked the ANSI output of WideCharToMultiByte in lcAnsi, and here is the conversion. You can do the same with CREATEBINARY(lcAnsi):

chr(0xc4)+chr(0x8c)+chr(0xc4)+chr(0x8d) && diacritics: Č č
is converted to
chr(0xc8)+chr(0xe8)

chr(0xc4)+chr(0x86)+chr(0xc4)+chr(0x87) && Ć ć
is converted to
chr(0xc6)+chr(0xe6)

chr(0xc4)+chr(0x90)+chr(0xc4)+chr(0x91) && Đ đ
is converted to
chr(0xd0)+chr(0xf0)

chr(0xc5)+chr(0xa0)+chr(0xc5)+chr(0xa1) && Š š
is converted to
chr(0x8a)+chr(0x9a)

chr(0xc5)+chr(0xbd)+chr(0xC5)+chr(0xBE) && Ž ž
is converted to
chr(0x8e)+chr(0x9e)

These numbers are correct according to the code page table on Wikipedia:

http://en.wikipedia.org/wiki/Windows-1250

The German version of that table is a bit easier to read in my opinion:

http://de.wikipedia.org/wiki/Windows-1250

So conversion is fine, you may have a problem with the codepage you're using.
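
To separate a conversion problem from a display problem, it can help to look at the raw bytes instead of the printed characters. The following is a rough sketch under the assumption that lcAnsi holds the converted string; here it is filled by hand with four of the Windows-1250 bytes listed above, and the variable names are illustrative only.

```foxpro
* Sketch: show the code page VFP is running with and dump a string as hex,
* so the bytes can be compared against the Windows-1250 table.
* lcAnsi stands in for the conversion result; it is filled by hand here.
lcAnsi = CHR(0xC8) + CHR(0xE8) + CHR(0x8A) + CHR(0x9A)
lcDigits = "0123456789ABCDEF"
lcHex = ""
FOR lnI = 1 TO LEN(lcAnsi)
    lnByte = ASC(SUBSTR(lcAnsi, lnI, 1))
    * high nibble first, then low nibble, then a separating blank
    lcHex = lcHex + SUBSTR(lcDigits, INT(lnByte / 16) + 1, 1)
    lcHex = lcHex + SUBSTR(lcDigits, MOD(lnByte, 16) + 1, 1) + " "
ENDFOR
? "CPCURRENT():", CPCURRENT()
? "Bytes:", lcHex
```

If the bytes match the table but the screen shows other letters, the conversion itself is fine and the code page or font used for display is the more likely culprit.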

Another possibility: your .txt file may start with a BOM (byte order mark) in its first few bytes. That is not part of the actual text and might cause confusion when it is converted together with the rest of the file. You need to cut it off before conversion.
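
If the file does start with a UTF-8 byte order mark, it is the three bytes EF BB BF. A minimal sketch for cutting it off, assuming the file name utf8code.txt from the first post (substitute your own):

```foxpro
* Sketch: drop a leading UTF-8 BOM (bytes EF BB BF) before converting.
* "utf8code.txt" is the file name from the first post; use your own file here.
lcUTF8 = FILETOSTR("utf8code.txt")
lcBOM = CHR(0xEF) + CHR(0xBB) + CHR(0xBF)
IF LEFT(lcUTF8, 3) == lcBOM
    * keep everything from the fourth byte onward
    lcUTF8 = SUBSTR(lcUTF8, 4)
ENDIF
```

The == comparison is exact regardless of SET EXACT, so a file shorter than three bytes is simply left alone.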

Bye, Olaf.

Olaf Doschke · Jun 24, 2011

Also, my sample call was not that general; take care of parameters 4 and 6 in each call.

This does work better for me:

Code:

lcUTF8=""
*---- here goes your initial UTF8 string data:
lcUTF8=lcUTF8+chr(0xc4)+chr(0x8c)+chr(0xc4)+chr(0x8d)        && diacritics: Č č
lcUTF8=lcUTF8+chr(0xc4)+chr(0x86)+chr(0xc4)+chr(0x87)        && Ć ć
lcUTF8=lcUTF8+chr(0xc4)+chr(0x90)+chr(0xc4)+chr(0x91)        && Đ đ
lcUTF8=lcUTF8+chr(0xc5)+chr(0xa0)+chr(0xc5)+chr(0xa1)        && Š š
lcUTF8=lcUTF8+chr(0xc5)+chr(0xbd)+chr(0xC5)+chr(0xBE)        && Ž ž
lcUTF8=lcUTF8+chr(0)+chr(0)

lcUnicode = Space(Len(lcUTF8)*4+2)
lnLenUnicode = MultiByteToWideChar (65001, 0, lcUTF8, -1, @lcUnicode, Len(lcUnicode))
lcUnicode = Left(lcUnicode,lnLenUnicode*2)+chr(0)+chr(0)
lcAnsi = Space(lnLenUnicode+1)
lnLenAnsi = WideCharToMultiByte (1250, 0, lcUnicode, -1, @lcAnsi, Len(lcAnsi),0,0)    && I need cp1250
lcAnsi = Left(lcAnsi,lnLenAnsi)
? CreateBinary(lcAnsi)

I'm using -1 as parameter 4, as described in the MSDN function descriptions, to denote a null-terminated string as input. VFP does null-terminate strings when passing them on via DECLAREd calls, but you can also add a double chr(0) to lcUTF8 and lcUnicode as I did, to make doubly sure. Parameter 6 simply needs to be the length of the output variable you initialize with SPACE(). Strictly, MultiByteToWideChar counts that size in wide characters (two bytes each), so the Space(Len(lcUTF8)*4+2) buffer here is more than large enough. Make sure there is enough space in this version, including a final chr(0).
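
For converting whole files, the two calls can be wrapped in one function. What follows is a sketch, not a tested VFP6 routine. The name Utf8ToAnsi and the output file name are made up for illustration. It passes explicit lengths instead of -1, so no terminating chr(0) is needed, and it declares the last two parameters of WideCharToMultiByte as INTEGER so that 0 is passed as a NULL pointer (an assumption that differs from the DECLARE above).

```foxpro
* Usage: read a UTF-8 file, convert to code page 1250, write the result.
* "utf8code.txt" is from the first post; "ansi1250.txt" is a made-up name.
lcAnsi = Utf8ToAnsi(FILETOSTR("utf8code.txt"), 1250)
= STRTOFILE(lcAnsi, "ansi1250.txt")

* Hypothetical wrapper around the two API calls shown in this thread.
* tnCodePage is the target ANSI code page, e.g. 1250 or 1252.
FUNCTION Utf8ToAnsi
LPARAMETERS tcUtf8, tnCodePage
LOCAL lcUnicode, lnChars, lcAnsi, lnBytes

* Assumption: the last two parameters are INTEGER here, so 0 means NULL
DECLARE INTEGER WideCharToMultiByte IN kernel32 ;
    INTEGER CodePage, INTEGER dwFlags, STRING lpWideCharStr, ;
    INTEGER cchWideChar, STRING @ lpMultiByteStr, INTEGER cbMultiByte, ;
    INTEGER lpDefaultChar, INTEGER lpUsedDefaultChar
DECLARE INTEGER MultiByteToWideChar IN kernel32 ;
    INTEGER CodePage, INTEGER dwFlags, STRING lpMultiByteStr, ;
    INTEGER cbMultiByte, STRING @ lpWideCharStr, INTEGER cchWideChar

IF LEN(tcUtf8) = 0
    RETURN ""
ENDIF

* UTF-8 -> UTF-16: never more wide characters than input bytes, 2 bytes each
lcUnicode = REPLICATE(CHR(0), LEN(tcUtf8) * 2)
lnChars = MultiByteToWideChar(65001, 0, tcUtf8, LEN(tcUtf8), @lcUnicode, LEN(tcUtf8))
IF lnChars = 0
    RETURN ""    && conversion failed
ENDIF
lcUnicode = LEFT(lcUnicode, lnChars * 2)

* UTF-16 -> ANSI: two bytes per character is a safe upper bound
lcAnsi = REPLICATE(CHR(0), lnChars * 2)
lnBytes = WideCharToMultiByte(tnCodePage, 0, lcUnicode, lnChars, @lcAnsi, LEN(lcAnsi), 0, 0)
RETURN LEFT(lcAnsi, lnBytes)
ENDFUNC
```

Sizing both buffers from the input length avoids any fixed limit, so the same call should handle a 3K export as well as a ten-character test string. Characters that have no equivalent in the target code page are replaced by the system default character.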

Bye, Olaf.

Eliott · Jun 25, 2011

Hi Olaf, and thank you. I live in a state where there was never any course for any version of FoxPro, not to mention the Visual version. I started with genuine FP 2.6 for Windows years ago and learned from the books that came in the package. Later I tried many Visual versions, bought Visual Studio 6 only for VFP6 (also with books on CD), and later VFP9 because of the connection with MySQL and other relational DBs. I'm happy with how much I discovered by myself, because I got Internet access only a few years ago. So I feel how far I am behind you and your knowledge, and behind some members here, but I'm not envious, just curious to learn more. I spent a lot of money on the genuine packages, and I can't simply quit now because there is something I can't understand, right? Thank the Lord, I have you and this nice place. Thank you.

There is no good nor evil, just decisions and consequences.

Olaf Doschke · Jun 25, 2011

Well Eliott,

it's always a pleasure to help, especially if it's thanked in such a way. And it's always helpful to see how a given example evolves and works for others, so it can be made to work at all or work better, as you did. Thanks for your collaboration and cooperation. Some others would perhaps never come back again if a sample solution does not work for them.

Bye, Olaf.
