Problem with Eastern European languages in reports 2


mibosoft

Hi,
I have a VFP9 web application where I want to add support for the Czech language. It all works fine except for the reports. If Czech words are entered via the web interface, they are stored in the table like this:
[Screenshot: the Czech words as stored in the table (as HTML numeric entities)]


When I list these words again in the web browser, it looks perfectly fine, like this:
[Screenshot: the same words displayed correctly in the web browser]


The problem is in the reports, where the contents of the table are listed just as stored, except for a couple of letters, for example "Š", that are actually shown correctly:
[Screenshot: the report output]


My current codepage is 1252 and I have tried other codepages, STRCONV(), etc., but I can never get VFP to show the original texts again. Is information lost once the words are written to the table? If so, how come the web browser is able to show it correctly? Any hints on how to get the reports to show the original texts?

BR,
Micael
 
Thank you Atlopes. It works when I check "use font script" for a specific field in a report. Do I have to go through every report and every field, or is there any way to set this globally/by default for all of my reports and forms?
 
The default is always the current codepage. What does that have to do with FontCharset? This:

[pre]Codepage of operating system          Assigned character set
1250 (Central Europe)                 EASTEUROPE_CHARSET (238)
1252 (Latin I)                        DEFAULT_CHARSET (1)
1251 (Cyrillic)                       RUSSIAN_CHARSET (204)
1253 (Greek)                          GREEK_CHARSET (161)
1254 (Turkish)                        TURKISH_CHARSET (162)
1257 (Baltic)                         BALTIC_CHARSET (186)
1258 (Vietnam)                        VIETNAMESE_CHARSET (163)
874 (Thai)                            THAI_CHARSET (222)
932 (Japanese Shift-JIS)              SHIFTJIS_CHARSET (128)
936 (Simplified Chinese)              GB2312_CHARSET (134)
950 (Traditional Chinese Big5)        CHINESEBIG5_CHARSET (136)
949 (Korean)                          HANGEUL_CHARSET (129)[/pre]

I found that list online. So the main reason VFP works without FontCharset most of the time is that you usually only need the codepage your OS uses by default for ANSI programs. FontCharset enables you to break out of that, as long as the font used (like MS Arial) supports several codepages.
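If you do want to set FontCharset in code, the table above translates directly into a lookup. A minimal sketch, with CharsetForCodepage() being an illustrative name rather than anything built-in:

[pre]* Map an ANSI codepage to the matching FontCharSet value, per the table above.
FUNCTION CharsetForCodepage(tnCodepage)
    LOCAL lnCharset
    DO CASE
        CASE tnCodepage = 1250
            lnCharset = 238    && EASTEUROPE_CHARSET
        CASE tnCodepage = 1251
            lnCharset = 204    && RUSSIAN_CHARSET
        CASE tnCodepage = 1253
            lnCharset = 161    && GREEK_CHARSET
        CASE tnCodepage = 1254
            lnCharset = 162    && TURKISH_CHARSET
        CASE tnCodepage = 1257
            lnCharset = 186    && BALTIC_CHARSET
        CASE tnCodepage = 1258
            lnCharset = 163    && VIETNAMESE_CHARSET
        CASE tnCodepage = 874
            lnCharset = 222    && THAI_CHARSET
        CASE tnCodepage = 932
            lnCharset = 128    && SHIFTJIS_CHARSET
        CASE tnCodepage = 936
            lnCharset = 134    && GB2312_CHARSET
        CASE tnCodepage = 950
            lnCharset = 136    && CHINESEBIG5_CHARSET
        CASE tnCodepage = 949
            lnCharset = 129    && HANGEUL_CHARSET
        OTHERWISE
            lnCharset = 1      && DEFAULT_CHARSET, e.g. for 1252
    ENDCASE
    RETURN lnCharset
ENDFUNC[/pre]

Something like _SCREEN.FontCharSet = CharsetForCodepage(CPCURRENT()) would then simply follow the codepage of the session.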

So if you don't want to go through all reports for the different languages and encodings, you don't set FontCharset, but look for a way to start a VFP session in the codepage you need for a user, instead of overriding that with FontCharSet.

If you want a report to switch dynamically, the only options are to not set FontCharset at all, or to hack into the FRX at runtime every time.

You would be redoing what the INTL Toolkit has already done, and a report with predefined label captions will not fully change to another language anyway.
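Just to illustrate what "hacking into the FRX" means: a report file is only a table, so you can copy it and modify the copy before running it. A very rough sketch; the FONTCHARSET column name is an assumption for illustration only, so check the actual 90FRX structure of your VFP installation before relying on anything like this:

[pre]* Rough sketch of a runtime FRX hack. FONTCHARSET is an assumed column
* name - verify it against the 90FRX structure before using this.
LOCAL lcTempFrx
lcTempFrx = ADDBS(SYS(2023)) + "report_cz"
COPY FILE myreport.frx TO (lcTempFrx + ".frx")
COPY FILE myreport.frt TO (lcTempFrx + ".frt")

USE (lcTempFrx + ".frx") ALIAS frxhack EXCLUSIVE
REPLACE ALL frxhack.fontcharset WITH 238    && EASTEUROPE_CHARSET
USE IN frxhack

REPORT FORM (lcTempFrx + ".frx") PREVIEW[/pre]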



The simplest question perhaps is: do you really need one instance of your EXE running with two languages that need two different codepages? If not, split your data up into the different codepages and start the app in the codepage necessary; then you don't need to fiddle with FontCharset, you only need to convert between UTF-8 and the current codepage.
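That conversion is a one-liner in each direction with STRCONV(). A quick sketch, assuming the session runs in codepage 1250:

[pre]* Round trip between the session's ANSI codepage and UTF-8.
LOCAL lcAnsi, lcUtf8
lcAnsi = "čeština"             && an ANSI string in the current codepage (1250 here)
lcUtf8 = STRCONV(lcAnsi, 9)    && 9  = ANSI/DBCS -> UTF-8, e.g. for the web side
? STRCONV(lcUtf8, 11)          && 11 = UTF-8 -> ANSI/DBCS, e.g. for web input[/pre]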

If you want to support the locale of Windows users, the simplest way is to only react to CPCURRENT(). In some codepages that corresponds to one very specific language, for example Greek (1253), while in a codepage like 1252 you could support multiple languages. But if your app should work in one specific language anyway, there would be no need to make use of that special feature.

The only transition between worlds you have is between desktop and web.

That would mean dedicated DBFs and reports for each language. And that would mean no master report you can switch to another language with just a langID switch; but to keep development down to a single report, you could generate the different language reports in a build step instead of maintaining all of them by hand.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Mibosoft,

Unfortunately, as far as I know, there is no easy way to support different ANSI code pages simultaneously in the same report. The built-in dynamics features lack character set support. You may try to multiply the text controls in your report by the supported code pages and set appropriate Print When conditions based on the value of the language field of your table or cursor.
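To illustrate the idea (mytable.langid is a hypothetical column holding the language of each row): place two copies of the same field on the report, give each a different Font script in the designer, and use Print When expressions such as

[pre]* Copy 1 of the field, Font script "Central European" (charset 238), Print When:
UPPER(ALLTRIM(mytable.langid)) == "CZ"
* Copy 2 of the field, Font script "Western" (default), Print When:
UPPER(ALLTRIM(mytable.langid)) <> "CZ"[/pre]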
 
I think I will manage now by calling HTMLNumEntityToANSI() for (selected) strings in reports. The rest of the application is web based, and there the strings with HTML entities work fine.
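For reference, a decoder along those lines can be quite small. This is only a sketch of the idea, not atlopes' actual HTMLNumEntityToANSI(); the name DecodeNumEntities is made up:

[pre]* Sketch: decode decimal numeric HTML entities such as &#269; into characters
* of the current codepage. Characters the codepage cannot represent will not
* survive the conversion.
FUNCTION DecodeNumEntities(tcText)
    LOCAL lcOut, lcEntity, lnCode
    lcOut = tcText
    DO WHILE "&#" $ lcOut
        lcEntity = STREXTRACT(lcOut, "&#", ";")    && e.g. "269"
        IF EMPTY(lcEntity)
            EXIT
        ENDIF
        lnCode = VAL(lcEntity)                     && the Unicode codepoint
        * build the UTF-16 character and convert it to the current codepage
        lcOut = STRTRAN(lcOut, "&#" + lcEntity + ";", STRCONV(BINTOC(lnCode, "2RS"), 6))
    ENDDO
    RETURN lcOut
ENDFUNC

* ? DecodeNumEntities("&#268;esk&#225; republika")  && "Česká republika" under cp 1250[/pre]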

Thank you both Olaf and atlopes for great help!
 
Just a last note: I tried what the effect of CODEPAGE=n in config.fpw is. You get CPCURRENT() reporting the codepage defined in the config, e.g. the one for Eastern European languages, but _SCREEN.FontCharSet, for example, is still 1. The latter, I think, will only change when the OS is also configured for an Eastern European language.
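For anyone repeating that test, this is all it takes (the value 1250 is just the example here):

[pre]* Contents of config.fpw (one line):
*    CODEPAGE=1250
* Checks inside the VFP session started with that config:
? CPCURRENT()            && 1250, taken from config.fpw
? _SCREEN.FontCharSet    && still 1 on a Western OS, as described above[/pre]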

So overall, this seems like something you can't test without really switching the OS, or without actively setting the FontCharSet property of things. If you develop on a Czech Windows, I guess _SCREEN.FontCharset and the value for controls will default to 238.

I can, of course, use atlopes' code, INTL or West Wind code to display many languages, but these settings could simply also stay at defaults that you only really need to change when you want to support multiple languages.

The only hurdle is that once you design your report on a Western European Windows version, it will stay at FontCharset=1; this doesn't change with the OS settings the way theme-related colors do.

The HTML entities you still have, and now want to handle with the conversion function, do come from the web browser. When exactly they come in, and whether that already happens on the sending side in the browser or during transport, I don't know; we don't know your web code. That would need changes to create a UTF-8 page in the first place and also to let the HTML form send back UTF-8. You showed us the console screenshot where document.characterSet is "Windows-1252". Nothing changes as long as that stays 1252.

Bye, Olaf.

Olaf Doschke Software Engineering
 
I played a bit with foxisapi and it boils down to:

1. When you manage to get document.characterSet to be "UTF-8", form data is actually not converted to HTML entities; you get UTF-8.

2. When you set no enctype in your form tag, it defaults to urlencoded, which means the charset used is, more or less, ASCII only. Any characters that fall into the category of special characters come over in URL-encoded form, a percent sign followed by hexadecimal. For example, the Cyrillic Д comes back as %D0%94. That's quite similar to HTML entities, just a bit shorter for being hexadecimal, but still a conversion you need to invert (see the decoding sketch after this list).

3. If you set enctype="multipart/form-data" you get, well, multipart form data. That looks more complicated than the usual POST body but is actually easy to parse, as each form parameter comes in its own section on separate lines, i.e. you can use ALINES() in VFP, for example. And the actual value is unchanged, which means UTF-8.
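Regarding point 2, inverting the URL-encoding is straightforward. A sketch, with UrlDecodeToAnsi() being a made-up name; it assumes the decoded bytes are UTF-8:

[pre]* Sketch: decode a URL-encoded form value (e.g. "%D0%94") and convert the
* resulting UTF-8 bytes to the current ANSI codepage.
FUNCTION UrlDecodeToAnsi(tcEncoded)
    LOCAL lcIn, lcOut, lnI
    lcIn  = STRTRAN(tcEncoded, "+", " ")    && forms encode spaces as "+"
    lcOut = ""
    lnI   = 1
    DO WHILE lnI <= LEN(lcIn)
        IF SUBSTR(lcIn, lnI, 1) == "%" AND lnI + 2 <= LEN(lcIn)
            * "%D0" becomes CHR(0xD0), one raw UTF-8 byte
            lcOut = lcOut + CHR(EVALUATE("0x" + UPPER(SUBSTR(lcIn, lnI + 1, 2))))
            lnI = lnI + 3
        ELSE
            lcOut = lcOut + SUBSTR(lcIn, lnI, 1)
            lnI = lnI + 1
        ENDIF
    ENDDO
    RETURN STRCONV(lcOut, 11)               && 11 = UTF-8 -> current codepage
ENDFUNC

* ? UrlDecodeToAnsi("%D0%94")   && gives "Д" when the session runs under cp 1251[/pre]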

And then all you need to do is a STRCONV() of those parsed-out parameter values to the current, or perhaps better to the desired, codepage (UTF-8 to DBCS), as sketched below.
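A rough sketch of pulling one value out of such a multipart body; GetMultipartValue() is a made-up name, and the layout assumption (the value sits on the line after the blank line that follows its name="..." header) covers the simple case of a plain text input:

[pre]* Sketch: fetch one field from a multipart/form-data POST body and convert
* its UTF-8 value to the current codepage (see STRCONV() help for targeting
* a specific codepage via the optional parameters instead).
FUNCTION GetMultipartValue(tcBody, tcFieldName)
    LOCAL ARRAY laLines[1]
    LOCAL lnCount, lnI, lcUtf8
    lnCount = ALINES(laLines, tcBody)
    lcUtf8  = ""
    FOR lnI = 1 TO lnCount - 2
        IF 'name="' + tcFieldName + '"' $ laLines[lnI]
            lcUtf8 = laLines[lnI + 2]    && header line, blank line, value line
            EXIT
        ENDIF
    ENDFOR
    RETURN STRCONV(lcUtf8, 11)           && UTF-8 -> current codepage
ENDFUNC[/pre]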

In the end, which bytes you get depends not only on the document charset, which forces some of the inputs into HTML entities, but also on the URL-encoding HTML forms do by default. Without any conversion effort, the urlencoding alone limits text to English.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Finally, I changed the advanced region option to use Czech for non-Unicode software, to see whether that changes FontCharset as I'd assumed.

It doesn't do what I expected it to do.

When I start VFP9 normally (without a config.fpw), CPCURRENT() is still 1252. I guess that's determined deeper in the system or the VFP installation. Also, _SCREEN.FontCharset still initializes as 1.
And changing the codepage in config.fpw also only changes CPCURRENT(), not FontCharSet. I may expect too much of this.

But there is a very fortunate effect that may come with this option: VFP has now become UTF-8 capable for me!
[Screenshot: the Windows 10 beta UTF-8 option (German dialog)]


OK, that dialog is German; the important part is that there now is a Win10 beta feature that seems to promise UTF-8 support for ANSI applications (like VFP). You get there in the Region settings:
[Screenshot: the Region settings page leading to the beta UTF-8 option]


UTF-8 doesn't come in via the clipboard from Unicode-capable applications, but once you have it in a DBF you can copy it within VFP from a Browse window into source code and print it, too:
[Screenshot: UTF-8 text displayed in VFP output]


To get there, I pasted 漢字編碼方法, Кириллица, and čeština into a Notepad++ editor and saved that as a file, which I put into a char field via FILETOSTR() without any further STRCONV().
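Roughly, that test is just this (the file name is only an example; save the file as UTF-8 without a BOM, or strip the first three bytes):

[pre]* Rough repro of the test described above.
CREATE CURSOR curTest (ctext C(100))
INSERT INTO curTest VALUES (FILETOSTR("utf8sample.txt"))   && raw UTF-8 bytes, no STRCONV()
BROWSE   && displays correctly only with the Win10 beta UTF-8 option enabled[/pre]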

So that means you may be able to work fully in UTF8, but only on Win10 with that beta feature.

Bye, Olaf.

Olaf Doschke Software Engineering
 