VB6 and UTF-8

Status
Not open for further replies.

harebrain

MIS
Feb 27, 2003
1,146
US
It seems as though we flogged this subject to death years ago, but I'm hesitant to rely on the ten-year-old threads I found that are, at best, ambiguous. So here we go again:

Is there a straightforward way to handle UTF-8 files with VB6?

I think my challenge in the end will be to read UTF-8 files and convert the data to ASCII for database storage.

Thanks in advance,
David
 
>Is there a straightforward way to handle UTF-8 files with VB6?

Yep. Leveraging ADO Streams is one way.
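For readers outside VB6, here is a rough Python equivalent of the ADO Stream approach. In VB6 you would set the stream's Charset to "utf-8", then call LoadFromFile and ReadText. This is a sketch, not anyone's production code. The function name `read_utf8_file` is made up for illustration. The "utf-8-sig" codec is used so that a leading byte order mark, if present, is dropped rather than showing up as a stray character.

```python
import os
import tempfile


def read_utf8_file(path: str) -> str:
    """Read a UTF-8 text file and return it as a str of Unicode characters.

    "utf-8-sig" decodes UTF-8 and silently skips a leading byte order mark
    (EF BB BF) if the file has one; without a BOM it behaves like "utf-8".
    newline="" keeps the file's CR/LF line endings exactly as stored.
    """
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical sample: write raw UTF-8 bytes to a temp file, then read them back.
    sample = "Año 2013, café\r\n"
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(sample.encode("utf-8"))
    try:
        print(repr(read_utf8_file(path)))
    finally:
        os.remove(path)
```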

>convert the data to ASCII

Ah ... see, the first 128 characters in UTF-8 (code points 0 to 127) are encoded exactly the same as the 128 ASCII characters (which is all ASCII has). So conversion is pretty easy, as long as the text sticks to those first 128 characters. If it doesn't, well, there's no lossless conversion possible. But I suspect you don't actually mean ASCII ...
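Here is a small Python sketch of that point. Python stands in for VB6, and `utf8_to_ascii` and its `replacement` parameter are hypothetical names. Characters below 128 come through unchanged. Anything else has no ASCII equivalent, so the sketch substitutes a placeholder rather than pretending to convert it.

```python
def utf8_to_ascii(data: bytes, replacement: str = "?") -> str:
    """Decode UTF-8 bytes and keep only characters that exist in 7-bit ASCII.

    Code points 0-127 are encoded identically in ASCII and UTF-8, so they pass
    through unchanged. Everything else has no ASCII equivalent and is replaced.
    """
    text = data.decode("utf-8")
    return "".join(ch if ord(ch) < 128 else replacement for ch in text)


if __name__ == "__main__":
    # Pure ASCII text produces the same bytes in both encodings.
    print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
    # "ñ" (U+00F1) needs two bytes in UTF-8 and has no ASCII form at all.
    print("ñ".encode("utf-8"))                     # b'\xc3\xb1'
    print(utf8_to_ascii("señor".encode("utf-8")))  # se?or
```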
 
Thanks, Mike, that confirms what I already know and clarifies a bit of the mystery. (No pun intended.)

Our legacy system is happy, old, "PC ASCII": numerals, upper- and lowercase letters, some punctuation. Now we have a new instrument that is producing UTF-8 files. It leads off files with a signature (already a problem) but most of the text, by far, is going to be the same as ever, as you mentioned. The concern here is that we have European users (and US Latinos) who might introduce accented characters, tilde-n, etc. When that happens, I'm assuming it will originate in their localization of the Windows OS. Now that I think about it, that's probably a code-page issue rather than Unicode, which probably makes a bigger mess than anticipated.
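The "signature" is the UTF-8 byte order mark, the three bytes EF BB BF. A legacy reader that knows nothing about UTF-8 sees them as ordinary characters; in Windows code page 1252, for instance, they display as "ï»¿". The minimal Python sketch below checks for the mark and removes it before the data reaches the old system. `strip_utf8_bom` is a hypothetical helper name, and the cp1252 decode is only an illustration of what a code-page-based reader would show.

```python
import codecs
from typing import Tuple

UTF8_BOM = codecs.BOM_UTF8  # the bytes EF BB BF


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return (data without a leading UTF-8 BOM, whether one was removed)."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


if __name__ == "__main__":
    raw = UTF8_BOM + b"SAMPLE,42"
    # What a reader assuming Windows-1252 would show for this data.
    print(raw.decode("cp1252"))  # ï»¿SAMPLE,42
    body, had_bom = strip_utf8_bom(raw)
    print(had_bom, body)         # True b'SAMPLE,42'
```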

So, for the simple "let's put a UTF-8 to ASCII filter in front of our system" solution, I think we're screwed. I'm meeting with the instrument's engineers tomorrow; maybe I'll learn something.

 
Whether you use ADO Stream objects or API calls to convert your UTF-8 to UTF-16LE (as VB6 String values are normally encoded), the locale and codepage don't matter.
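The locale doesn't matter because UTF-8 and UTF-16LE are both fixed, complete mappings of Unicode, so converting between them never consults a code page. In VB6 the API route is MultiByteToWideChar with CodePage set to CP_UTF8 (65001). The Python sketch below does the same conversion with built-in codecs. Python is used only for illustration, and `utf8_to_utf16le` is a hypothetical name.

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16LE bytes, the layout VB6 uses for String.

    Neither codec depends on the system locale or ANSI code page, so the
    result is the same on every machine.
    """
    return data.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    utf16 = utf8_to_utf16le("Año".encode("utf-8"))
    # Each of these characters becomes one 16-bit little-endian unit.
    print(utf16.hex(" "))  # 41 00 f1 00 6f 00
```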

There is no fidelity loss internally in your VB6 programs. ANSI only comes into play when you use controls and operations that imply a conversion (for example the intrinsic controls, or file I/O with Print # and Line Input #) or where you convert explicitly. ASCII seldom enters the picture at all unless you take special actions.

If it is just a question of some Elbonian user entering Elbonian characters and needing to get them back (in TextBoxes, local files, etc.), then the usual automagic ANSI conversion using the current codepage should work fine for the characters they'd normally flirt with.
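Here is a Python sketch of that "automagic" round trip. It assumes the user's ANSI code page is Windows-1252 (Western European), and the helper name is hypothetical. Characters the code page contains survive the trip to bytes and back. A character outside it cannot. Here Python's `errors="replace"` substitutes "?", which is similar to the "?" you typically see from a lossy ANSI conversion on Windows.

```python
def ansi_round_trip(text: str, codepage: str = "cp1252") -> str:
    """Encode text to an ANSI code page and decode it back with the same page.

    Characters missing from the code page are replaced with "?" on the way
    out, so they cannot be recovered on the way back in.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


if __name__ == "__main__":
    print(ansi_round_trip("Peña, café, Müller"))  # Peña, café, Müller
    print(ansi_round_trip("Ωmega"))               # ?mega
```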

It's only when passing data between locales (FTP, email attachments, mailed CDs, etc.) that you get into trouble. Then you want to use some Unicode encoding for your data and avoid storing ANSI into files and databases or transmitting it.
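The cross-locale failure is easy to reproduce. In this Python sketch, text saved as ANSI on a Western European PC (code page 1252) is read back on a PC whose ANSI code page is Cyrillic (Windows-1251). The code pages are assumptions chosen for illustration, and `save_as` / `load_as` are hypothetical stand-ins for writing and reading a file or database field. The byte for "ñ" means something else in 1251, while the same text stored as UTF-8 carries across intact.

```python
def save_as(text: str, encoding: str) -> bytes:
    """Stand-in for writing text to a file or database with a given encoding."""
    return text.encode(encoding)


def load_as(data: bytes, encoding: str) -> str:
    """Stand-in for reading the same bytes back with a given encoding."""
    return data.decode(encoding)


if __name__ == "__main__":
    name = "Peña"
    ansi = save_as(name, "cp1252")         # written where the ANSI page is 1252
    # Read where it is 1251: byte 0xF1 is CYRILLIC SMALL LETTER ES there.
    print(load_as(ansi, "cp1251") == name)  # False
    utf8 = save_as(name, "utf-8")
    print(load_as(utf8, "utf-8") == name)   # True on any machine
```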

Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" is a classic must-read for VB programmers.
 
The funniest part of that article is his claim that EBCDIC isn't relevant. Of course it is relevant when you need to deal with EBCDIC. About half of my VB6 contracts involve communicating with EBCDIC mainframes, which do support Unicode (but their COBOL programmers don't).

Luckily, the same tools (ADO Streams, the MultiByteToWideChar and WideCharToMultiByte API calls) put EBCDIC within easy reach of VB programmers too.
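The same decode/encode pattern covers EBCDIC. Here is a minimal Python sketch, assuming code page 037 (EBCDIC US/Canada). A given mainframe may use a different CCSID (Python also ships cp500 and cp1140, for example), so treat the default as a placeholder. The function names are hypothetical. In VB6 the equivalent would be passing the EBCDIC code page number to MultiByteToWideChar or WideCharToMultiByte, provided Windows has that code page available.

```python
def ebcdic_to_text(data: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes (default: code page 037, US/Canada) to a str."""
    return data.decode(codepage)


def text_to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Encode a str as EBCDIC bytes; raises UnicodeEncodeError if unmappable."""
    return text.encode(codepage)


if __name__ == "__main__":
    record = text_to_ebcdic("HELLO 42")
    # Letters, digits, and space land in very different byte ranges than in ASCII.
    print(record.hex(" "))         # c8 c5 d3 d3 d6 40 f4 f2
    print(ebcdic_to_text(record))  # HELLO 42
```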
 
Thanks, I'm going to remember this thread.

I had seen the Spolsky article and was also amused by the deprecation of EBCDIC: at the job I had before this one, we used PL/1. Gag.

For now we're going to survive on faith: we don't think we're going to see any data above 127, so we're going to pretend it won't happen until it does. That ought to last a week or two.
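If you do run on faith, it may be worth making "until it does" visible. The hypothetical Python guard below reports the offsets of any bytes above 127, after skipping the UTF-8 signature. An import could then log or reject the first non-ASCII file instead of silently storing mangled text. The function name and the BOM handling are assumptions, not part of anyone's existing system.

```python
from typing import List

UTF8_BOM = b"\xef\xbb\xbf"


def non_ascii_offsets(data: bytes) -> List[int]:
    """Return byte offsets of every byte above 127, ignoring a leading BOM.

    An empty list means the data is plain 7-bit ASCII (apart from the BOM)
    and can go into the legacy system unchanged.
    """
    start = len(UTF8_BOM) if data.startswith(UTF8_BOM) else 0
    return [i for i in range(start, len(data)) if data[i] > 127]


if __name__ == "__main__":
    print(non_ascii_offsets(UTF8_BOM + b"SAMPLE,42"))  # []
    print(non_ascii_offsets("señor".encode("utf-8")))  # [2, 3]
```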
 