Understanding character sets

Status
Not open for further replies.

jez

Programmer
Apr 24, 2001
370
VN
Hi all,

I have a problem with UK pound signs, £ (not sure if that will show or not).
In my web page, which has a charset of ISO 8859-1, I can see pound signs (even in the plain-text source of the page). These are being passed in from an MS SQL database.

When I submit the form to its target MySQL database, each pound sign is converted to a capital A with a circumflex on top followed by a pound sign (Â£), and that is how it is stored in the DB.


The table has a charset of latin1 and the collation is latin1_swedish_ci

I am not sure where this is getting switched, but it's somewhere in the save process.
Can anyone give me some pointers about where to look or what I can do about this?

Thanks.
 
What you are seeing is utf-8, but rendered as latin-1.
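
To make that concrete, here is a minimal Python sketch of the mix-up. It assumes the pound sign left the browser as UTF-8 bytes and was then read as latin-1 somewhere on the way to the table; the function name is made up for illustration, and nothing here is taken from the poster's actual setup.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as latin-1."""
    raw = text.encode("utf-8")      # '£' becomes the two bytes C2 A3
    return raw.decode("latin-1")    # each byte is read as its own character


garbled = misread_utf8_as_latin1("£")
print(garbled)                           # capital A with circumflex, then a pound sign
print(garbled.encode("latin-1").hex())   # the bytes a latin1 column would then hold
```

If this is what happened, the stored bytes are still valid UTF-8 for a single pound sign; only the label attached to them is wrong.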

These are being passed in from an MS Sql database

What do you mean by that? Was there some kind of conversion from MS-SQL to MySQL?

There is also something of interest: the encoding of the database connection. The default is (unless you managed to configure it differently) latin1, or iso-8859-1. Is there any difference if you send a "SET NAMES latin1;" command directly after opening a connection?

Some tests:
When you use mysqldump on a table with special characters, how do they show up?

Try:
Code:
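SELECT HEX('£');        -- bytes of the literal as sent over the connection
SELECT _latin1 0xC2A3;  -- the bytes C2 A3, labelled as latin1
SELECT _utf8 0xC2A3;    -- the same bytes, labelled as utf-8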
Run these in your favourite database front end. The first statement returns C2A3 on my system, because I use utf-8. The second and third queries both select the exact same bytes, but with a different "pushed" encoding. One of them should show a pound sign (hopefully the last one).
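
For anyone without a MySQL client at hand, the same three checks can be approximated in Python. This is only a sketch of the idea, not what MySQL does internally: `hex_of` and `read_as` are hypothetical helper names, and the encoding passed to `hex_of` stands in for the connection charset.

```python
def hex_of(text: str, encoding: str) -> str:
    """Rough stand-in for HEX('...') under a given connection charset."""
    return text.encode(encoding).hex().upper()


def read_as(hex_bytes: str, encoding: str) -> str:
    """Rough stand-in for _latin1 0x... / _utf8 0x...: same bytes, different label."""
    return bytes.fromhex(hex_bytes).decode(encoding)


print(hex_of("£", "utf-8"))        # two bytes
print(hex_of("£", "latin-1"))      # one byte
print(read_as("C2A3", "latin-1"))  # two characters
print(read_as("C2A3", "utf-8"))    # one character, the pound sign
```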

Last but not least: what version of MySQL are you using?


+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)
 
Thanks for the pointers, I'll give them a try. As for the MS side of things, all I meant was that a query on an MS database puts actual pound signs in the plain text of the HTML page that is then submitted. It's not that relevant, as they are pound signs in plain text when I start dealing with them.

Thanks again,
 
I still do not understand what you mean. There is no such thing as plain text: every text has an encoding. Seeing a pound sign on screen says nothing about how it is stored or which encoding is declared for it. That is why I gave the HEX() example. With latin1 encoding it would return one byte (A3), but for utf-8 it returns two bytes (C2A3), as you see. Both the one-byte and the two-byte form render to exactly the same glyph on screen IF YOU SUPPLY THE RIGHT ENCODING. And that is where it goes wrong: you were seeing a utf-8 encoded string that was wrongly declared to be latin1-encoded.

What do you mean by "submitted"? Is there some kind of web form? In that case, please note that no browser I know of (and I tested with IE, Firefox and Safari) tells the server which encoding it used for a POST. That sounds cryptic, so let me explain. A browser that renders a web form in the encoding it gets from the web server will use that same encoding to post the data. But if you choose another encoding in your browser, that encoding is used to submit the data instead. In both cases, NO INFORMATION is sent about which encoding was actually used.
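
The effect on the wire can be sketched with Python's `urllib.parse`. This assumes an ordinary URL-encoded form body and a hypothetical field called `price`; the point is just that the bytes differ by encoding while nothing in the body names that encoding, so the receiving side has to guess.

```python
from urllib.parse import quote, unquote

# What a browser might put in the request body for the same field value,
# depending on the page encoding. The field name "price" is made up.
body_latin1 = "price=" + quote("£", encoding="latin-1")
body_utf8 = "price=" + quote("£", encoding="utf-8")
print(body_latin1)
print(body_utf8)

# The server gets only the percent-encoded bytes and must pick a decoder.
value = body_utf8.split("=", 1)[1]
print(unquote(value, encoding="utf-8"))    # right guess
print(unquote(value, encoding="latin-1"))  # wrong guess: an extra character appears
```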

 