I'm therefore getting odd characters showing up in my web pages.
And it is a long way from your database to the browser. First, let me tell you where it can go wrong. I distinguish between texts (character strings that should be human-readable and therefore have an encoding) and strings (just sequences of bytes). The problem is that texts are almost always sent and stored as strings, and the encoding travels separately. Every system has its own way of communicating the encoding:
[ul][li]The database field. (The encodings set on the table and the database merely act as default values for the fields.)[/li]
[li]The database connection. If you do not send the encoding directly after connecting, the default is usually latin-1. So if you want to run everything in utf-8, send the command
SET NAMES utf8;
directly after connecting to the database. It is a common beginner's mistake to omit this. Alas, as far as I know this cannot be configured in my.cnf or my.ini; the encoding there is only for the command-line client.[/li]
[li]Your website itself. For instance, older PHP versions (before 5.6) do not send an encoding by default, so the browser typically assumes latin-1. You can change that in php.ini (the default_charset setting) or send a header yourself:
Content-type: text/html; charset=utf-8[/li]
[/ul]
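To illustrate how the encoding travels separately from the bytes, here is a minimal Python sketch of the website case. The function name build_response is made up for this example and is not part of PHP or any framework. It only shows that the body is turned into bytes with one encoding while the header announces that same encoding by name.

```python
def build_response(text: str, charset: str = "utf-8") -> bytes:
    """Encode a text as bytes and label it with the matching charset."""
    body = text.encode(charset)  # the text becomes a plain sequence of bytes
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: text/html; charset={charset}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    # The header is pure ASCII; only the body depends on the chosen charset.
    return head.encode("ascii") + body


response = build_response("<p>café</p>")
```

If the charset in the header and the one used for encoding the body ever disagree, the browser decodes the bytes with the wrong table, and that is where the odd characters come from.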
"Odd characters" can go both ways. If you send utf-8 encoded text with a marker "this is latin-1", then you get sequences starting with Ã or Â (for example, é shows up as Ã©).
If you send latin-1 encoded text marked as "this is utf-8", then you usually get squares or question marks.
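Both directions can be reproduced in a few lines of Python. This is only a sketch with a sample word of my own choosing, and it uses Python's codecs rather than a real browser or database, but the bytes behave the same way.

```python
text = "café"

utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9' (é takes two bytes)
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'     (é takes one byte)

# utf-8 bytes wrongly labelled as latin-1: each byte becomes its own character.
utf8_read_as_latin1 = utf8_bytes.decode("latin-1")

# latin-1 bytes wrongly labelled as utf-8: the lone byte 0xE9 is not valid
# utf-8, so the decoder substitutes the replacement character U+FFFD.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
```

Here utf8_read_as_latin1 is 'cafÃ©', and latin1_read_as_utf8 is 'caf' followed by U+FFFD, which browsers usually draw as a square or a question mark.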
But what is in your database? All latin-1 characters are one byte long, but accented and other non-ASCII characters take two or more bytes in utf-8. So if you run the HEX function on a field and count the bytes, you can see what is actually stored.
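To know what to look for in the HEX output, you can compute the expected bytes yourself. The helper below is a hypothetical stand-in written in Python, not the MySQL function itself. It just prints the bytes of a text in a given encoding as uppercase hex.

```python
def hex_of(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as uppercase hex digits."""
    return text.encode(encoding).hex().upper()


latin1_hex = hex_of("é", "latin-1")  # 'E9'   -> one byte
utf8_hex = hex_of("é", "utf-8")      # 'C3A9' -> two bytes
ascii_hex = hex_of("e", "utf-8")     # '65'   -> plain ASCII is one byte in both
```

So if a field that should contain é shows E9, the stored bytes are latin-1, and if it shows C3A9, they are utf-8.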
MySQL has a CONVERT(... USING ...) function that allows you to run UPDATE queries to change the contents of the fields. If you want to change an entire table to utf-8, including its contents and its structure defaults, use
ALTER TABLE ... CONVERT TO CHARACTER SET utf8;
Good luck!
+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)