Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

UTF-8 causing trouble :( 2

Status
Not open for further replies.

Sleidia

Technical User
May 4, 2001
1,284
FR
Hi everyone,

I need some help with the following issue :
The accentuated characters of the content taken form the UTF-8 database gets garbled once displayed on my website.

What I don't understand is that everything seems to be properly set :

mysql.ini
Code:
[mysql]
default-character-set = utf8
default-character-set = utf8
character-set-server = utf8
collation-server = utf8_general_ci
init_connect = 'SET collation_connection = utf8_general_ci'
init_connect = 'SET NAMES utf8'

PHP header
Code:
header("Content-Type:text/html; charset=utf-8");

Metas
Code:
<meta http-equiv="content-type" content="text/html;charset=utf-8">

Table structure
Code:
CREATE TABLE `mod_geonames` (
  `geoname_id` mediumint(9) NOT NULL,
  `name` varchar(200) default NULL,
  `latitude` decimal(10,7) default NULL,
  `longitude` decimal(10,7) default NULL,
  `country_code` varchar(2) default NULL,
  PRIMARY KEY  (`geoname_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

So, what else could cause the problem?
I am at a total loss of ideas after extensive googling :(

Thanks for the help! :)
 
Your setup seems right, but the data in the database might be wrong, re-encoded or put in there by non-utf-8 aware programs.

First I want to check if there is really utf-8 in the database:
Code:
SELECT HEX('€');
[tt]E282AC[/tt]

As you see, a euro sign is stored as three bytes (one byte consists of two hexadecimal digits).

If you apply the HEX function to a field with accented or special characters, you should see their length.

Some database front-ends (like Knoda) get the encoding wrong.

Ah. Re-reading your post, I think I see where your problem is. The [mysql] section of the my.ini file is for the mysql command-line only. From PHP, you should send a command:
Code:
SET NAMES utf8;
directly after connecting to the database. As far as I know (and I really spent some time looking) there is no way to set utf-8 as the default character encoding for every connection in the config files.

One final word:
You can drop the meta tag. It is only meant for systems that cannot influence the HTTP-headers. As you are sending the correct HTTP header, the meta tag can only cause page reloading if the client is ill-behaving.

+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)
 
Hi DonQuichote :)

Thank you very much for trying to help.
I appreciate a lot.

I thought that my problem was occuring only on my local appserv setup (on windows vista with PHP5/mySQL5 using sqlYOG), but actually I experience it also on my remote server (PHP5/mySQL5 using phpmyadmin).

So, I did what you recommended first : I get E282AC on the euro symbol. The same euro symbol appears as a question mark inside a losange when looking from my website (but no problem on phpmyadmin).

My website and phpmyadmin are both using UTF-8.
So, I don't understand at all.

I think the problem must come from the database because my site use UTF-8 encoded flat text files for some areas and they are displayed correctly.

Only the data coming from the dabase is broken.

But sorry, I didn't understand the bit about "SET NAMES utf8;"

What should I do exactly?

Thanks again for your precious help.


Thanks again!
 
If you issue a query from PHP:
Code:
SHOW VARIABLES LIKE 'ch%';
you will see that the connection character encoding is latin1. Sorry, but I do not know of any way to change that in a config file. Therefore your first query must be:
Code:
SET NAMES utf8;
That's all, but you have to do it for each connection. So before you send any other query to the database server, send the above SET NAMES command. That will tell the database server that it should read and send everything as utf-8.

+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)
 

That was it!!!
You've just ended a very painful day.
Thank you very much :)

I've posted on 3 other forums and you were the only one to provide the solution.

Thank you soooooooo much ;)
 
Hi there, unfortunately this does not seem to do it for me. I am using mysql5 and coldfusion. The text in Russian displays properly when hard coded into the webpage. But when I try to do an insert it simply becomes "????", when I type it directly into the database field it changes to "???" as well. when I perform an insert query from my database(MySQL Front) it changes into squares but then at least when I do a select from coldfusion - it show properly - so the insert is still a problem.

when I issue this query statement [SHOW VARIABLES LIKE 'cha%';] i get the following settings:

character_set_client=utf8
character_set_connection=utf8
character_set_database=utf8
character_set_filesystem=utf8
character_set_results=utf8
character_set_server=latin1
character_set_system=utf8

I wonder whether I need to change character_set_server=latin1 to utf8 as well??

Note I perform the SET NAMES 'utf8' statement before my queries

Is there anything am missing please help!!
Thanks
 
First of all, every program has its own current or default encoding: your web server, your database server, your browser, your database front-end, your editor. Without knowing which one you use and what their encoding is set to, I cannot say anything useful.

The text in Russian displays properly when hard coded into the webpage.
Your browser probably has a menu option (view/character encoding in firefox) that shows which encoding is used. This encoding is taken from the Content-Type HTTP header, or from the corresponding META tag. Note that this header and/or meta tag may be incorrect. They must match the actual encoding used. What encoding does the menu show?
when I try to do an insert it simply becomes "????"
With what program do you do the insert and what encoding does that program use? Is the [SET NAMES utf8] also sent before that INSERT statement? What program do you use to view the results? I don't know how long a russian character is in utf-8 encoding, but you might try the SELECT HEX trick:
Code:
SELECT HEX('€');
Replace the euro sign with any russian character you like. If the outcome is one byte (two hexadecimal characters), this is a clear sign that your connection does not use utf-8.
One note about squares and question marks: not every binary string is valid utf-8. If you try to render a non-utf-8 string as utf-8, you will get either question marks or squares or even missing characters for the bytes that fail.
when I type it directly into the database field it changes to "???" as well
Again, which program, which encoding? MySQL is a server and does not render anything! So MySQL itself does not "show".

I do not know coldfusion. Is it a library or an editor?


+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)
 
Hi DonQuichote, thanks for your response.
I use mysql 5 with mysql Front as my front end to the dbase server(i do run my dbase queries here).

I use dreamweaver as my editor and it is currently set to utf8 as the default - I know that cos when i create a blank page i get this line at the html header:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Coldfusion is a server programming language same as php, .net etc.

I have checked firefox and IE, they both use utf8 as their encoding.

The encoding for the dbase encoding is UTF8 but the server might be not[character_set_server=latin1] but I have forced all my fields, database and tables to use utf 8...
I have issued SELECT HEX('?') as you suggested and I got:
D0B4.

I hope this infos helps you to help me.

Thanks
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top