Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

unique index on string column containing german umlaut 1

Status
Not open for further replies.

kittyyo

Programmer
Oct 17, 2001
89
AT
Is this a bug or a feature:

I first create this simple table.

CREATE TABLE Test
( ID INT UNSIGNED NOT NULL AUTO_INCREMENT,
Name VARCHAR(10) NOT NULL,

PRIMARY KEY (ID),
UNIQUE INDEX (Name)
);

Then I try to add the following two records:

INSERT INTO Test (Name) VALUES ("Muller");
INSERT INTO Test (Name) VALUES ("Müller");

"Muller" and "Müller" are not the same,
still I get the error
"Duplicate entry 'Müller' for key 2".


I'm using MySQL Server 4.0.18-nt (Windows 2000).

Should I send a bug report?
Thanks,

Anne

 
I should maybe state that I already searched the MySQL bug database but did not find anything like this.
 
It looks like a character set issue; MySQL probably converts "unfamiliar" characters to their nearest equivalent in the field's character set.

When you are creating a table, you can set the character set for a field as follows:
[tt][bold]CREATE TABLE t
(
name VARCHAR(10) NOT NULL CHARACTER SET charset_name,
other_fields
);[/bold][/tt]

You can find out the default character set of a database by:
[tt][bold]SHOW VARIABLES LIKE 'character_set';[/bold][/tt]

You can change the default character set of a database by:
[tt][bold]ALTER DATABASE dbname DEFAULT CHARACTER SET charset_name;[/bold][/tt]

I ran your statements on my own database, which uses the character set "latin1", and they ran OK; there are two separate records for "Muller" and "Müller".
 
Thanks for your answer, it is indeed a character set issue.

The problem I have, though, is that (if not using 'german1' or 'german2') the sort order is "wrong":

Right would be

A -> U -> Ü -> Z (o.k. with german)

but latin1 sorts

A -> U -> Z -> Ü.

Is it possible to have the right sort order without having u=ü, U=Ü and so on?
 
First, some corrections and clarifications:
You can find out the default character set of the server by:
[tt]SHOW VARIABLES LIKE 'character_set';[/tt]
You can find out the character set of a database (if it's not the server's default) by:
[tt]SHOW CREATE DATABASE db_name;[/tt]


I notice that in 'latin1', 'Müller' and 'Myller' are regarded as duplicates. Probably the best solution to this would be to declare the field as BINARY; no conversions will be done, but it will make the field use case-sensitive comparisions. For example:
[tt]name VARCHAR(10) BINARY NOT NULL,[/tt]

I'd better check my own database to see that it handles "fadas" (Irish acute accents) properly.


To solve the sorting issue:
According to P.DuBois's book "MySQL", MySQL 4.1 allows you to sort query results using a specified character set, as follows:
[tt]ORDER BY name COLLATE german1[/tt]
I'm using MySQL 4.0, so I can't check that.

I hope that goes some way to solving your problem.
 
Thanks a lot, I think I'll wait for our project to proceed to MySQL 4.1 then :)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top