Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL Multi-Language Collation Advice... 1

Status
Not open for further replies.

BenSCooper

Programmer
Sep 28, 2009
15
GB
Dear All,
I'm creating my first multi-language (English\Arabic) SQL Server database (version 2005) and could really do with some advice on the finer points of collation settings.

The structure of the system is:
1 main SQL database (handles storage for the main product)
1 or more additional SQL databases (handles storage for optional bolt on modules)

All of these databases are set to use unicode data types (nvarchar etc) for tables, procedures and functions.

When the product is installed a SQL server instance is automatically installed. The main SQL database is then created.
The optional module databases are created from backup files that will be prepared here ready for distribution.

My concern here is that the SQL instance will be created with a collation matching the Windows regional settings (Arabic in this case), as I believe this is the default SQL behaviour. This in turn will cause the main SQL db to attract the Arabic_CI_AS (or similar) collation when it is created.
As my module databases are restored from backups they will have my default collation of Latin1_General_CI_AS.

I'm concerned that if I'm right and this scenario happens, any cross database views\procedures will fail with a collation conflict error.

My questions are:
1- Is my concern realistic or have I misunderstood the workings of SQL with regard collations?
2- If the concern is valid, what is the best solution to the problem? I thought of forcing the server and main db to use Latin1 collation when installing, but wouldn't this upset any sortation rules\dates for Arabic customers?

Any advice would be very gratefully received. Oh yeah, and sorry for the novel length post :)

Ben Cooper
 
My concern here is that the SQL instance will be created with a collation matching the Windows regional settings (Arabic in this case), as I believe this is the default SQL behaviour.

Each instance of SQL Server has a default collation that is set during the installation procedure. If you are in control of the install, then you are also in control of the default SQL Server instance.

This in turn will cause the main SQL db to attract the Arabic_CI_AS (or similar) collation when it is created.

Actually... no. When you issue a create database command, a new DB is created (obviously), but it is based off the model database. Your newly created database will take on the same collation as the default collation from the model database. Under certain circumstances, utilizing the model database can be an incredible time savings. You see, whatever tables (including data), views, procedures, functions, etc.... that exist in the model database will also exist in your newly created database.

I'm concerned that if I'm right and this scenario happens, any cross database views\procedures will fail with a collation conflict error.

Not all of your cross database views and procedures will fail. Collations are used for sorting and comparing strings, but only strings. If you write cross database views that join on integer columns, you will not have problems. If you sort within your procedures you will not have any problems. If you join on string columns, you can have problems.

For many developers, the biggest headache occurs because of the TempDB. You see, when you use a temp table and/or a table variable, they are created in the TempDB so they take on the default database collation of TempDB.

Basically, you need to check all of your code to see if you are joining on string columns. This includes temp tables and table variables. If you are joining on string columns, then you need to modify the code to specify the collation.

For example....

Code:
Create Table #A(Data VarChar(20) Collate Latin1_General_CI_AS)
Insert Into #A Values('A')
Insert Into #A Values('B')

Create Table #B(Data VarChar(20) Collate Arabic_CI_AS)
Insert Into #B Values('A')
Insert Into #B Values('B')

Select *
From   #A
       Inner Join #B On #A.Data = #B.Data

When you run the code above, you will get the following error: [red]Cannot resolve the collation conflict between "Arabic_CI_AS" and "Latin1_General_CI_AS" in the equal to operation.[/red]

However, you can specify the collation several different ways to get the query to work. For example:

Code:
Select *
From   #A
       Inner Join #B On #A.Data = #B.Data Collate Latin1_General_CI_AS

Select *
From   #A
       Inner Join #B On #A.Data Collate Arabic_CI_AS = #B.Data

Select *
From   #A
       Inner Join #B On #A.Data Collate Database_Default = #B.Data Collate Database_Default

Select *
From   #A
       Inner Join #B On #A.Data Collate Latin1_General_BIN = #B.Data Collate Latin1_General_BIN

Basically, you can collate one to match the other, you can use Database_Default on both of them, or use any collation you want.

One more thing.... you need to be careful about your dates. What is the month for this date? '3/4/2010' If you answered March, you *could* be right. If you answered April you could *also* be right. If you're not sure, how do you expect a computer to decide? Well.... every login has a default language which dictates how strings are interpreted when converting to a date. If you use the ISO format for dates YYYYMMDD hh:mm:ss SQL server will never misinterpret it. Today's date and time would look like this: 20100623 08:18:34

I know you didn't mention dates in your original question, but it is certainly something you need to think about.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks George, that's provided a lot of useful information and clarification. I have a couple more questions as a result if you could indulge me.

If I specify in my installer that all SQL instances and dbs are created with Latin1 collation, will this come with the risk of applying inappropriate string sortation logic? For example I read that in Spanish "ch" should be sorted before "c", whereas in English it would be sorted after of course. If I was using Latin1 collation, which sortation rule would be applied?

With regard your comment about dates I could do with a little more clarification. I am storing all date fields as DateTime data type in sql, which of course stores as YYYYMMDD. If my user's collation is Latin1 then I'm guessing that SQL would expect to be passed dates as DDMMYYYY, which would then translate correctly into my table. However, if the same user passed a date back to SQL in American MMDDYYYY format, this date would be incorrectly stored with the month and day transposed. Is my understanding of this point accurate or have I misinterpreted?

Many thanks again
Ben
 
If I specify in my installer that all SQL instances and dbs are created with Latin1 collation, will this come with the risk of applying inappropriate string sortation logic? For example I read that in Spanish "ch" should be sorted before "c", whereas in English it would be sorted after of course. If I was using Latin1 collation, which sortation rule would be applied?

In my opinion, it would be better to install SQL with your preferred collation. This will likely prevent many potential errors with your code. Of course, you are still left with sorting issues, but it's better to have an incorrect sort order than it is to have errors.

With regard your comment about dates I could do with a little more clarification. I am storing all date fields as DateTime data type in sql, which of course stores as YYYYMMDD. If my user's collation is Latin1 then I'm guessing that SQL would expect to be passed dates as DDMMYYYY, which would then translate correctly into my table. However, if the same user passed a date back to SQL in American MMDDYYYY format, this date would be incorrectly stored with the month and day transposed. Is my understanding of this point accurate or have I misinterpreted?

When you use the DateTime data type, SQL Server actually uses a pair of integers to store the data. It is not stored as YYYYMMDD. Storing dates and times in the DateTime data type is absolutely, without question, the correct thing to do. Reading the data from the database in to your application is usually not a problem. The real problem occurs when you store data (insert or update). I encourage you to read this blog I put together a while ago. It should help clarify the date issue.


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks George, that's much clearer. I'm much more confident now that I won't roll out a system that has major language translation issues.

Thanks for all your help, have a well earned star from me!

B
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top