LIKE With Ranges Not Working As Expected 3

SgtJarrow · Feb 10, 2016

Anyone explain what is happening in the following code? Even an article somewhere that explains why this is happening?

Basically, I have a list of courses offered at my company. The system allows ANY characters to be inserted into the CourseCode value. Somehow we have been getting some non-standard ASCII characters into our database. Our current CourseCode consists of 2 characters, 4 numbers and an optional X character, all terminated by a semi-colon.

The company wants to allow the optional X character to be any alpha character instead. Seems simple enough as I just need to change my LIKE clause to handle the new convention.

But when I do, I introduce the ability pick up non-standard characters, such as Â. I demonstrate a work around below, but I am trying to understand why the A-Z does not work as expected and determine if I need to go back and adjust some other places to make sure I am using the LIKE properly in all my code...

Just trying to understand why this is happening. Thanks.

Code:

DECLARE @Codes TABLE
(
	CourseCode			VARCHAR(500)
);

INSERT INTO @Codes (CourseCode) VALUES ('RX1010Â  Pharmacy Technician Fundamentals & Computer Applications - Master');
INSERT INTO @Codes (CourseCode) VALUES ('BC1020: Medical Basics and the Healthcare Claim Cycle (1-25-2016) Section 04');
INSERT INTO @Codes (CourseCode) VALUES ('HM2030: Practice Structure and Enhancement (1-25-2016) Section 02');
INSERT INTO @Codes (CourseCode) VALUES ('ME1115X: Introduction to Medical Administrative Assisting (1-04-2016) Section 03');
INSERT INTO @Codes (CourseCode) VALUES ('AC2760Â  Accounting for Managers - TRAINING');
INSERT INTO @Codes (CourseCode) VALUES ('PS1000: Psychology (1-04-2016) Section 01');
INSERT INTO @Codes (CourseCode) VALUES ('PS1000X: Psychology (1-04-2016) Section 02');
INSERT INTO @Codes (CourseCode) VALUES ('PS1000C: Psychology (1-04-2016) Section 03');

--All Courses
SELECT
	co.CourseCode
FROM @Codes co;

--Current Courses Returned
SELECT
	co.CourseCode
FROM @Codes co
WHERE co.CourseCode LIKE '[A-Z][A-Z][0-9][0-9][0-9][0-9][X:]%';

--Proposed Change
SELECT
	co.CourseCode
FROM @Codes co
WHERE co.CourseCode LIKE '[A-Z][A-Z][0-9][0-9][0-9][0-9][A-Z:]%';

--Expected Change
SELECT
	co.CourseCode
FROM @Codes co
WHERE co.CourseCode LIKE '[A-Z][A-Z][0-9][0-9][0-9][0-9][ABCDEFGHIJKLMNOPQRSTUVWXYZ:]%';

=======================================
People think it must be fun to be a super genius, but they don't realize how hard it is to put up with all the idiots in the world. (Calvin from Calvin And Hobbs)

Robert L. Johnson III
CCNA, CCDA, MCSA, CNA, Net+, A+, CHDP
Senior Database Developer

gmmastros · Feb 10, 2016

This is tricky to understand because it deals with collations. Collations are a necessary evil, and understanding what is going on can be quite difficult.

Basically, collations are used when comparing strings. Think of it this way... if you have the letters A through Z, why should A sort before Z? Toss in other characters like lower case letters, letters with accents, and symbols. How should those things sort? The actual sorting (and comparisons) are controlled by the collation you are using.

Each SQL Server instance has a default collation. Each database has a default collation, and each string column has a collation. I always advise that all collations match. If they don't, you will run in to lots of problems.

By the way, you could do this...

Code:

SELECT
	co.CourseCode
FROM @Codes co
WHERE co.CourseCode [!]collate Latin1_GENERAL_BIN [/!]LIKE '[A-Z][A-Z][0-9][0-9][0-9][0-9][A-Z:]%';

This forces a binary collation. With a binary collation, non-standard characters are sorted outside the range of standard characters. I don't know what collation you are using, but those characters must fall within the range of A-Z.

To see how your characters are sorted based on your database default collation, run the following code.

Code:

; With Data As
(
  Select CHAR(number) As Character
  From   master..spt_values 
  Where  Type = 'P'
         And number <= 256
)
Select	Character
From	Data
Order By Character

To see how the data is sorted in a binary collation, run this:

Code:

; With Data As
(
  Select CHAR(number) As Character
  From	 master..spt_values 
  Where	 Type = 'P'
         And number <= 256
)
Select	Character
From	Data
Order By Character collate Latin1_GENERAL_BIN

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

SgtJarrow · Feb 11, 2016

George,

As always, your answer is detailed but not overly so, well described by not complex, and generally just the right answer. I had a suspicion it was related to collation but could not understand why. Before posting, i checked the collation set for the server, database and table in question. They are all set to SQL_Latin1_General_CP1_CI_AS. When I ran your first query, the errant characters are clearly within the A-Z range. The BIN collation changes that to be as expected.

Thank you so much for the explanation.

Now....its off to the boss's office to explain all this and figure out if we are going to change the default collation or if I need to update all my code to ensure the collation used is proper in all cases.

=======================================
People think it must be fun to be a super genius, but they don't realize how hard it is to put up with all the idiots in the world. (Calvin from Calvin And Hobbs)

Robert L. Johnson III
CCNA, CCDA, MCSA, CNA, Net+, A+, CHDP
Senior Database Developer

gmmastros · Feb 11, 2016

if we are going to change the default collation

Careful! There be dragons!

Seriously, changing the default collation for a database is not too difficult, changing the collation for string columns is easy. Changing the default collation for a server is prohibitively difficult.

If you change the collation of your database and columns without changing the collation of your server, you'll have problems with some of your code. Specifically, procedures that use temp table and/or table variables with string and joins will have problems because temp tables and table variables are stored in TempDB (a system database). I think you can change the collation of TempDB, but when the SQL Server service is restarted, this database is deleted and recreated.

The collation you are using is the same as the collation I use. I don't think about collations very often, but will occasionally use a binary collation if I want a case sensitive search. I would encourage you to change your code where need be, it will actually be easier in the long run.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

SgtJarrow · Feb 12, 2016

Thanks George. As my research has led me to believe, I agree that changing to default collation is not necessarily the best choice in my case. We have decided to work with collation on a case by case basis only when needed for specific reasons. Documentation is my next step.

Robert "Wizard" Johnson III
U.S. Military Vets MC
CSM, CSPO, MCPD, CCNA, CCDA, MCSA, CNA, Net+, A+, CHDP
Senior Database Developer

Olaf Doschke · Feb 12, 2016

>Careful! There be dragons!

Indeed experience especially outside of MSSQL (eg MySQL, DBFs) shows you can do more harm than already done, as conversions more often are not reversable.

It helps, if you label sugar as sugar, when it is sugar and was simply labeled as something else before.
It doesn't help, if you label water as wine without letting jesus convert it at the same time.

Changing collations sometimes just means relabelling, sometimes means converting existing string data from old to new collation.

If you have errors showing in the display of text you typically want to relable what you have with the correct collation, so it's displayed correctly. If text display is wrong and you change collation including conversion, you convert the error over into the new collation/encoding, so you keep the error (as far as the new codepages allow same characters).

You can have mixes of several collation, then nothing on this level helps, you really need to address all single display problems. This can happen with data often moved in flat files. As a side story XML parsing sometimes may simply fail on some bte order marks (BOM) notepad adds to an edited flat file.

On the level of the server instance neither strategy of relabelling nor converting typically is valid for all databases, therefore very cautious here, indeed.

Bye, Olaf.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

LIKE With Ranges Not Working As Expected 3

SgtJarrow

Programmer

gmmastros

Programmer

SgtJarrow

Programmer

gmmastros

Programmer

SgtJarrow

Programmer

Olaf Doschke

Programmer

Similar threads

Part and Inventory Search

Sponsor