SQL Server 2000 offers a new function called CHECKSUM. The main purpose for this function is to create what are called hash indices. A hash indices is an index built on a column that stores the checksum of the data found in another column in the table. The CHECKSUM function takes data from another column and creates a checksum value. In other words, the CHECKSUM function is used to create a mostly unique value that represents other data in your table. In most cases, the CHECKSUM value will be much smaller than the actual value. For the most part, checksum values are unique, but this is not guaranteed. It is possible that two slightly different values may produce the same identical CHECKSUM value.
Here's how this works using our music database example. Say we have a song with the title "My Best Friend is a Mule from Missouri". As you can see, this is a rather long value, and adding an index to the song title column would make for a very wide index. But in this same table, we can add a CHECKSUM column that takes the title of the song and creates a checksum based on it. In this case, the checksum would be 1866876339. The CHECKSUM function always works the same, so if you perform the CHECKSUM function on the same value many different times, you would always get the same result.
So how does the CHECKSUM help us? The advantage of the CHECKSUM function is that instead of creating a wide index by using the song title column, we create an index on the CHECKSUM column instead. "That's fine and dandy, but I thought you wanted to search by the song's title? How can anybody ever hope to remember a checksum value in order to perform a search?"
Here's how. Take a moment to review this code:
SELECT title, artist, composer
FROM songs
WHERE title = 'My Best Friend is a Mule from Missouri'
AND checksum_title = CHECKSUM('My Best Friend is a Mule from Missouri')
In this example, it appears that we are asking the same question twice, and in a sense, we are. The reason we have to do this is because there may be checksum values that are identical, even though the names of the songs are different. Remember, unique checksum values are not guaranteed.
Here's how the query works. When the Query Optimizer examines the WHERE clause, it determines that there is an index on the checksum_title column. And because the checksum_title column is highly selective (minimal duplicate values) the Query Optimizer decides to use the index. In addition, the Query Optimizer is able to perform the CHECKSUM function, converting the song's title into a checksum value and using it to locate the matching records in the index. Because an index is used, SQL Server can very quickly locate the rows that match the second part of the WHERE clause. Once the rows have been narrowed down by the index, then all that has to be done is to compare these matching rows to the first part of the WHERE clause, which will take very little time.
This may seem a lot of work to shorten the width of an index, but in many cases, this extra work will pay off in better performance in the long run. Because of the nature of this tip, I suggest you experiment using this method, and the more conventional method of creating an index on the title column itself. Since there are so many variables to consider, it is tough to know which method is better in your particular situation unless you give them both a try. [2000] Added 3-6-2001