Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Indexing question 2

Status
Not open for further replies.

sparkbyte

Technical User
Sep 20, 2002
879
0
0
US
Not sure if this should be here or in Setup and Admin.

I was looking at the Indexes on the TrackingTable and noticed that when i created the indexes I had
BoxNumber(Non-Unique, Non-Clustered)
FileNumber(Non-Unique, Non-Clustered)
TrackingDate(Non-Unique, Non-Clustered)
PK_tblTrackingTable(Clustered)

Whould it have been better this way?

BoxNumber(Non-Unique, Non-Clustered)
|
--> Included Columns
|
--> FileNumber(Non-Unique, Non-Clustered)
|
--> TrackingDate(Non-Unique, Non-Clustered)

PK_tblTrackingTable(Clustered)


Thanks

John Fuhrman
 
I agree with markros. But it's not just the searches either. It also depends on the joins to this table. Be a little careful about adding and/or changing indexes because you could end up making parts of the system slower.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
What would be the diffence between the two?

Thanks

John Fuhrman
 
In some ways, you can think of an index as another table. Of course, it's not, but it some times helps to think of it that way.

Each row in the index contains a pointer to the row location in the actual table, and the data for the columns that make up the index. This way, SQL can use the index to quickly find the rows in the actual table.

When you write a query that ONLY uses data from an index, SQL server can recognize this and NOT use the data in the table because the data is already in the index.

For example, let's think about the TrackingDate. If you wanted to know how many rows in your table have a TrackingDate for today, you could write a query like this:

[tt]
Select Count(*)
From YourTable
Where TrackingDate >= '20100707'
And TrackingDate < '20100708'
[/tt]

If there is an index on the TrackingDate column, SQL can use the data in the index to determine the count without actually going to the table to get the data. When this happens, it's called a "covering index".

Before I continue, I need to talk about multi-column indexes. Single column indexes are rarely useful. I mean... they can help you by speeding up certain queries, but multi-column indexes can do the same thing, but also help with more queries.

There are 2 ways to make a multi-column index, and it's important that you know what both of them are, because they behave differently.

By definition, a multi-column index contains the data for more than 1 column. The order in which the columns appear in the list is critically important. Imagine you had a people table that stored PersonId, ShoeSize and EyeColor. Now imagine you had index on ShoeSize,EyeColor. The data for this index might look like this:

[tt]
ShoeSize EyeColor
9 Blue
9 Green
9.5 Blue
10 Brown
10 Green
10.5 Brown
[/tt]
Notice that the data is sorted first by ShoeSize and then by EyeColor. Looking at the list, tell me the number of people that have a ShoeSize = 10. Easy, right. You quickly find the first 10 and then continue counting rows until you get something <> 10. Now, with the same index, tell me how many people have Brown eyes. Since the EyeColor column is not the first column in this index, you would need to start at the first row and look at every row in order to count the Brown's.

Now, imagine there was another index on the table with EyeColor as the first column in the index. This time, SQL can use that index to quickly get your count.

INCLUDED COLUMNS.
Like I said earlier, there are multiple ways of creating multi-column indexes. Multi-column indexes existed in all versions of SQL Server, but new to SQL2005, you can have included columns. In the example I showed earlier, ShoeSize was the first column and EyeColor was the second column. So, in the index, the data is sorted first by ShoeSize and then ALSO sorted by EyeColor. With SQL2005, you could have the EyeColor column be an INCLUDE column. When you do this, the ShoeSize column would still be sorted, but EyeColor would not be. Why is this important? Well... think about inserts and updates on the table. If EyeColor is sorted, it will take (slightly) longer for SQL to determine where (in the index) the data belongs. If EyeColor is an included column, SQL only need to look at the ShoeSize column to determine where the row belongs. This may not seem like much, but with a very large table, it could make a considerable difference.

What would be the difference between the two?

It looks like you have 4 single column indexes right now. No doubt that these indexes are helping to speed up some queries. In the new scenario, it looks like you only have 2 indexes. Basically, queries that were originally using the index on FileNumber and TrackingDate would probably slow down (probably by a lot). By including FileNumber and TrackingDate in the index for BoxNumber, some queries may speed up a little because SQL may be able to use the index data entirely without having to go to the table to get the data.

Personally, I wouldn't remove the other two indexes unless I was sure they were not being used for anything. It wouldn't bother me to add additional columns to them. Doing this will slow down the index slightly, but may speed up certain queries by a lot.

I strongly encourage you test this stuff on a development database. This development database should be an exact copy of the production database (backup and restore to another SQL instance). Then you can modify the indexes and run all the queries that use this table to determine the indexes impact on performance.

I know this was a bit long-winded, but indexes are a LARGE topic, and a very important one at that. If you have any questions, now's the time to ask.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks VERY much!! That was a excellent explanation. I will have to play with the Dev copy and see what happens when I make changes.

One other thing though. When looking at the indexes, I am seeing that they are about 50-60 percent fragmented. Is there an esy way to keep them optimized?

Thanks

John Fuhrman
 
I think I found my answer on defraging the indexes.

Maint. task.

Thanks

John Fuhrman
 
Let me explain about index fragmentation, and then provide a couple links for finding and/or fixing fragmentation.

Index data is stored on the hard drive. When SQL Server reads and writes to the drive, it always does it in 8k chunks. The data in an index is sorted, so position matters. When you create an index, the default behavior is to completely fill the 8k page with data (no extra space). Now, imagine you insert a row in to the table. This will cause an insert in to the index also. If there is no room in the index for the data, you will get a "page split". 1/2 of the data is moved to another 8k page and removed from the original. This leaves room for the newly inserted data. Each page split is relatively fast, but can lead to a performance problem.

Suppose your index data requires a megabyte of storage. That's roughly 125 data pages. Now, when you run a query that uses that particular index, SQL will go to the hard drive 125 times to get the index data before it can use the index. Now suppose you have 50% fragmentation. This means that a lot of the data pages are empty, which is wasteful of space (not really a big deal), but also wasteful because it causes SQL Server to go to the disk 250 times instead of just 125 times. This will cause things to be slower.

You can reindex your data, but you need to be really careful about doing this. With then Enterprise version of SQL Server (2005 or higher), you can do an online reindex. An online reindex means that the index data is still available even during the reindex process. Without the enterprise version of SQL, reindexing causes a table lock on the entire table, which means no queries will be able to use the table for the duration of the reindex. If your table(s) are HUGE, reindexing could take minutes (or more).

My suggestion is to use the DEV database to create a reindexing script that works for you, and then schedule a job to reindex the indexes at certain times during the day. You said in another post that you have about 10,000 new rows per day. This is pretty low volume stuff, so you should be able to get away with reindexing once a night when your DB usage is at it's lowest.

I suggest you read this:
At the bottom, there is a link to a script that can help you rebuild/reindex your indexes.



-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks again!

You are right about the volume being fairly low (10K rows per day) on the DB.

This was the 1st of around 60 Access DB that will be upsized into SQL Server 2005 Enterprise.

During this process I have also been tasked with creating a new application that combines the features of all the Access Applications being upsized to create a single interface that can be used by all the various departments and track the movement of the paper files from when they come into our center to when they leave again.

This process is circular in nature.

We are essentially a big file cabinet that people check out folders from; use them for what ever purpose they need, and then the folder is to be returned.

Just as a reference, our center is about as big as the average shopping mall. There are a few areas where people process the incoming requests, examine the file for sensitive material, and either scan the file or mail it out depending on need. The files reside in “stack rooms”. These are 2 story underground storage rooms around 10 to 12 thousand square feet each. There are currently four of these full now.

So the task of getting a handle on all the processes has been a bit daunting. So I cannot thank you and everyone else on Tek-Tips enough for sharing their expertise.


Thanks

John Fuhrman
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top