
Is Archiving a table necessary?


wau25 (Programmer)
Mar 12, 2004
I am creating a table that is going to hold a lot of data in the future. If I add a flag column "Archived" to the table to differentiate between active and archived data, would that be enough to prevent query performance issues, or do I need to actually create a separate table and move the old data into it?
 
Just creating a bit field that marks the row as "inactive" is not archiving. It will not in any way improve query performance. In order to get a performance gain, you have to physically move the rows to a different table.
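For what it's worth, here is a rough sketch of what "physically moving the rows" might look like. The table and column names (dbo.Orders, dbo.OrdersArchive, OrderDate, and so on) are made up purely for illustration, so adjust them to your own schema:

[code]
BEGIN TRANSACTION;

    -- Copy the old rows (here: older than two years) to the archive table.
    INSERT INTO dbo.OrdersArchive (OrderID, CustomerID, OrderDate, TotalAmount)
    SELECT OrderID, CustomerID, OrderDate, TotalAmount
    FROM   dbo.Orders
    WHERE  OrderDate < DATEADD(year, -2, GETDATE());

    -- Then remove them from the active table.
    DELETE FROM dbo.Orders
    WHERE  OrderDate < DATEADD(year, -2, GETDATE());

COMMIT TRANSACTION;
[/code]

In real code you would also want error handling, and possibly to delete in batches so you don't blow up the transaction log, but that is the general idea.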
 
wau25 said: "a lot of data in the future"

What do you consider "a lot of data"?

My concern here is that you may be "prematurely optimizing" your code when it may not be necessary. jbenson001 is right: adding a bit column to your table will do nothing to improve performance. But moving data out of the table may not do much to improve performance either.

The real answer is, "it depends". If this is a "sloppy" table to begin with, then performance is going to be a problem. If the table is properly normalized, kept as narrow as possible, and properly indexed, then you may not have performance problems even with millions of rows in the table.

1. Make sure your table is properly normalized. There are MANY articles out there that talk about [google]database normalization[/google]. If you don't know what this is, make sure you read a couple of these articles before moving forward with this task.

2. Keep your table as narrow as possible. By "narrow", I mean keep the column definitions as small as possible: a narrow table has a relatively small number of columns, and those columns are as small as possible. Try to eliminate or shrink your strings, and move unnecessary (or seldom used) columns to another table. You will get the best performance with int columns, where those ints point to other lookup tables (see the sketch after this list). You see... SQL Server stores data on the drive in 8K pages. By packing as many rows as possible into each 8K page, you will reduce your file i/o and your performance will increase.

3. Indexes. Make sure you do not have any duplicate indexes. Also, make sure that the queries executed most often are covered by an index (again, see the sketch after this list). Indexes are the real key to optimal performance. Properly indexed, a table with billions of rows can have queries that execute in milliseconds.
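To put points 2 and 3 in concrete terms, here is a rough sketch of a narrow table that pushes its string data out to a lookup table and carries a covering index for a frequently run query. Every object name here is hypothetical:

[code]
-- Hypothetical lookup table: the status text lives here, not in the big table.
CREATE TABLE dbo.OrderStatus
(
    StatusID   tinyint     NOT NULL PRIMARY KEY,
    StatusName varchar(30) NOT NULL
);

-- Narrow main table: small fixed-width columns, a tinyint pointing at the
-- lookup table, no wide strings.  More rows fit into each 8K page.
CREATE TABLE dbo.Orders
(
    OrderID     int      IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CustomerID  int      NOT NULL,
    StatusID    tinyint  NOT NULL REFERENCES dbo.OrderStatus (StatusID),
    OrderDate   datetime NOT NULL,
    TotalAmount money    NOT NULL
);

-- "Covering" index for a query run often, e.g. "recent orders for a customer":
-- every column the query touches is in the index, so the table itself
-- never has to be read to satisfy the query.
CREATE NONCLUSTERED INDEX IX_Orders_Customer_Date
    ON dbo.Orders (CustomerID, OrderDate, StatusID, TotalAmount);
[/code]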

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
The table is properly indexed and normalized, has few columns, and will probably have millions of rows, 99% of which will be inactive.

I'd really prefer not to create a separate table if I can avoid it. Given my scenario, is moving data to a separate table still my only option?
 
I don't think we are saying that it is your only option. If the table is indexed, normalized, and narrow, then you will have to run some statistics to see how your queries are performing. If at some point the queries don't run in an acceptable time, then you will have to move rows to a different table.
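If it helps, one simple way to collect those numbers is to switch on SQL Server's I/O and timing statistics around a representative query. The query below is only a placeholder for one of your own:

[code]
-- Report logical reads, CPU time, and elapsed time for the query that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder query: substitute one of your own frequently run queries.
SELECT OrderID, OrderDate, TotalAmount
FROM   dbo.Orders
WHERE  CustomerID = 12345
  AND  OrderDate >= '20040101';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
[/code]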
 
I agree. Done properly, there is no problem with a table that has millions of rows, even if you are only (usually) dealing with 1% of the rows.

Personally, I would probably manufacture some dummy data to load into this table so that it matches your expected growth for the next 5 years or so, and then run your queries to see what the performance is like. This is good advice for everyone. Testing your database on a small handful of rows is NOT acceptable. Load up your tables and then see what the performance is like. If you need to partition the table(s) to get acceptable performance, at least you'll know before the production database gets too large (and slow).
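One common trick for manufacturing that dummy data is to cross-join a system table against itself to generate millions of rows in a single INSERT. This sketch reuses the hypothetical dbo.Orders table from earlier; the row count and value ranges are arbitrary, so tune them to your expected growth:

[code]
-- Fill dbo.Orders with roughly 5 million throwaway rows.
-- Assumes the status rows 1-5 already exist in dbo.OrderStatus.
INSERT INTO dbo.Orders (CustomerID, StatusID, OrderDate, TotalAmount)
SELECT TOP 5000000
       ABS(CHECKSUM(NEWID())) % 100000 + 1,                        -- random customer
       ABS(CHECKSUM(NEWID())) % 5 + 1,                             -- random status
       DATEADD(day, -(ABS(CHECKSUM(NEWID())) % 1825), GETDATE()),  -- date in last ~5 years
       CAST(ABS(CHECKSUM(NEWID())) % 100000 AS money) / 100        -- random amount
FROM   master.dbo.syscolumns a
CROSS JOIN master.dbo.syscolumns b;
[/code]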

If you find yourself in a situation where you think you need to partition the tables (because queries are slow), I would encourage you to post the queries here (in a new thread). We will take a look and probably offer suggestions on how to improve the performance. Oftentimes, adding indexes and writing sargable queries is all that is required to improve performance.
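As a quick illustration of "sargable": the usual culprit is wrapping the indexed column in a function, which stops SQL Server from seeking on the index. Again using the hypothetical orders table:

[code]
-- Non-sargable: the function on OrderDate forces a scan of every row.
SELECT OrderID, TotalAmount
FROM   dbo.Orders
WHERE  YEAR(OrderDate) = 2003;

-- Sargable rewrite: compare the bare column to a range so an index
-- on OrderDate can be used for a seek.
SELECT OrderID, TotalAmount
FROM   dbo.Orders
WHERE  OrderDate >= '20030101'
  AND  OrderDate <  '20040101';
[/code]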

If you do find yourself in a situation where you need to split the data into separate tables, I encourage you to do some research on partitioning.
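If you end up going that route, built-in table partitioning (SQL Server 2005 and later) roughly takes the shape below; the names, boundary dates, and filegroup choice are placeholders:

[code]
-- Partition function: split rows into ranges by year (placeholder boundaries).
CREATE PARTITION FUNCTION pfOrdersByYear (datetime)
AS RANGE RIGHT FOR VALUES ('20020101', '20030101', '20040101');

-- Partition scheme: map every partition to PRIMARY for simplicity;
-- production systems often spread partitions across filegroups.
CREATE PARTITION SCHEME psOrdersByYear
AS PARTITION pfOrdersByYear ALL TO ([PRIMARY]);

-- Partitioned version of the hypothetical orders table, keyed so the
-- partitioning column (OrderDate) is part of the clustered key.
CREATE TABLE dbo.OrdersPartitioned
(
    OrderID     int      NOT NULL,
    CustomerID  int      NOT NULL,
    StatusID    tinyint  NOT NULL,
    OrderDate   datetime NOT NULL,
    TotalAmount money    NOT NULL,
    CONSTRAINT PK_OrdersPartitioned PRIMARY KEY (OrderID, OrderDate)
) ON psOrdersByYear (OrderDate);
[/code]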

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 