how many rows is too many?

joelwenzel

Programmer
Jun 28, 2002
448
I am creating a database table, and depending on how I design my database, this table could be very long, say 500 million rows. It is for an internet application that will be queried a lot (indexed on date and userID). Will it be slow for users to access data from this table? Should I consider designing the table so that it is not in first normal form? (I can shorten it to 25 million rows by organizing the data in columns.)

I am having a hard time determining the answer to this question. I've searched the internet and can't find anything that says whether something like half a billion rows is too many.
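
To make this concrete, the two layouts I'm weighing look roughly like this (table and column names are just placeholders, not my real schema):

[code]
-- Normalized (1NF) layout: one row per user/date/category, ~500 million rows
CREATE TABLE dbo.Readings_Normalized (
    UserID        int      NOT NULL,
    ReadingDate   datetime NOT NULL,
    CategoryID    tinyint  NOT NULL,   -- 1 through 20
    CategoryValue tinyint  NOT NULL
)

-- Denormalized layout: one row per user/date, ~25 million rows,
-- with a column per category (NULL when a data set has fewer than 20 points)
CREATE TABLE dbo.Readings_Denormalized (
    UserID      int      NOT NULL,
    ReadingDate datetime NOT NULL,
    Category1   tinyint  NULL,
    Category2   tinyint  NULL,
    -- ... Category3 through Category19 ...
    Category20  tinyint  NULL
)
[/code]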
 
Half a billion rows is a lot, but if indexed properly and with a well thought out disk layout, your data could be returned within a few seconds.

However, that said, if you can shrink a table from 500 million to 25 million rows by using proper normalization, then by all means normalize the data.

I've managed and queried tables with well over a billion records in them and it was no issue. The tables weren't well indexed, so queries would take a while, but even that wasn't a problem.
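
For example, on the placeholder tables sketched above, something along these lines (the names are guesses) is where I'd start:

[code]
-- Clustered index on the columns the app queries by (date + user),
-- so date-range reads hit contiguous pages
CREATE CLUSTERED INDEX IX_Readings_Date_User
    ON dbo.Readings_Normalized (ReadingDate, UserID)

-- Extra nonclustered index if rows are also looked up by user alone
CREATE NONCLUSTERED INDEX IX_Readings_User
    ON dbo.Readings_Normalized (UserID)
[/code]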

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
mrdenny,

I think Joel was suggesting that he can [!]de[/!]normalize the table so that there would be 25 million records instead of 500 million.

I'm curious to know your thoughts on that.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Yes, that is right. I was saying I could denormalize them to reduce the table size. Basically, each row in the normalized table is described by 1 of 20 categories. Each data set will usually contain 20 datapoints, one for each category. I can just create a new denormalized table that contains 20 columns, one for each category (rather than just one column that states the category of each point). However, sometimes a data set will only contain 10 datapoints.
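
So collapsing the category rows into columns would look something like this (made-up names again):

[code]
-- Pivot the 20 category rows into one wide row per user/date
-- (CASE-style so it works on SQL 2000 as well as 2005);
-- categories missing from a data set simply come out NULL
SELECT  UserID,
        ReadingDate,
        MAX(CASE WHEN CategoryID = 1  THEN CategoryValue END) AS Category1,
        MAX(CASE WHEN CategoryID = 2  THEN CategoryValue END) AS Category2,
        -- ... CategoryID 3 through 19 ...
        MAX(CASE WHEN CategoryID = 20 THEN CategoryValue END) AS Category20
FROM    dbo.Readings_Normalized
GROUP BY UserID, ReadingDate
[/code]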
 
Let's do some simple math:

Flat table: userID - 4 bytes, 20 category values - 20 bytes (tinyint IIRC). Total: 24 bytes per row.
Normalized table: userID + categoryID + categoryValue - 6 bytes per row.

24 = 6*4, which means the normalized (1NF) table will be smaller than or equal to the flat table only if the average number of filled categories is four (4) or fewer.

Btw, things in reality are rarely so black & white... and storage size is just one of many criteria. How about a few words about what that DB is supposed to do? Within business/contractual/NDA limits, of course.
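
If you want to sanity-check that break-even yourself, a quick back-of-the-envelope snippet (only the byte figures above are assumed):

[code]
-- Flat vs. normalized bytes per user, as a function of filled categories
DECLARE @FilledCategories int
SET @FilledCategories = 20   -- average number of categories actually present

SELECT 24                    AS FlatBytesPerUser,       -- 4 (userID) + 20 x 1 (tinyint)
       6 * @FilledCategories AS NormalizedBytesPerUser  -- (4 + 1 + 1) per filled category
-- the normalized layout wins on size only when @FilledCategories <= 4
[/code]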

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Hmmm, this is interesting. I should be doing stuff like this more often... I'm not thinking about the actual issues enough, but I guess that will come with experience. My concern was actually query time, but from what you are saying, I might as well use the denormalized tables (in this case) anyway, since the normalized ones are going to be longer (more rows) and larger in size. In my case, the query advantages of using a normalized table appear minimal as well (at least that's the way it looks so far).
 
But if you denormalize it across 5+ tables, you then either have to union the results from each, or make 5+ separate queries and merge the results in application code.

I think as long as your keys are organized well, and you're querying by key, you won't have any problems with a single table.
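
For instance, the split-up version turns every read into something like this (hypothetical table names), versus one keyed lookup against the single table:

[code]
-- Split across N tables: every read is a UNION over all of them
SELECT UserID, ReadingDate, CategoryID, CategoryValue FROM dbo.Readings_Part1
UNION ALL
SELECT UserID, ReadingDate, CategoryID, CategoryValue FROM dbo.Readings_Part2
-- ... one branch per table ...

-- Single table, queried by key
SELECT UserID, ReadingDate, CategoryID, CategoryValue
FROM   dbo.Readings_Normalized
WHERE  UserID = 12345
  AND  ReadingDate >= '20050101' AND ReadingDate < '20050201'
[/code]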

Probably time for more math. UserID (4 bytes) + CategoryID (4 bytes) + category value (20 bytes??) = 28 bytes. Each page in SQL Server 2000 is 8192 bytes, with 96 bytes used by the page header, leaving 8096 bytes for user data. With a 32-byte row size (28 + 4 bytes for row offset), this means you can store 253 rows in a page.

500 million rows then take about 2.5 million pages (at an 80% fill ratio), or roughly 19 GB of disk space, easily managed by modern hard drives. You'll probably want RAID 1 or 5 to reduce the chance of data loss when a drive fails.
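
Spelled out (same assumptions as above, numbers approximate):

[code]
-- Rough space estimate for 500 million rows at 32 bytes per row
DECLARE @RowsPerPage int, @Pages bigint
SET @RowsPerPage = 8096 / 32                              -- 253 rows per 8 KB page
SET @Pages = CEILING(500000000.0 / @RowsPerPage / 0.8)    -- ~2.5 million pages at 80% fill

SELECT @Pages                                 AS EstimatedPages,
       @Pages * 8192 / 1024.0 / 1024 / 1024   AS EstimatedGB   -- roughly 19 GB
[/code]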

Do you know what access pattern the queries will use? Will there be a lot of sequential reads, or will the queries be fetching individual rows?

Chip H.


____________________________________________________________________
If you want to get the best response to a question, please read FAQ222-2244 first
 
You will also need to look at how you will be indexing the data. If you denormalize the table, you'll be creating larger indexes than a single index on an integer field. You'll also need additional padding within all the various indexes.

If this were a system that I was designing, I would go with the 3NF table layout.

With properly indexed tables and a proper drive config, even going through 500 million records shouldn't take very long.
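
For example, on the normalized layout sketched earlier the index key stays narrow (placeholder names again):

[code]
-- Narrow key on the 3NF table: int + tinyint per index row
CREATE NONCLUSTERED INDEX IX_Readings_Category
    ON dbo.Readings_Normalized (CategoryID, UserID)

-- On the denormalized table there is no single category column to index,
-- so any index that covers the category values has to carry all 20 columns,
-- which means wider index rows and more pages to read
[/code]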

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 