records in the billions


sweetleaf (Programmer)
Jan 16, 2001
Hi,

I have to load between 1 and 2 billion records each month from a single data file into one (or potentially more) fact/dimension tables in my UDB data warehouse, and I was wondering, for capacity-planning purposes, whether there are any best practices or guidelines I should heed before I begin designing my tables.

E.g. can DB2 tables even store tens of billions of records and still perform OK? With this much data, can I stick with traditional modeling principles (e.g. 3NF or dimensional), or should other techniques be used, e.g. one table per month partition (so 12 tables for 1 year of data, 36 tables for 3 years, etc.)?
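
To make that last option concrete, here is a minimal sketch of a monthly range-partitioned fact table, assuming a DB2 LUW version that supports table (range) partitioning. The table, columns, and date ranges below are made-up placeholders, not anything from this thread:

    -- Hypothetical fact table partitioned by month, so each monthly load
    -- (and any later roll-out or archiving) touches a single data partition.
    CREATE TABLE dw.sales_fact (
        sale_date   DATE          NOT NULL,
        store_key   INTEGER       NOT NULL,
        product_key INTEGER       NOT NULL,
        qty         INTEGER       NOT NULL,
        amount      DECIMAL(15,2) NOT NULL
    )
    PARTITION BY RANGE (sale_date) (
        PARTITION p2024_01 STARTING ('2024-01-01') ENDING ('2024-01-31'),
        PARTITION p2024_02 STARTING ('2024-02-01') ENDING ('2024-02-29'),
        PARTITION p2024_03 STARTING ('2024-03-01') ENDING ('2024-03-31')
        -- ...one partition per month; later months can be added with
        -- ALTER TABLE dw.sales_fact ADD PARTITION ...
    );

One argument for a single partitioned table over a hand-rolled "one table per month" scheme is that the optimizer can skip partitions that a query's date range doesn't touch, and old months can be detached rather than deleted row by row.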

Any help would be greatly appreciated!




 
Wow, that is a LOT of records! Usually when the tables in our database get up toward a billion rows, we try to archive rows or move them to alternate tables (a partition roll-out sketch follows below); otherwise we see a fairly serious degradation in performance. Our database platform is not small, but we do have quite a few users, so that can be a factor.

Now if you have a very powerful platform with mega processing power, copious amounts of memory and disk space for temp files and efficient indexing, you might be able to pull off what you're talking about. It also helps if traffic in the database isn't that heavy.

But again, that is a big load to handle. At least, we've found it to be so.
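
As a side note on that archiving point: if the warehouse table is range-partitioned as sketched earlier in the thread, rolling old rows out to an alternate table can be done per partition. A hedged example, assuming DB2 LUW table partitioning, with made-up table and partition names:

    -- Roll an old month out of the fact table into its own table,
    -- which can then be archived, exported, or dropped on its own.
    ALTER TABLE dw.sales_fact
        DETACH PARTITION p2023_01 INTO dw.sales_fact_2023_01;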
 
Sweetleaf,

On the last OS/390 project I worked on, we had to load and unload 2 billion rows each week. We partitioned the table into 64 partitions and loaded them in parallel using BMC LOAD, with some fancy options specified by the DBA. Believe me when I tell you the load jobs take a matter of minutes to load all 2 billion rows.
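
UDB doesn't have BMC LOAD, but the same idea translates to the native LOAD utility: load each slice with logging minimized, ideally into separate tables (or staging tables later attached as partitions) so the loads can run in parallel. A rough, unverified sketch; the file name, delimiter, message file, and table name are all placeholders worth checking against your DB2 version:

    -- Load one month's delimited extract with the DB2 LOAD utility.
    -- NONRECOVERABLE skips logging for speed (take a backup afterwards).
    LOAD FROM /staging/sales_2024_01.del OF DEL
        MODIFIED BY COLDEL|
        SAVECOUNT 1000000
        MESSAGES /staging/load_2024_01.msg
        INSERT INTO dw.sales_fact
        NONRECOVERABLE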

I'm ashamed to say I broke the 3NF rule for a while, but in the end I went back to it: it simplified processing and meant the loads performed well without carrying superfluous columns on each row.

On the query side, I'd suggest using db2 explain and db2advis; with those I believe you'll get acceptable results.
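
For what it's worth, a minimal sketch of that tuning loop, assuming the explain tables already exist; the database name, schema, and sample query are placeholders:

    -- Capture the access plan for one candidate warehouse query.
    EXPLAIN PLAN FOR
    SELECT store_key, SUM(amount)
    FROM dw.sales_fact
    WHERE sale_date BETWEEN '2024-01-01' AND '2024-01-31'
    GROUP BY store_key;

    -- Then, from the command line (assumed invocations, worth double-checking):
    --   db2exfmt -d MYDW -1 -o plan.txt          (format the captured plan)
    --   db2advis -d MYDW -i workload.sql -t 10   (index advice for a workload file)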

Cheers
Greg
 
Thanks Greg,

Forgive the question, but was the table sliced into 64 logical or physical partitions? I'm not sure I understand that part.

These were then loaded in parallel by BMC LOAD, presumably with the typical DBA loading options: no logging, dropped indexes, table defrag, and 'append on'? And the loads completed in minutes rather than hours - cool.

Thanks again!
 