Filling a fact table...how fast is fast?

TJRTech

Programmer
Apr 8, 2002
First, let me state that for our data mart we are NOT using any type of ETL product or toolkit; we are instead using custom stored procedures. The decision to go this route was one of cost and expediency, but may be "rethunk" in the future.

My question is, how fast is fast when it comes to transforming millions of records from an operational system into a Data Mart star schema?

Our home-grown SPs are transforming and loading operational records at a speed of about 2500 recs/minute (just over 40 records per second).

Is that slow, or is that fast? This is running on a pretty beefed-up SQL 2K box, and the raw operational records are already on the DM server in a flat stage table. The process pretty much looks up previously inserted dimension FKs and inserts facts.

The star schema is one of our more complex ones and has about 10 dimensions, several of which need to be looked up. We have tried using a cursor and a simple WHILE loop.

Should we expect to get substantially faster throughput with an ETL package, and if so, by what magnitude?

Thanks,
TR

 
Are you updating existing records and inserting new ones, or just inserting new records? If the goal is just to insert new records, then you could use bcp. After all records have been staged and keys updated in the stage area, you could then BCP the records to be inserted out from stage, and then immediately BCP those records into your fact table. If you're doing both updates and inserts, you could update the appropriate records and then BCP in the new ones.
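
For illustration only, here's a rough sketch of that round trip (server, database, table, and file names are all made up):

    REM Export the fully keyed staging rows to a flat file
    REM -c = character format, -T = trusted connection, -S = server name
    bcp MyDM.dbo.STG_SALES out C:\etl\stg_sales.dat -c -T -S MYSERVER

    REM Bulk load the same file into the fact table; -b commits every 10,000 rows
    bcp MyDM.dbo.FACT_SALES in C:\etl\stg_sales.dat -c -T -S MYSERVER -b 10000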

I was on a project that used this technique for a DW that inserted 15-30 million rows per week, and the inserts only took a couple of hours.

"Shoot Me! Shoot Me NOW!!!"
- Daffy Duck
 
For this particular star schema we are just inserting new records into the fact table.

You do give me a good idea, though. Right now, we are doing a "record at a time" load from the stage table in our ODS. The ODS and Data Mart are on the same box. In the loop, for each record, we look up each of the dimension table PKs needed as FKs in the fact table. This is about 5 or so lookups per record.
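
To show what I mean, the loop looks roughly like this (table and column names changed, and only two of the lookups shown):

    -- Sketch of our current row-at-a-time load (all names hypothetical)
    DECLARE @CustCode varchar(20), @ProdCode varchar(20), @Amount money
    DECLARE @CustKey int, @ProdKey int

    DECLARE stg_cur CURSOR FAST_FORWARD FOR
        SELECT CustomerCode, ProductCode, SaleAmount FROM STG_SALES

    OPEN stg_cur
    FETCH NEXT FROM stg_cur INTO @CustCode, @ProdCode, @Amount

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- one indexed lookup per dimension, repeated for every single row
        SELECT @CustKey = CustomerKey FROM DIM_CUSTOMER WHERE CustomerCode = @CustCode
        SELECT @ProdKey = ProductKey  FROM DIM_PRODUCT  WHERE ProductCode  = @ProdCode

        INSERT INTO FACT_SALES (CustomerKey, ProductKey, SaleAmount)
        VALUES (@CustKey, @ProdKey, @Amount)

        FETCH NEXT FROM stg_cur INTO @CustCode, @ProdCode, @Amount
    END

    CLOSE stg_cur
    DEALLOCATE stg_cur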

Now, the idea is that we could alter the stage table, adding the FK columns, and populate them en masse via UPDATE statements; one update per dimension whose PKs need to be looked up.
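
Something like this, with one UPDATE per dimension (same made-up names as above):

    -- Add surrogate-key columns to the stage table
    ALTER TABLE STG_SALES ADD CustomerKey int NULL, ProductKey int NULL

    -- One set-based UPDATE per dimension instead of one lookup per row
    UPDATE s
    SET    CustomerKey = d.CustomerKey
    FROM   STG_SALES s
    JOIN   DIM_CUSTOMER d ON d.CustomerCode = s.CustomerCode

    UPDATE s
    SET    ProductKey = d.ProductKey
    FROM   STG_SALES s
    JOIN   DIM_PRODUCT d ON d.ProductCode = s.ProductCode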

Then, we could move to an error table those stage records that have unresolved dimension FKs, and simply do an INSERT INTO...SELECT FROM STG_.. to move the remaining rows from the stage table into the fact table (or use BCP).
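
Roughly like so (STG_SALES_ERR is a hypothetical table with the same structure as the stage table):

    -- Park any rows whose dimension keys did not resolve
    INSERT INTO STG_SALES_ERR
    SELECT * FROM STG_SALES
    WHERE  CustomerKey IS NULL OR ProductKey IS NULL

    DELETE FROM STG_SALES
    WHERE  CustomerKey IS NULL OR ProductKey IS NULL

    -- Load everything that remains in one set-based insert
    INSERT INTO FACT_SALES (CustomerKey, ProductKey, SaleAmount)
    SELECT CustomerKey, ProductKey, SaleAmount
    FROM   STG_SALES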

I think that would speed things up, wouldn't it? Instead of doing several million row-by-row lookups, you do a handful of set-based updates against the whole data set.

Oh, and this is really only an issue as we bring over historical data when first turning on a new system.

Regards,
TR
 
Set operations are preferred over individual record operations.

"Shoot Me! Shoot Me NOW!!!"
- Daffy Duck
 