Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations biv343 on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Daily Snapshots

Status
Not open for further replies.

joy0603

IS-IT--Management
Jul 13, 2004
5
US
I have a new requirement to track history (not just dimension attributes through slowly changing dimensions) but also, fact attributes. Rather than inserting a new record into the fact table as data changes, I was thinking about creating a daily snapshots. This results in huge increase in the disk storage.

Another option: rather than creating a daily snapshot, I was thinking about inserting a new row into the fact table when there is a change. In order to figure out which record is the latest changed record, I would have to do Max() on each record to figure this out.

Any other options?

Also, is there any standards or benchmark that determines whether it's better to use snapshots versus fact tables (with histories)? Storage versus CPU (performance)?



 
Wow. Best I can say is that I don't have enough info to give you any solid recommendations. But here's a few things to think about.

How many facts change each day. If the percentage is large, then the snapshot would seem to make sense. If low, then the extra fact table records seem reasonable.

How often do you "purge" or "archive" the fact tables? Why are the facts changing in the transactional system? Why are the users interested in tracking the changes? Are some attributes changing much more rapidly than others? If so, perhaps a "twin" fact table with a 1 to 1 relation to the slower changing or not changing facts.

Snapshot is certainly a well defined and often used DW structure. Most often for end of day, month, year recording of volatile fact tables like General Ledger or Account Balances.

Hope this helps, more help can be had with more details of your situation.

-------------------------
Sometimes the grass is greener on the other side because there is more manure there !
 
Thanks for the quick reply. We purge the fact table data after 2 years.

Why are the facts changing in the transactional system? We deal with opportunties and the revenues on the opportunities can change. Users want to track the revenues (what did it change to, who changed it, and when was it changed). We deal with multiple revenues.

Business does not want to incur the cost of the additional storage (DASD)going with the daily snapshot option and wants to know if we can track the changes in the fact table(s). I need to basically figure out what do we gain by going with the snapshots versus tracking changes in the fact table(s) and vice versa.

Will the performance of the queries, reports, etc. lead us to go the daily snapshots (trying to determine the last changed record for each record in the fact table)?
 
We also been thinking about going with the daily snapshots for 6 months and then going with monthly snapshots after that.

Does this make sense?
 
Based on your additional input, I would consider a separate table to hold the changing revenue fields, along with all the other attributes associated with the change/opportunity event. I would make this a child/detail table to the other nonchanging facts.

By treating the Opportunity table separately from the rest of the facts, you can gather all the attributes associated with that opportunity; justification for change, increase chance of success, cheaper raw materials, bulk purchase contract, etc.

Just some more things to think about.....

-------------------------
Sometimes the grass is greener on the other side because there is more manure there !
 
That's a good idea... I will think about it.

Another question. What do you call this? It's not slowly or rapidly changing dimensions since it's fact table. Is it called slowly or rapidly changing facts?

 
I read the Kimball's Data Warehouse Lifecycle Toolkit second edition and I think Accumulating Snapshot Real-time Partition will work for us. Our average life span for an opportunity is 90 days. Real-time parition will contain history of opportunities for 90 days and the accumulating snapshot will be inserted/updated nightly.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top