Incremental load 3

Status
Not open for further replies.

sdmy

Programmer
Dec 3, 2004
12
US
Hi,

We have a data mart in place that will be used by a few downstream systems. We load the data mart once a month using the DataStage ETL tool. Every month we truncate the existing data (after taking a backup) and load the fresh data. We have decided to re-model the existing data mart so that we can do an incremental load and capture history data.
Please let me know how to approach this. What analysis is required? How do we convert the existing full load into an incremental load?

Thanks
 
First, you need to determine which changes to capture. You may not be interested in some changes.
Next, you need to determine which changes should be updates and which should be inserts (for instance, with slowly changing dimensions).
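As a rough illustration of that analysis, here is a minimal Python sketch of a per-attribute change policy. The column names and the policy table are entirely hypothetical; the point is only that every dimension attribute gets an explicit decision: ignore the change, overwrite in place (Type 1), or version the row (Type 2).

```python
# Per-attribute change policy. Columns and policies here are purely
# illustrative; substitute your own dimensional attributes.
SCD_POLICY = {
    "phone":         "ignore",   # changes do not matter downstream
    "customer_name": "type1",    # correct in place, no history kept
    "region":        "type2",    # keep history with a new dimension row
}

def classify_change(old_row, new_row):
    """Return the strongest action required by the attributes that differ."""
    actions = {SCD_POLICY.get(col, "ignore")
               for col in new_row
               if old_row.get(col) != new_row[col]}
    if "type2" in actions:
        return "insert_new_version"  # expire the old row, insert a new one
    if "type1" in actions:
        return "update_in_place"     # overwrite the existing row
    return "no_action"
```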
Next, you need to design a method to identify changes. One good option is to take a sequential unload of each table before the incremental update, and to calculate a checksum value for each row (this can be done well in advance of the update process). Both the incoming file and the unloaded table should be sorted on some keying sequence.

Records in the incoming file which are not in the unloaded table are INSERTs. Records in the unloaded table which are not in the incoming file have been deleted; rather than physically deleting those rows, set a flag indicating that they are deleted or deactivated. Finally, incoming records which match the unloaded table need their checksums compared. If the checksums are the same, no changes have occurred; otherwise, you must parse the incoming record and compare it to the table (or the unloaded records) to determine what has changed, or simply UPDATE by overwriting, according to your earlier analysis of which changes to track.
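To make the sorted-merge/checksum idea concrete, here is a minimal Python sketch. It assumes both inputs are streams of dict rows already sorted on a shared key column; the key and column names are illustrative, and a real DataStage job would implement the same logic with its own stages.

```python
import hashlib

def row_checksum(row, key="id"):
    """Checksum over the non-key columns, in a stable column order.
    Can be computed and stored well ahead of the update run."""
    payload = "|".join(f"{col}={row[col]}" for col in sorted(row) if col != key)
    return hashlib.md5(payload.encode()).hexdigest()

def delta(unloaded, incoming, key="id"):
    """Merge two row streams, each already sorted on `key`, and classify
    every row: INSERT (new key), DEACTIVATE (key vanished; flag it rather
    than physically deleting), or UPDATE (key matches, checksum differs)."""
    olds, news = iter(unloaded), iter(incoming)
    old, new = next(olds, None), next(news, None)
    while old is not None or new is not None:
        if old is None or (new is not None and new[key] < old[key]):
            yield ("INSERT", new)            # in incoming only
            new = next(news, None)
        elif new is None or old[key] < new[key]:
            yield ("DEACTIVATE", old)        # in unload only: logical delete
            old = next(olds, None)
        else:                                # same key: compare checksums
            if row_checksum(new, key) != row_checksum(old, key):
                yield ("UPDATE", new)        # something changed
            old, new = next(olds, None), next(news, None)
```

Each yielded action would then drive the corresponding INSERT, flag-setting UPDATE, or overwrite in the target table.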

I recommend you buy, beg, borrow, or steal a copy of Kimball's "The Data Warehouse ETL Toolkit" which describes these processes and other things to consider in greater detail.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was.
- Steven Wright
 
Moving to an incremental load strategy will require analysis of the dimensional attributes to decide if/how to track changes. Some attribute changes matter and others may not. You need to look at the timestamping of the rows you capture to ensure there is a continuous representation of each unique natural key for the dimension across time. Identifying current records in the dimension is something to consider: are you going to supply a current image view of the facts via the dimension, or only the image as of when the fact occurred?

There are many products on the market to help with "net changing", "delta processing", or "change data capture", whatever you call the process of identifying dimensional rows that have been added/changed/deleted across time. Deletes can be implemented logically or through a final end-dating of the associated row. Syncsort can be used to do net changes without checksums, although checksums are a viable approach.
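As a sketch of the end-dating idea (not any particular product's implementation), here is how a Type 2 dimension row might be closed out and versioned in Python. The column names eff_date, end_date, and is_current are assumptions for illustration, not a standard.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # conventional "open" end date for current rows

def apply_type2_change(history, new_row, load_date):
    """End-date the current version for one natural key and append the new
    image. A logical delete is the same end-dating step with no new row
    appended. `history` is the list of version dicts for one natural key."""
    for version in history:
        if version["is_current"]:
            version["end_date"] = load_date   # close out the old image
            version["is_current"] = False
    history.append({**new_row,
                    "eff_date": load_date,
                    "end_date": HIGH_DATE,    # open until the next change
                    "is_current": True})

def version_as_of(history, as_of):
    """Point-in-time lookup: the image in effect when a fact occurred.
    The is_current flag answers the 'current image' question instead."""
    for version in history:
        if version["eff_date"] <= as_of < version["end_date"]:
            return version
    return None
```

The effective/end date pair is what lets you answer both questions above: join on the fact date for the historical image, or filter on is_current for the current view.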
Data latency needs to be well thought out in the incremental approach, as does the recovery approach if the data needs to be backed out. If you have never built one, Kimball's book should at least get you moving, but as always it is simplified.

Hope that helps
 