Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

ETL and type 2 slowly changing dimension

Status
Not open for further replies.

optimax

Programmer
Jun 24, 2008
4
LU
Hello,

I am in the process of setting up a datawarehouse for a distributed measurement system.

The measurement probes periodically report measurement data (facts) which is automatically ETL'ed into the datawarehouse.

One of the dimension tables is a type 2 SCD with the probe configuration. The probe configuration itself happens via a user interface and is stored in a dedicated database (a system independant from the datawarehouse).

I am unsure on the best approach to follow to keep the type 2 SCD configuration table up to date and I am looking for advice on this topic.

The options I am considering are:

1) Changes in the dedicated configuration system trigger a mechanism to create additional entries in the dim table of the datawarehouse. The keys in the config system and in the dim table of the datawarehouse are aligned.

2) Changes in the configuration system are stored in the probes and reported as part of the measurement data. The ETL does a lookup for the particular configuration in the dim table, and adds a new entry "on-the-fly" whenever a match cannot be found.
Keys for the configurations in the dim table are not aligned with the config system, and potentially even depend on the order in which measurement data from the different probes was reported or ETL'ed.

3) The measurement data ETL process queries the config system and updates (appends records to) the config scd dim table prior to loading the measurement data. Keys in the config system and dim table of the datawarehouse can be aligned.

Any comment, advice or suggestion on this topic will be welcome.

Thank you.
 
Wow. Good question. Regarding option 1, this will insure that you capture all configurations, whether they have facts or not. The other options do not have this capability. Is this capability important? Could it become important in the future? If not, then this option should be discarded as it collects data which is not needed which will waste space and processing time.

Regarding the other options, option 2 seems more "Transaction-Like" and could perhaps significantly delay the load window. Load processes usually like "Bulk" operations, and option 3 seems more "Bulk-Like" than option 2.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top