Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

[b]Loading Dimension Table Question[/b]

Status
Not open for further replies.

Nassy

Programmer
Mar 27, 2003
121
GB
Hi everyone,

The Background

I have designed a database schema with a fact table and several dimension tables with the view to create a set of reports and cubes from this schema. I am trying to work out a strategy to populate the dimension tables using DTS packages but am pretty new to datawarehousing concepts in general and do not feel confident that what I am doing is along the right lines

What I am currently doing

I am storing dimension values in a text file. The text file will include the dimension values as well as a key. This text file is then loaded into the dimension table.
If members are added/removed then the text file is modified and the dimension fully reloaded using standard bulk copy task. The key I use for my dimensions is a surrogate key.

Question

Is there anything wrong with this approach (the values in my dimensions will very rarely change) or is there any reason that I should use a proper incremental approach? ie detect a change in the text file and then make this change to the dimension table rather than reload the results every time?

The problem is that I have been reading up on literature on slow changing dimensions and am consequently managing to get myself very confused!

Any advice would be much appreciated.

[ponder]

Nassy
 
Q) Is there anything wrong with this approach?
a) Possibly. If you remove dimension members and the fact records remain then the fact records are orphaned.

I don't know your business scenario or requirements but take this as an example. You have WidgetX that you sell. This year you replace WidgetX with WidgetY would it be usefull to track the sales difference between WidgetX and WidgetY? Now apply this across all your other dimensions, What if a customer changes address or last name. By removing the dimension records you render these facts useless in reporting.


This is where Slowly Changing Dimensions come in. There are 3 types of SCDs each has it's uses and obstecles, but all require Definitions from the Business Line users.

Type 1: Overwrite History. A type 1 is just an update to an existing Dimension record, so the historical data is folded into the New Demension Member Definition. Example would be a product Dimension where the Package Color used to be red but changes to blue. If you have package color as a dimension element then you could just update the blue to red and all Sale that product still rollup to the same dimension member.

Type 2: Track History. Type 2 changes are handled with the insertion of a new Dimension record and the deactivation of the old record. This will require the new dimension member to be built into the dimension and all new records rollup to it while all Historical Records will still rollup to the old member. An example to demonstrate this would be Product Price. If you have your product price as an element of your dimension record and the Price changes from $10.00 to $11.00 this may be of significant intrest as you would want to compare how the price change affected sales. To have Type 2 changes work you really need to use surrogate keys since the productid or whatever the native key is doesn't change, by using surrogate keys you can have identical Native Keys but distinct Surrogate keys. I use an active flag so that when I assign my surrogate keys to the fact records I to the most current dimension member.

Type 3: Dimensions are a combination of the 2 where some elements are Updated while others aren't implementing a type 3 is a challange. I have worked on a number of projects some in the 500GB-1TB range and have yet to use a Type 3.

So a more complete answer is that if you method suits you business needs then go with it however there are better options.

Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Thanks for this.

I guess if I am not too worried about orphaned fact table records and the dimension values will never change then I can just use a TRUNCATE TABLE statement to empty dimension values and then reload the dimension when I need to.

In a couple of my dimensions though I will probably want to track history so will look into Slowly Changing Dimensions.

Cheers

Nassy
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top