Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SCD 2 and fact tables.

Status
Not open for further replies.

prgmrcr

Programmer
May 11, 2009
56
0
0
US
Hello Members,
I am in the process of learning datawarehouse concepts. I have read through few books. However, i have one question about SCD Type 2 dimensions. For example, if there is an person X and he is located in CO and he has some data in fact table. These two tables are joined by personid key. Now person x has moved to CA. So a new record has been inserted into the person dimension which will be having a new id. How the fact data of the person x is now linked to person dimension for person x for the newly inserted record ?


Thanks in Advance!
 
The primary key of your Person dimension will be a surrogate key. This will be something with no meaning, like an incrementing integer.

So let's say PersonID #5 has moved from CO to CA. His row for CO may have PersonKey #100. His row for CA may have PersonKey #800. But they both have PersonID #5. Your fact table will have PersonKey as the foreign key, not PersonID.

Does this make sense?
 
To build on RiverGuy's example

When the fact record was inserted, the surrogate key pointed to the first PersonId #5 record (i.e with surrogate key #100).

When Person#5 moved to "CA" a new record was inserted into the Person dimension table with a surrogate key of #800.

The fact record however continues to point to record #100 in the Person dimension table because that record represents the way that person looked like when the fact record was created. This is how the fact record continues to maintain an accurate representation of history. In your example, the person was a salesman for the CO area at the time the sale was made. Therefore his sales get credited for the CO area.

Type 2 Dimension tables can be a pain:

For example, you may not always want your fact record to hang on to all the history of the dimension record. For example, you are likely to want to have the most current mailing address of the person which would be represented by the last generation of the record.

Then there is the problem of corrections being made to fields that trigger the creation of a new record. I have seen fields toggle back and forth.

 
RiverGuy and dkyrtata,

Thanks so much for valuable inputs. The examples you guys have given make perfect sense. How ever the issue mentioned by dkyrtata thrown me off again. So with SCD type2, if the person has a different address after moving to CA, if you want your facts tied to the most recent address, how to manage the facts then ? Do you insert a new record in fact table as well ? If yes, how would the load the data in facts table ? What should be the approach ?

Thanks in Advance!
 
Good Question. I would like to hear what others do to handle this. Because of the problems associated with Type 2 dimensions, I avoid using them - preferring to stick with Type 1 when possible

No, do not insert a new fact record. That will just mess up your aggregations when you inadvertently double-count your sales from the redundant records. Plus you would create a maintenance nightmare when you have to traverse all your fact tables to make adjustments caused by changes (if any) to your dimension records. Keep in mind, your dimension tables will be referenced by any number of fact tables. And your fact tables will increase in number and size as your data warehouse/mart matures.

Here is something you can consider:

Add a surrogate key field, CurrPersonKey, to your dimension table that would point to the last generation of each record. This means that every time you insert a new record, the CurrPersonKey field would be updated for all previous generations of the record. So in the example, set CurrPersonKey=800

Here is the SQL code that would update all generations except the last (Last generation would have an expiry-date of 31-Dec-9999)

Code:
UPDATE Person prev
   SET prev.currPersonKey=(SELECT personKey FROM person curr
                            WHERE curr.personID = prev.personID
                              AND curr.row_expiry_date = TO_DATE('31Dec9999','DDMonYYYY') -- Current generation
                          )
 WHERE prev.PersonID='#5'
   AND prev.row_expiry_date != TO_DATE('31Dec9999','DDMonYYYY') -- Expired (previous) generation
 
If yes, how would the load the data in facts table ? What should be the approach ?

You handle it in your ETL.

Here is how your dimension looks (note that I am not displaying the state columns):
Code:
PersonKey     PersonID     ValidFrom     ValidTo
---------     --------     ---------     -------
100           5            1900-01-01    2011-03-31
800           5            2011-04-01    NULL


When your ETL runs to populate your dimension, if you detect a Type-2 change, you pick your dimension row which has a NULL ValidTo. You then "close out" that row by putting in the date that the row is good until. You then insert a new row with today's date as ValidFrom, and a NULL for ValidTo.

When your ETL runs to populate your fact table, you lookup or join on the business key, which is PersonID in the case. But you also lookup on the date range between ValidFrom and ValidTo. So if your fact row is from 2010, your going to pick up PersonKey # 100 because 2010 is between 1900 and 2011-03-31. If your fact row is from 2011-04-25, you're going to pick up PersonKey #800. Get it?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top