Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

loading dimensions and facts 2

Status
Not open for further replies.

sweetleaf

Programmer
Jan 16, 2001
439
CA
Hi,

I've been at companies that each have a different way of loading type2 dimensions and wanted some guidance on what is the better approach - from a best design practice and performance point of view.

Approach1:
TBL=CUST_ACCOUNT_DIM --> PK(ACC_SK + DIM_TIME_KEY)
Cut new SK only when a new customer account is opened. So, if John Doe already has an existing account# 123, but has changed his address, NO new SK will be generated. What will happen is that a new record will be created to capture his new address with the same SK (123) but with a new TIME_DIM_KEY.
*** What I was never totally sure about is how does this composite PK link/migrate to the fact table?

TBL=CUST_ACCOUNT_DIM --> PK(ACC_SK)
Cut new SK for a)every change to existing accounts and
b)every new account. So in the above example John Doe would get not only a new record inserted, to capture his address change, but would also get a new SK for that record. Also, if a brand new record comes in for Jane Doe (who has no exisating accounts), a new SK will get cut for her.

ANY help on this and an elaboration of the interplay b/w dimensions and facts for these examples would be GREATLY appreciated!

Thanks.





 
Actually, the best solution is (yes you guessed it!) a cross between both of them !!

The actual PK is a surrogate (or synthetic) key generated by the system when the dimension is inserted or updated. Performance-wise, this gives you a join on a PK/SK which is an integer (the most efficent join)

The effective (or expiration, or both) date of each record is also kept, but as an attribute, not as part of the key. So the natural key of the table would be the source system's Customer Number along with a date so that the proper version of the Type-2 dimension can be chosen.



-------------------------
The reasonable man adapts himself to the world. The unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw
 
The way I have implemented SCD Type 1 and Type 2 is on a selective basis. Not every dimension attribute in my situation is important enough to merit a new record in the case of a Patron something like Phone number. I have process where each column is labeled 1 or 2 in a lookup table and then when the data has been staged I can process the changes depending on the values of the lookup table. I know my process sounds long but add less than 30 seconds for 300K records.

Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
MDX'er is, of course, right. There are several scenarios which do not require the generation of a new record.

Excluding weird events like gender transformation surgery, if a person's sex or date of birth is changed in the system, we can assume it was incorrect in the past, the error has been discovered, and we no longer care about the bad data.

Similarly, in MDX'er's case, if the telephone number changes, we really don't care about what the data used to be, we are not going to call the old phone number. In this case, the data was not in error, but we don't care about the old value(s) anymore.

-------------------------
The reasonable man adapts himself to the world. The unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top