Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Best reason for keeping Staging in ER? 1

Status
Not open for further replies.

sweetleaf

Programmer
Jan 16, 2001
439
CA
I was wondering what the general consensus is on why not dimensionalize the data staging area?
 
I can tnink of one reason. What if your data warehouse is multi-sourced. Let's say you have this new company called Master Burger and you have just bought Wendy's and Hardee's, but each of those companies has their own OLTP, with their own codes, variable names, column sizes, terminology, etc.

In this case, the two companies would be represented using their terminology in the staging area, but unified in the dimensional model. Besides, who knows, maybe Master Burger will buy Burger King next, and then there would be the same set of problems all over again.

If I can think of any more reasons, I'll post again.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Thanks! So, ER design lends itself better to storing multi-sourced data? at least to initially preserve the initial differences anyways. Just so I fully understand the example, how does a 3nf data model, as opposed to a dimensional one, work better with different codes, field lengths etc.. is it b/c the 3nf structures would not be conformed?

Thx again.
 
Yes, conformity would be reserved for the transforms used to bring the data into dimensional form.

For instance, if Burger Chain 1 had a Gender table and used M/F/U and Burger Chain 2 had a Gender table of Male/Female/Unknown, would you want a combined Gender table of all 6 values? No, you want two separate Gender tables which would be then conformed (or transformed) to the Gender dimension of the corporation and its codes, which would be something obviously user-friendly like "Gender Chromosomes" with values of XX, XY, and ZZ.

It gets even funkier with conforming products. Like Cherry Coke. Is it a Cola? Is it a Fruit Soda (like Orange)? Is it a Specialty Product? Each Burger Chain might report it differently whereas the corporation might call it something even more different, like "Other Drinks"

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Great examples. This makes sense to me now. Thanks!
 
..Along similar lines, in your xp are ods' usually kept on the business side or within the DW?
 
It depends. I once built an ODS for Procurement which acted more like a switching hub for transactions between several purchased and custom systems, one for vendor and contract maintenance, one for requisition, purchasing, receipts, inventory, etc. This was strictly an IT function, making sure each system maintained its (and only its) data elements while keeping a reference copy of other data elements (whose Data Steward was another system).

Other times, ODS's can be used to source departmental data marts and other end-user structures. These might be business side.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
What is the main rationale behind having master copies of dimensions in the data staging area, that push changes to the downstream marts? For example, the "Customer staged dimension" feeds 2 data marts each having their own copy of Customer dimension.

Would it not be simpler to not stage your dimensions at all?

Yet supposedly staging dimensions leads to better master data mgt?

 
Still looking at ways describing why it is that ER design is the approach to take when designing the data models for a data staging area..

Going back to the earlier comment about having multiple tables for the same entity, in the staging area, such as Customer. How in this approach do surrogate keys get assigned? What I usually do is assign an SK to any table which would-be a dimension if it were in a mart. So, reference as opposed to transaction data. With a couple staging tables for Customers (reference data) which one gets the SK?

Thanks.

 
You would assign the SK to the "unified" customer table. That is, it would be assigned as the customer entered the Star Schema. You could also arbitrarily assign SK's in the Staging Area, but each "real" customer might have multiple SK's from the multiple sources in the staging area. When the matching has been done and the different versions/views of the customer "unified", then the truly unique SK can be assigned.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
sweetleaf, just curious.

Do you intend to create a copy of the dimension table for each data mart and the original remains in the staging area? If yes, what what is the reason behind it?

Anand
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top