I can tnink of one reason. What if your data warehouse is multi-sourced. Let's say you have this new company called Master Burger and you have just bought Wendy's and Hardee's, but each of those companies has their own OLTP, with their own codes, variable names, column sizes, terminology, etc.
In this case, the two companies would be represented using their terminology in the staging area, but unified in the dimensional model. Besides, who knows, maybe Master Burger will buy Burger King next, and then there would be the same set of problems all over again.
If I can think of any more reasons, I'll post again.
-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
Thanks! So, ER design lends itself better to storing multi-sourced data? at least to initially preserve the initial differences anyways. Just so I fully understand the example, how does a 3nf data model, as opposed to a dimensional one, work better with different codes, field lengths etc.. is it b/c the 3nf structures would not be conformed?
Yes, conformity would be reserved for the transforms used to bring the data into dimensional form.
For instance, if Burger Chain 1 had a Gender table and used M/F/U and Burger Chain 2 had a Gender table of Male/Female/Unknown, would you want a combined Gender table of all 6 values? No, you want two separate Gender tables which would be then conformed (or transformed) to the Gender dimension of the corporation and its codes, which would be something obviously user-friendly like "Gender Chromosomes" with values of XX, XY, and ZZ.
It gets even funkier with conforming products. Like Cherry Coke. Is it a Cola? Is it a Fruit Soda (like Orange)? Is it a Specialty Product? Each Burger Chain might report it differently whereas the corporation might call it something even more different, like "Other Drinks"
-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
It depends. I once built an ODS for Procurement which acted more like a switching hub for transactions between several purchased and custom systems, one for vendor and contract maintenance, one for requisition, purchasing, receipts, inventory, etc. This was strictly an IT function, making sure each system maintained its (and only its) data elements while keeping a reference copy of other data elements (whose Data Steward was another system).
Other times, ODS's can be used to source departmental data marts and other end-user structures. These might be business side.
-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
What is the main rationale behind having master copies of dimensions in the data staging area, that push changes to the downstream marts? For example, the "Customer staged dimension" feeds 2 data marts each having their own copy of Customer dimension.
Would it not be simpler to not stage your dimensions at all?
Yet supposedly staging dimensions leads to better master data mgt?
Still looking at ways describing why it is that ER design is the approach to take when designing the data models for a data staging area..
Going back to the earlier comment about having multiple tables for the same entity, in the staging area, such as Customer. How in this approach do surrogate keys get assigned? What I usually do is assign an SK to any table which would-be a dimension if it were in a mart. So, reference as opposed to transaction data. With a couple staging tables for Customers (reference data) which one gets the SK?
You would assign the SK to the "unified" customer table. That is, it would be assigned as the customer entered the Star Schema. You could also arbitrarily assign SK's in the Staging Area, but each "real" customer might have multiple SK's from the multiple sources in the staging area. When the matching has been done and the different versions/views of the customer "unified", then the truly unique SK can be assigned.
-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
Do you intend to create a copy of the dimension table for each data mart and the original remains in the staging area? If yes, what what is the reason behind it?
This site uses cookies to help personalise content, tailor your experience and to keep you logged in if you register.
By continuing to use this site, you are consenting to our use of cookies.