Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SK's in staging and marts 1

Status
Not open for further replies.

sweetleaf

Programmer
Jan 16, 2001
439
CA
Hi,

Let's say I have an order tracking subject area in my data staging area, so have these 3nf tables: orders, order_details, payments, customers.

If I want to assign SK's what tables would I assign the SK's to in this scenario? I know the idea is to assign to "dimension-like" tables but in 3nf terms this is not always straightforward.

Reason being is that on the mart side, some fields from the above tables would go into an ORDER dimension table(context) and some would go into an ORDER_TRANSACTION fact table(measures) and I'd like to ensure that I'm assigning SK's properly in the staging area.

Any help is greatly appreciated!

 
For your best solution, I see two fact tables and two (major) dimensions (order, customer).

You have the order item table and the order payment table as facts. It is possible to combine these two, having a single fact table (order transaction), but I recommend you do this as a summary or aggregate table rather than a base fact table.

A few reasons for separating the two. You can have order items which are not yet paid for. You can have order items which were returned and will never be paid for. You could get two (or more) payments on an order or order item, either by design (partial payment) or because the customer accidentally wrote the check for the wrong amount.

-------------------------
The reasonable man adapts himself to the world. The unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw
 
Okay,lets say I treat the ORDERS and CUSTOMERS 3NF staging tables as dimensions so i assign sk's to both (ORDER_WH_SK and CUST_WH_SK respectively).

...When these tables plus the 2 others (order_details/payments) translate into dimensional tables in the mart would I then have an ORDERS and CUSTOMER dimension with fact table looking like below?

ITEM_FCT(ORDER_WH_SK, [fk,composite pk]
CUST_WH_SK, [fk,composite pk]
TIME_DIM_KEY [fk,composite pk]
...LINE_ITEM_AMT)

and

PAYMENT_FCT(ORDER_WH_SK, [fk,composite pk]
CUST_WH_SK, [fk,composite pk]
TIME_DIM_KEY, [fk,composite pk]
...PAYMENT_AMT)

thnx!
 
Yes! That's the way I envision it based on the info you've provided. Then, if you want, you can create a summary or event table called Order Events and combine the two, or create a summary or event table called Customer History and combine the two that way instead or as well. Or create a grand table called CustomerOrderHistory. Depends on the reporting and query requirements, which you may or may not already know.

-------------------------
The reasonable man adapts himself to the world. The unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw
 
Is it still sound design to not have SK's assigned in the 3NF staging area but have SK's instead assigned in the marts?

 
Either way. My current project assigns them in the ODS (staging area in CIF (Inmon et al) designs), but the Kimball approach assigns them directly in the star schema only. I have done both.

-------------------------
The reasonable man adapts himself to the world. The unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw
 
Thanks johnherman - that's very interesting. And relieving to hear as I've recently walked into a place where the staging area does not use SK's. But now users want dimensional marts built to draw data from this staging area and as a best practice I will want to ensure that SK's are used in the upcoming dimensional designs(join performance, insulation from changes to operational systems etc). I'm used to being in environments were SK's are only generated in staging areas and are 'direct moves' into the marts.

Also, to your experience have 3NF staging areas typically contained 3NF common/conformed dimensions(or reference tables)? For example, would both Staging and a Marts have their own copies of a customer, time, geography dimension that they'd all sync up with data in the Staging copy?

Thanks again for the insights.


 
Regarding your question about dimensions and their source (reference or lookup) tables in the staging area. I suppose anything is possible, but I am accustomed to having a basic reference table in the ODS, perhaps keeping versions (for type-2 SCD's) but keeping them with effective dates as well as SK's.

Remember that a dimension could consist of many reference tables. For instance, the geo dimension could consist of a US Postal Service supplied ZIP code and CITY table, rolling up into a STATE table, which has been enhanced internally to include the Sales Territory and Sales Region (Northeast, South, etc) or other facts about the state. That state table might roll up into a COUNTRY table. Rather than have a snowflake designed dimension, which is a poor performer, this 3-4 level geo hierarchy is flattened into a single geo dimension.

So, the answer to your question is yes, they have their own dimension if needed, or can share the same dimenison.

-------------------------
The reasonable man adapts himself to the world. The unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw
 
What are your thoughts on using an actual SK as a PK in a fact table rather than having the fact PK be the composite of the dimension FK's?
 
I *ALWAYS* use SK's whether the natural key to the table is a single column or a concatenation of several columns. I never rely on an external system to provide my uniqueness.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
But now users want dimensional marts built to draw data from this staging area and as a best practice "

As part of a best practice I would not allow users to draw data from the staging area. :)

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top