Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Storing complex order pipeline data in a DW, is Kimball method applic?

Status
Not open for further replies.

rvstrien

Programmer
Jul 31, 2008
1
0
0
NL
Hello,

On page 128 of Kimball's book: The Data Warehouse Toolkit (2002), he describes the Accumulating Snapshot for tracking the movement of orders through a pipeline.

For those who don't have the book, he basiscally recomends to put one row per order in a fact table and to have many status attributes in the fact table that is referenced to a date dimension every time an order reaches a certain milestone an order chain. Therefore, the fact is updated everytime an order has reached a next milestone in the chain.

The example that Kimball presents is pretty simple, and contains six milestones in the chain. However, my question is what to do in real life situations where you're chain would be more complicated, e.g. where you have around 50 milestones and time becomes important too. Then, you would easily have over 100 attributes that link to the date and time dimensions. Would this be a good design choise, since the fact table gets soo many attributes. Is the Kimball methodology still applicable here?

Many thanks in advance,
Remco van Strien.

 
With such a large and indeterminate chain of events, I might look at "state change" fact table, which is basically a transaction fact table, where each fact is the event of an entity (in your case, an order) changing state (in your case, going from one point in the pipeline to another). Put a pair of datetime fields in there to track the beginning and end of each state; I'd make them redundant, putting real datetime fields directly in the fact table, for ease of querying and understandability, in addition to foriegn keys to your calendar dimension.

What's often interesting in state changes are the pairs of changes (from this state > to that state). How many times, for instance, does an order go into a "backorder" state?There are a few ways to model this, but one I'd suggest making a dimension table of all the pairs-of-states, and then putting a fk to this in the fact.

You'd need to define what you mean by "state", of course, and if you want "state" to mean "all the attributes of the order", rather than just "the part of the pipeline", then you'll have lots of state change transaction records (and you may need more than one dimension to track state change pairs, if you can organize them, for instance, into different spheres of interest).

The approach is somewhat similar to Ralphie's HR model -- see
Cheers,
Jeff Prenevost
 
I agree that a separate Order Event table may be the best solution. So, reading the most recent event date for an order will give its current status and attributes. Set up a concatonated index on the order and date columns and another on the date and order columsn. So, when the interest in tracing the history of one or more orders, the first index (order/date) can be used. When interested in what happened on a particular date, the second index (date/order) would be used.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
This is dependend on the queries you have to run. If you want to compare the time between 2 states, a query on a transaction fact table will be ugly and ask a lot of resources (but it can be done).

Adding time to an accumulating snapshot poses no problem whatsoever. Time is inherent to datetime datatype.
And 100+ date fields (of which many may be null) is no problem either (except for a clear understanding perhaps).
The ETL proces for 100+ fields looks daunting, but is in essence the same trick over and over again. So it is not really complex.

What I have done before is having 2 datetime fields for every state as a product may move back and forth between states. So I had a date with the first time a certain state was reached and another one for the last time the state was reached.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top