Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Modelling date attributes in fact tables? 1

Status
Not open for further replies.

Vivarto

Programmer
Nov 27, 2005
43
AU
There may be a standard approach to this as I'm sure I'm not the first to come across the issue.

The Problem:

We have a date dimension (d_date) with an integer SK, date_sk, starting at 0 and going up to 3650, or thereabouts, for ten years. It also has columns for the date in various datatypes including one for datetime (date_dte). It also has another column for public holiday that holds either 1 or 0

We have another dimension, d_claim (this is an insurance application) with its own numeric SK claim_sk. The d_claim dimension has a column of datetime datatype for loss_date.

Finally we also have a fact table that has a number of dimension SK columns as part of its (composite) PK including date_sk and claim_sk.

I want to group claims into those where the loss_date was a public holiday and those where it is not and then sum the claim dollars from the fact table.

Possible solutions:

The immediate impulse is to join the fact table to the date dimension and date dimension the to the claim dimension then simply group by public_holiday_flag. The problem here is that the join between the two dimension tables would be on two columns of datetime (loss_time and date_dte), completely ignoring the SK on both tables. This just doesn't feel right though it should work.

The second solution would be to add a column for loss_date to the fact table but to populate it with the appropriate date SK as defined in d_date. The fact_table.loss_date column could then be joined directly to the d_date dimension on date_sk and the public_holiday_flag read off.

The issue I have with this approach is that the fact table would now have a series of dimension SK columns (which together define the PK), a generated SK column for what is an attribute of one of these dimensions (loss_date) and then the measures.

My understanding is that snow-flaking is the normalisation of a dimension table to remove redundancy but hanging one or more copies of the date dimension off other dimensions doesn't fit this definition.

Does anyone have a view on 'best practice' here?

TS









 
It is permissible within theory to add the Loss Date to the fact table. This is what is sometimes called a "Factless Fact" or, as I prefer to call them, Event Fact. This happens often in tracking of Claims, Orders, etc. In practice, this is often done also.

The fact table joins to dimension tables via SK and joining to the Date Dimension for Loss Date is the right design. In fact, you might have several SK's to the Date Dimension (Reported Date, Paid Date, Appeal DAte, etc).

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Thanks for the input. As I'm sure you appreciate this is an example of a class of problem. I do indeed have multiple dates that could be pushed on to the fact table and the issue now is whether to automatically put all dates form all dimension on to the fact or just those where there is an immediate reporting requirement.

I suspect this will have to be a judgement call at design time. The other dimension that comes to mind for this sort of treatment is person/party where related person classification information is often required e.g. Fred Nerk logged the call but which department does he work in? Fred is named in the CALL dimension but department is in PERSON. Again, this can be resolved by pushing the person_sk on to the fact.

This is going to add an interesting twist to the design process.

Thanks again, Tom
 
For this case I would suggest a second fact table. One that mr Kimball names: accumulating snapshot. It contains a set of dates (references to the date dimension), like loss_date, reported_date, paid_date, appeal_date, etc. You can use a drill across between the accumulating snapshot (with all the dates) and the claim fact table.
 
Hans63 makes a valid point. However, I suggest that if a significant amount of aggregation is going to occur across the Loss Date dimension, that the dimension belongs in the accumulating table. Other dates which may not have as many aggregations might be kept in a separate ClaimEventFact table.

I suspect Vivarto and I are currently working in the same subject area - P&C insurance (auto, home, etc). Loss Date is a very significant dimension in that subject area since it is directly associated with financials. Other dates are probably not as important to be in that table. For us, reported date is not used to aggregate measures, but instead to generate more measures such as times to appraise and close claims.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top