There may be a standard approach to this as I'm sure I'm not the first to come across the issue.
The Problem:
We have a date dimension (d_date) with an integer SK, date_sk, starting at 0 and going up to 3650, or thereabouts, for ten years. It also has columns for the date in various datatypes including one for datetime (date_dte). It also has another column for public holiday that holds either 1 or 0
We have another dimension, d_claim (this is an insurance application) with its own numeric SK claim_sk. The d_claim dimension has a column of datetime datatype for loss_date.
Finally we also have a fact table that has a number of dimension SK columns as part of its (composite) PK including date_sk and claim_sk.
I want to group claims into those where the loss_date was a public holiday and those where it is not and then sum the claim dollars from the fact table.
Possible solutions:
The immediate impulse is to join the fact table to the date dimension and date dimension the to the claim dimension then simply group by public_holiday_flag. The problem here is that the join between the two dimension tables would be on two columns of datetime (loss_time and date_dte), completely ignoring the SK on both tables. This just doesn't feel right though it should work.
The second solution would be to add a column for loss_date to the fact table but to populate it with the appropriate date SK as defined in d_date. The fact_table.loss_date column could then be joined directly to the d_date dimension on date_sk and the public_holiday_flag read off.
The issue I have with this approach is that the fact table would now have a series of dimension SK columns (which together define the PK), a generated SK column for what is an attribute of one of these dimensions (loss_date) and then the measures.
My understanding is that snow-flaking is the normalisation of a dimension table to remove redundancy but hanging one or more copies of the date dimension off other dimensions doesn't fit this definition.
Does anyone have a view on 'best practice' here?
TS
The Problem:
We have a date dimension (d_date) with an integer SK, date_sk, starting at 0 and going up to 3650, or thereabouts, for ten years. It also has columns for the date in various datatypes including one for datetime (date_dte). It also has another column for public holiday that holds either 1 or 0
We have another dimension, d_claim (this is an insurance application) with its own numeric SK claim_sk. The d_claim dimension has a column of datetime datatype for loss_date.
Finally we also have a fact table that has a number of dimension SK columns as part of its (composite) PK including date_sk and claim_sk.
I want to group claims into those where the loss_date was a public holiday and those where it is not and then sum the claim dollars from the fact table.
Possible solutions:
The immediate impulse is to join the fact table to the date dimension and date dimension the to the claim dimension then simply group by public_holiday_flag. The problem here is that the join between the two dimension tables would be on two columns of datetime (loss_time and date_dte), completely ignoring the SK on both tables. This just doesn't feel right though it should work.
The second solution would be to add a column for loss_date to the fact table but to populate it with the appropriate date SK as defined in d_date. The fact_table.loss_date column could then be joined directly to the d_date dimension on date_sk and the public_holiday_flag read off.
The issue I have with this approach is that the fact table would now have a series of dimension SK columns (which together define the PK), a generated SK column for what is an attribute of one of these dimensions (loss_date) and then the measures.
My understanding is that snow-flaking is the normalisation of a dimension table to remove redundancy but hanging one or more copies of the date dimension off other dimensions doesn't fit this definition.
Does anyone have a view on 'best practice' here?
TS