Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Joining Facts (Dimensional Modelling Question)

Status
Not open for further replies.

JD1866DSC

Technical User
Sep 4, 2006
49
GB
Hi, I understand its not good practise to join fact tables using the surrogate key of a comformed dimension.

What is the best practise to relate fact tables, is it storing the natural key on the comformed dimension or should I be storing the natural key on every fact?

or is there another way around this?

Ideally I don't want to relate the dimensions from one schema that has no relation to the fact of another schema
 
As part of your ETL process, can you add a surrogate key to the Fact tables that you can join on?
 
Hi Thanks for the reply, but I don't think you've understood the problem.
I've got surrogate keys on the fact that join to each dimension in a star schema like fashion. But how do I join to a dimension in a different star schema? Fort example I have two facts claim and hire and two dimensions repair which contains repair start date and claim how do I get to the repair date from the claim fact?

I Tried this, but its not correct as the surrogate key in the claim fact may differ from the surrogate key in the repair fact
SELECT
*
FROM
dbo.SNA_ACC_CLAIMTRANSACTION_FACT CF
JOIN
dbo.DIM_CLAIM C ON CF.CLAIM_KEY = C.CLAIM_KEY
JOIN
dbo.SNA_ACC_REPAIR_FACT RF ON RF.CLAIM_KEY = C.CLAIM_KEY
JOIN
dbo.DIM_REPAIR R ON R.REPAIR_KEY = RF.REPAIR_KEY

This would work but requires a operational key(claim_number) on the comformed dimension(claim), is this correct?

SELECT a.*,b.*
FROM
(
SELECT
CF.*
,C.CLAIM_NUMBER
FROM
dbo.SNA_ACC_CLAIMTRANSACTION_FACT CF
JOIN
dbo.DIM_CLAIM C ON CF.CLAIM_KEY = C.CLAIM_KEY
WHERE CF.CURRENT_FACT_IND = ‘Y’
) a
JOIN
(
SELECT
RF.*
,C.CLAIM_NUMBER
FROM
dbo.SNA_ACC_REPAIR_FACT RF
JOIN
dbo.DIM_CLAIM C ON RF.CLAIM_KEY = C.CLAIM_KEY
JOIN
dbo.DIM_REPAIR R ON R.REPAIR_KEY = RF.REPAIR_KEY
WHERE RF.CURRENT_FACT_IND = ‘Y’
) b ON a.CLAIM_NUMBER = b.CLAIM_NUMBER


Any ideas?
 
OK - If I understand correctly, you have a ClaimTransaction Fact table that is FK'd to a Claim Dimension, and you have a Repair Fact table that is FK'd to a Repair Dimension and to the Claim Dimension. You have a Repair Date in the Repair Fact table that you want to include in the ClaimTransaction Facts. You are trying to join the two Fact tables using the Claim_Key (which I assume is the PK of the Claim Dimension). Am I following so far?

The simple solution seems to be to join the two Fact tables on the Claim_Key field, like:

Code:
SELECT
      *
FROM
      dbo.SNA_ACC_CLAIMTRANSACTION_FACT CF
      JOIN dbo.SNA_ACC_REPAIR_FACT RF
          ON RF.CLAIM_KEY = CF.CLAIM_KEY

But you also said "...the surrogate key in the claim fact may differ from the surrogate key in the repair fact". Does this mean that Claim_Key 123 in the Claim Fact table refers to a different claim than Claim_Key 123 in the Repair Fact table? If so, then obviously the above join won't work. In fact, in this case I can't fathom any way of joining the two, even through the Dimension tables.

If the surrogate keys are generated differently, then I think your only answer is to go back to the ETL process and make one of two changes:

1. Add the Repair Date to the Claim Fact table. This would simplify any queries that you need to do down the road, but may add some complexity to your ETL process. You would also need to think about how to deal with NULL Repair Dates.

2. Generate a new surrogate key that relates the two Fact tables and join on the new key.

Hope this helps a bit....


- Steve
 
Hi again thanks for the help.
I didn't think you could join using the surrogate keys as when data changes in a dimension you'd get new facts with mismatching keys
For example I need the all the payments made for claimId 1001. I make 2 payments on Address A and then 1 payment whilst at address B then I need a repair whilst at address B
I can't relate claim 1001 from the Repair Fact to include all the payments as it'll miss the 1 made at address A.
But DIM_CLAIM is used for both Facts, so I could make seperate joins to the DIM_CLAIM table and join those using the natural key. But is this best practise, or is there a more convential way of joining facts?

DIM_CLAIM
Claim_key(Surrogate Key) ClaimId (Natural Key) Address
1 1001 A
2 1001 B

CLAIM TRANSACTION FACT
Claim_Key Paymemt
1 £40.00
1 £35.00
2 £10.00

REPAIR FACT
Claim_Key
2

Thanks

Doug
 
If I understand you correctly, it looks like you are talking about a slowly changing dimension. You should probably take some time to look into that. I think that will help you understand how to accomplish what you are trying to do.
 
Actually keeping the surrogate key or creating a surrogate key for the change dimensions is part of your ETL architecture. If you are using the Type II dimension and data got changed and if you keep the surrogate key but change only the data with a beg-date and end-date then you do not have a problem with your original plan. Are you creating a new surrogate key whenever the data in dimension changes? Are you using the Type II dimension?

 
Hi, Thanks for the responses it is very much appreciated.
Yes its a Type II slowly changing dimension. I understand the creation and management of the surrogate keys is in the ETL but alot of the ETL architecture is handled by a thirdparty product. If I added a begin and end date wouldn't I lose the historical factor of knowing what payments took place from each address? Its an interesting concept but I'm unsure how it would work, have you an example?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top