Hi
I'm sure this is something that people have come across before and I would imagine that there must be a range of approaches to dealing with it. I've come up with a potential solution so I'm interested in getting some opinions on the idea.
I have facts coming in at different grains, in this example actual sales come in at the product level and planned sales come in at the product group level. I need to create a fact table of Actual Sales and a table of Planned Sales (I'll create one or more aggregates later). The problem I have is that I need to have a surrogate key on each table for the product dimension. For Actuals this is simply the SKEY of the table dim_product but we don't have a surrogate key for the Product Group.
I think one way to approach this is to add a surrogate key for Product Group. This however would result in a new column in dim_product to hold the product_grp_skey which would have a many-to-many relationship with the fact - clearly not a good idea. To get around this it occurred to me that if I snowflake dim_product so that I have a table for Product Group (say dim_product_grp) then I would have a single row for each product group and a one-to-many relationship from this new 'dimension' table and the fact.
What this does however is introduce snowflaking into a schema that has not previously used it. I can minimise the impact by renaming the dim_product table to something like dim_product_base and then creating a view over dim_product_base joined to dim_product_grp that has the same structure as the original dim_product thus ensuring that all existing code/procedures still work.
I like this approach because it is clean however there may be a better way of joining fact tables to different levels in the same dimension - any thoughts? Anything I've missed in the approach I'm thinking of taking?
I'm sure this is something that people have come across before and I would imagine that there must be a range of approaches to dealing with it. I've come up with a potential solution so I'm interested in getting some opinions on the idea.
I have facts coming in at different grains, in this example actual sales come in at the product level and planned sales come in at the product group level. I need to create a fact table of Actual Sales and a table of Planned Sales (I'll create one or more aggregates later). The problem I have is that I need to have a surrogate key on each table for the product dimension. For Actuals this is simply the SKEY of the table dim_product but we don't have a surrogate key for the Product Group.
I think one way to approach this is to add a surrogate key for Product Group. This however would result in a new column in dim_product to hold the product_grp_skey which would have a many-to-many relationship with the fact - clearly not a good idea. To get around this it occurred to me that if I snowflake dim_product so that I have a table for Product Group (say dim_product_grp) then I would have a single row for each product group and a one-to-many relationship from this new 'dimension' table and the fact.
What this does however is introduce snowflaking into a schema that has not previously used it. I can minimise the impact by renaming the dim_product table to something like dim_product_base and then creating a view over dim_product_base joined to dim_product_grp that has the same structure as the original dim_product thus ensuring that all existing code/procedures still work.
I like this approach because it is clean however there may be a better way of joining fact tables to different levels in the same dimension - any thoughts? Anything I've missed in the approach I'm thinking of taking?