Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Joining facts to multiple levels in the same dimension 2

Status
Not open for further replies.

Vivarto

Programmer
Nov 27, 2005
43
AU
Hi

I'm sure this is something that people have come across before and I would imagine that there must be a range of approaches to dealing with it. I've come up with a potential solution so I'm interested in getting some opinions on the idea.

I have facts coming in at different grains, in this example actual sales come in at the product level and planned sales come in at the product group level. I need to create a fact table of Actual Sales and a table of Planned Sales (I'll create one or more aggregates later). The problem I have is that I need to have a surrogate key on each table for the product dimension. For Actuals this is simply the SKEY of the table dim_product but we don't have a surrogate key for the Product Group.

I think one way to approach this is to add a surrogate key for Product Group. This however would result in a new column in dim_product to hold the product_grp_skey which would have a many-to-many relationship with the fact - clearly not a good idea. To get around this it occurred to me that if I snowflake dim_product so that I have a table for Product Group (say dim_product_grp) then I would have a single row for each product group and a one-to-many relationship from this new 'dimension' table and the fact.

What this does however is introduce snowflaking into a schema that has not previously used it. I can minimise the impact by renaming the dim_product table to something like dim_product_base and then creating a view over dim_product_base joined to dim_product_grp that has the same structure as the original dim_product thus ensuring that all existing code/procedures still work.

I like this approach because it is clean however there may be a better way of joining fact tables to different levels in the same dimension - any thoughts? Anything I've missed in the approach I'm thinking of taking?

 
One method is to make use of unknown/not applicable attribute values. Your Planned Sales table would tie to the "NO PRODUCT" records in this example. It keeps it a pure star as well.

Code:
DimProduct

ProductSKey   ProductName   ProductGroup
-----------   -----------   ------------
1             NO PRODUCT    Tires
2             Radials       Tires
3             White-Walls   Tires
4             NO PRODUCT    Wheels
5             Chrome Wheels Wheels
6             Silver Wheels Wheels
 
I would create product_group_dim table and then conform it to the product dimension.

Anand
 
My approach would be to use a group SK that is unique to the group level. In fact this is standard practice for me, ensuring that members at each level are distinct.

Example
Code:
Product_SK   Product_Name   ProductGrp_SK   ProductGrp_Name
----------   ------------   -------------   ---------------
1            Vizio 37"      1               LCD TV
2            Vizio 42"      1               LCD TV
3            VIzio 47"      2               Plasma
4            Play Station   3               Game Console
5            XBox 360       3               Game Console
6            WII            3               Game Console

Now when you create your planned sales fact you can key to the ProductGrp_SK. Depending on what technologies you use the and their abilities to present data to the user the fact that you aren't using the leaf level of the dimension as the leaf of your planned sales should not be an issue.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
I agree with MDX'er. Denormalize the snowflake attributes into the base dimension and create artificial keys for NULL and N/A situations in the data.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
MDXer,
I'm interested in your solution. I do have one question though. If planned sales exist for a new product group which has not had any actual sales and does not have a product list broken out yet, how would the ProductGrp_SK be handled in this case?
 
As I see it you could approach this one of three ways.

1) As John mentioned you have your default buckets for Null and N/A, products that for some reason are unable to key up during your ETL. Expand this to a group of "Future Product" and you can key these fact to this group. Now this leaves open the issue of what if there are multiple future products and you want them broken out seperately which this solution will not allow.

2)Add the new Product groups to the dimension with the a child of Unknown.

Code:
Product_SK   Product_Name   ProductGrp_SK   ProductGrp_Name
----------   ------------   -------------   ---------------
-1           N/A            -1              N/A
1            Vizio 37"      1               LCD TV
2            Vizio 42"      1               LCD TV
3            VIzio 47"      2               Plasma
4            Play Station   3               Game Console
5            XBox 360       3               Game Console
6            WII            3               Game Console
7            Not Specified  4               DVD Player
8            Notspecified   5               Stereo

3) Probably the hardest to implement but the easiest to maintain. Inferred Members or Early Arriving Facts. In this you place an Transformation in your ETL that creates a dimensional record to get an SK for the fact record, Typically only the business key is known but yuo can place defaults for other attributes, or populate attributes based on the data provided. This allows the reords to progress through the ETL as known products but requires a process or person to then provide the correct attribute data.

Option 2 and 3 would result in the same presentation with the difference being that in Option 2 the data needs to be provided prior to any fact records for the product groups being generated and option 3 allows you to address the issues before or after the fact.


Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Firstly, thanks for the responses, very encouraging.

I have a problem with MDXer's solution.

You say: 'My approach would be to use a group SK that is unique to the group level. In fact this is standard practice for me, ensuring that members at each level are distinct.'

I don't see how the members at the group level are distinct. In the screen shot I can see three rows for 'Game Console'. If I were to join a fact table at the product group level of grain to dim_product on product_grp_skey then each fact row for 'Game Console' would be joined to three rows in dim_product i.e. a many-to-many join between dim_product the fact table. Aggregating the measures would then triple count would they not?

I'm not sure what's meant by: 'Depending on what technologies you use the and their abilities to present data to the user the fact that you aren't using the leaf level of the dimension as the leaf of your planned sales should not be an issue.' - should I read this as handle the triple counting in the end-user tool?

Am I correct in thinking that there would be double/triple/etc counting? Clearly everyone sees this as a workable solution so what am I missing here?

Confused...

 
The group level is distinct in that each group member has a distinct Group_SK. You don't see game console with three seperate Sks you see it with a single sk. This means that when you place your records in a hierarchy they appear once for each key.

By technologies I am reffering to any applications that you may utilize to build Cubes or present the data with. While all tools preform pretty much the same they all do things a bit different. For example in SSAS you need not expose the leaf level to the user as long as your Attribute relationships are set up correctly. By trying to implement a many to many relationship in a tool might cause double or triplicate values if the tool doesn't really support these relationships. SSAS supports many to many relationships without the use of aggregate tables, while Cognos last I used required aggregate tables to work correctly.

There would be no double and triple counting it would be the same as doing:

Code:
Select
ProductGrp_Name
, SUM(Quantity) As Quantity
, Sum(Price) as Price
From fact_PlannedSales fct
     JOIN dim_Product dim ON fct.ProductGrp_SK = dim.ProductGrp_SK
Where dim.ProductGrp_Name = 'Stereo'

This SQL query wouldn't include the DVD amounts in with the Stereo amounts unless DVD and Stereo both had the same GRP Key.


Hope this clarifies things a bit.

Paul
---------------------------------------
Shoot Me! Shoot Me NOW!!!
- Daffy Duck
 
Paul,

Thanks again for taking the time to respond. I'm still confused (or just a bit slow!)

I understand the structure you're suggesting, each product grp has one-and-only-one skey. In your example you used:

Code:
Product_SK   Product_Name   ProductGrp_SK   ProductGrp_Name
----------   ------------   -------------   ---------------
1            Vizio 37"      1               LCD TV
2            Vizio 42"      1               LCD TV
3            VIzio 47"      2               Plasma
4            Play Station   3               Game Console
5            XBox 360       3               Game Console
6            WII            3               Game Console

Here ProductGrp_SK = 3 for the Game Console product group. If I have a single fact record for 'Game Console' with a quantity value of 10 then joining this one fact record to the three matching records in the dim_product table would result in:

Code:
Select 
fct.ProductGrp_SK, 
dim.ProductGrp_SK, 
dim.ProductGrp_Name, 
fct.sales_amt
From fact_PlannedSales fct
     JOIN dim_Product dim ON fct.ProductGrp_SK = dim.ProductGrp_SK
Where dim.ProductGrp_Name = 'Game Console'


fct.ProductGrp_SK dim.ProductGrp_SK dim.ProductGrp_Name fct.Quantity
----------------- ----------------- ------------------- -------------
3                                 3 Game Console        10   
3                                 3 Game Console        10
3                                 3 Game Console        10

Clearly, aggregating this to the product group level causes triple counting:

Code:
Select 
fct.ProductGrp_SK, 
dim.ProductGrp_SK, 
dim.ProductGrp_Name, 
sum(fct.sales_amt) as qty
From fact_PlannedSales fct
     JOIN dim_Product dim ON fct.ProductGrp_SK = dim.ProductGrp_SK
Where dim.ProductGrp_Name = 'Game Console'
Group by fct.ProductGrp_SK, dim.ProductGrp_SK, dim.ProductGrp_Name


fct.ProductGrp_SK dim.ProductGrp_SK dim.ProductGrp_Name fct.Quantity
----------------- ----------------- ------------------- -------------
3                                 3 Game Console        30

I have the feeling I'm missing something here because there is a many-to-many between the fact and dimension but I don't see how you're catering for it.

My reading of Kimbal et al is that each fact table should only hold records at single grain and that each dimension key should be joined to a dimension table of the same grain. It is for this reason that I lean towards having a dim_product_grp table that only has a single row for each product grp - thus avoiding the possibility of a many-to-many relationship.

What am I missing?

Cheers, Tom





 
Tom,

you are right. The solution Paul (MDXer) suggested is a good one, but you have to be aware of possible double-countings. One way to solve this, is to create a view on the product dimension which would present only the product group attributes. Here you would use a distinct clause. And some DBMS's provide very efficient ways of handling views, by secretly storing it somewhere and updating it if the source table contents are changed.

Or you would have a really separate dimension for the product group level. This would depend on the size of the dimension.

Once you haver the queries to obtain the actual and the planned sale, you can combine them on the product-group-sk.

As to the fact table holding exactly 1 grain: You are using 2 fact tables with a different grain on the product-dimension. So conceptually you would need 2 dimensions. The choice on implementing 1 or 2 dimensions is a typical design question, where you have to have a good and complete picture of the situation.

Hope this helps.
Hans
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top