Design problem - multiple hierarchies in dimension

Vivarto · Mar 31, 2009

A bit of a tricky design issue that I hope someone can throw some light on

We have dimensions with multiple hierarchies. All hierarchies are set up as level 1 - level 15. In the source data the different hierarchies for a single dimension are held in one table. The hierarchies are differentiated by the use of a 'structure code'. For example in the product dimension the columns in the source table are:

product_code,
structure_code,
level_01_code, level_01_desc,
level_02_code, level_02_desc,
...
level_15_code, level_15_desc

Hierarchies change over time and there is a requirement to track this change. The hierarchies are ragged and do not run from level 01 to level 15. For example the immediate parent of a product 'A' may be in level 8 whereas a product 'B' may have its immediate parent in level 6. In a different hierarchy product 'A' may have an immediate parent in level 7. Having said that I don't think that this would influence the design - but I could be wrong.

We also have dimensional attributes for product that also require change tracking over time such as 'Product Type'

product_code,
product_type
...

The changes in the hierarchies occur independently of the changes in the dimensional attributes.

The problem is how should we structure the database to join a fact table that includes the product dimension to the dimension and hierarchies given the many-to-many relationships between the various tables. We need to have SCD2 type tracking on both the hierarchies and the dimension.

We have come up with three different approaches to this and as many arguments as to the direction we should go in. I have my own preferred solution but I'd be interested to hear what the 'collective brain' thinks the best approach should be

Any thoughts?

Cheers, Tom

johnherman · Apr 9, 2009

First, as you have correctly presented, there are two different aspects to consider, both Type 2. The changes in the product hierarchy are indeed separate from attribute changes for the product code. Therefore, there is a Product dimension.

The other "dimension" can be called Product-Structure or Product-Hierarchy or Product-Family or Product-Category or whatever. Although normalization would call for each of the 15 levels to be children, hooked by bridge tables, and for their descriptions to be stored separate from the hierarchy, in my opinion, this is a matter of deciding whether the extra storage and potential for maintaining changes to those descriptions is worth the performance loss by snowflaking them. Also to be considered is the potential for the business deciding to add a 16th level. If in doubt, build the snowflake version because of the flexibility and throw more indices, CPU, and memory at performance loss.

Another consideration in the solution is the database being used. In particular, Oracle SQL contains a "CONNECT" command which makes queries against hierarchies much easier than DB2, SQL Svr, etc. The solution should definitely consider this aspect as well.

I'll continue to think about this and may have additional insight as others comment and mention aspects which have slipped my mind at the moment.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright

Vivarto · Apr 28, 2009

Hi John,

Thanks for the reply, I've been away for a while so sorry about the delay in getting back to you.

You've come up with pretty much with my preferred solution which is to separate the dimensional aspects from the structural. In my mind the dimension table is the bridge table. A surrogate key is held for each row in the dimension which appears in the fact table(s). The join between the dimension and the fact is performed on the business / natural key qualified by the hierarchy identifier and the current indicator flag. This is not ideal but then any solution will have its problems.

The approach that has been take here is not this. What has actually happened is that all of the dimensional and structural information is held in one table and a bridge table is placed between it and the fact table(s). Two keys from the dimension table are then placed in the bridge table, one the surrogate key for the row and the second the business key. Not all surrogate keys appear in the bridge table, just those for the first hierarchy. When any of the rows in the first hierarchy are modified a new record is created in both the dimension/hierarchy table and the bridge table. The bridge table has three keys

bridge_key
product_hier_key
product_key

It is the bridge key that appears in the fact table(s). To join the bridge table to the dimension/hierarchy table and looking at hierarchy 1 there are two ways this can be done:

From bridge_table A
inner join product_dim_hier B
on A.product_hier_key = B.product_hier_key

(clearly will get multiple dimensional attributes with this query which need to be catered for, adding curr_ind = 'Y' to restrict this will also restrict rows from fact)

OR

From bridge_table A
inner join product_dim_hier B
on A.product_key = B.product_key
and B.hierarchy_code = 'hierarchy1'
and B.curr_indicator = 'Y'

when joining to hierarchy2 there is only one way to join

From bridge_table A
inner join product_dim_hier B
on A.product_key = B.product_key
and B.hierarchy_code = 'hierarchy2'
and B.curr_indicator = 'Y'

Personally I think this is a bit of a dog's breakfast with a complicated ETL processing. Luckily we're throwing Cognos FrameworkManager over the top of this so the end users won't see any of this.

Didn't win this one - such is life...

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Design problem - multiple hierarchies in dimension

Vivarto

Programmer

johnherman

MIS

Vivarto

Programmer

Similar threads

Part and Inventory Search

Sponsor