Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Dimensional Relationships

Status
Not open for further replies.

jed1971

MIS
Oct 1, 2010
2
GB
WHat do people recommend as best practise for modelling relationships between dimensions (independent of facts) - thinking of things like customer and geography eg customer x is in Austin TX and that is recorded in the customer dimension but Austin also exists in the geography dimension with other info like state, region etc. You would want to make sure there was integrity between these objects and be able to do useful reporting like say how many customers do I have in Texas in cities above 2m population etc without running the risk that someone had mispelt the attribute value.
You don't want to embed more than the minimal geography info in teh customer record because geography (perhaps not the best example as it doesn't change [much] over time) also would apply to supplier etc so would you put the natural key of the geography dim record (presumably not the SK) into the customer dimension?

Jed
 
Ummm. Numeric keys join faster than character keys. That's one of the reasons we use SK's. I doubt the natural key of the Geography dimension is numeric. But it might be....for instance, zip code.

Also, the use of SK is so that the dimension can slowly change, but previously run queries can still be duplicated. In your example, some other Texas city might surpass 2MM in a future year, but you don't want them to show up in a report run (or re-run) for 2008. And if you do, then that's a different kind of query. It's a "what if" type of query...what if Arlington had 2MM people in 2008? The use of SK's allows historical queries to be duplicated in future times. If a "What if" analysis is desired, then the natural key is used to join to the "What if" table. In this case, it's the same table as the SK, but using the most recent (active) record.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Thanks but I don't think that quite addresses the question and am also not really in agreement about the assertion that SKs are there to support SCD - I would say they are more there to reinforce best practice of not embedding contextual info in the lowest grain logic of the model - ie customer natural keys could come from 1 or n different systems and there can be key conflicts etc. Of course with modern backends like NZ the fact that you have a tidy little int as the SK makes for optimum performance too.

I also don't feel comfortable with teh idea of emedding the SK or one dimension as an affective attribute of another - that seems even worse than creating a factless fact table to model the relationship. I guess what I am wondering is whether it is best to use an outrigger table for what is often 1:n (rather than many-to-many) or if embedding the natural keys is the most practical solution or, if Uncle Ralph has a better suggestion ;)

J
 
Here's a Microsoft link, it applies to SSAS:

It's depicting two scenarios: one snowflake scenario as you have proposed, and one scenario wherein you store the geography key in your fact tables -- so that you are able to measure non-customer related data by geography in addition to your customer-related measures.

In Microsoft's sample AdventureWorksDW2008 database, the surrogate key for geography is stored in the customer dimension, which is also what I would suggest. I'm with John on this -- I really wouldn't want to use something like a Canadian postal code (alpha-numeric) as a primary key for a table.

If this were my project, I would probably opt to store the geography key in the fact tables -- that way you will be able to easily analyze your purchases from suppliers and sales to customers at the same time.
 
Here's some more from a recent column by Susan Harkins in "Tech Republic":

"Relational database theory relies on keys, primary and foreign. Natural keys are based on data, which of course has meaning within the context of the database’s purpose. Natural keys are obsolete now that we have systems that can generate sequential values, known as surrogates. They have little purpose beyond identifying entities. (They are usually an auto-incrementing data type).

The superiority of natural versus surrogate keys is a hotly debated topic. Just bring it up in your favorite development list or forum, sit back, and watch the show. Here’s the nitty-gritty though:

Natural keys can be unwieldy and awkward to maintain. It might take several columns to create a unique key for each record. It’s doable, but do you really want to accommodate that kind of structure if you don’t have to?
Primary keys are supposed to be stable; nothing about data is stable. It changes all the time. In contrast, there’s no reason to ever change a surrogate key. You might delete one, but if you have to change a surrogate key, something’s wrong with your design.
The biggest argument for natural keys is one of association. Proponents insist that you need to be able to associate the key to the actual record. Why? Keys are used by the RDBMS, not users. The other most commonly heard argument is that surrogate keys allow duplicate records. My response is to apply indexes appropriately to avoid duplicate records.

I recommend surrogate keys — always, which is an invitation to hate mail, but it’s my recommendation just the same. I can think of no circumstance where a natural key would be preferable to a surrogate."



-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
My response is to apply indexes appropriately to avoid duplicate records.

Sorry,but this really made my day...

Ties Blom

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top