Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Putting dimension attributes in fact tables 1

Status
Not open for further replies.

Goodie666

Programmer
Nov 24, 2008
44
CA
Hello,

This is my first post here so hi everyone :)

I have a question regarding dimensional modeling. I've got myself two of Kimball's books (The DW Toolkit and the SQL 2005 version of it) and I'm trying to analyze some requirements I have to come up with a simple proof-of-concept dimensional model.

I have a dimension table 'equipment' that contains attributes such as name etc. but the equipment has some attributes that change with time. one of them is 'status'. My fact table, 'transaction' contains its attributes. One requirement of the mockup reports I've been seeing is to see the 'status' of an equipment at the time the 'transaction' was made. My question then is how bad it is to put the 'status' in the transaction fact table? It's not an attribute of the fact but I do want to have it at the time the fact is logged. I just don't want to have this information in my dimension since it's bound to change all the time...

Any suggestions are welcome, even if just to reassure me that it's not as bad as I think it is.

Thanks,

Greg
 
Hi Greg,

In dimensional modelling, as Kimball points out, it is usual for dimensions to change (slowly) over time. What you are adressing here however is not so much as a dimensional attribute but some kind of fact.
There are several ways to handle this attribute. One is to build a separate fact table that holds all statusses as columns and holds the date for each status. Date may be first time the status was reached, or the last time.
Another option as you pointed out is to create a status dimension and to add this dimension to a transaction fact.
And of course you can opt to keep 1 or 2 status fileds in the dimension (may by usefull to get the actual status).

Which solution is best suited for your situation depends on the information needs. Some questions can be better answered by the transaction fact table with status dimension, some can be better answered by the accumulating snapshot. And sometimes you have a diverse information need, and you need both fact tables.

So to answer your specific question: It is not bad per se to put a status attribute in a fact table (but you should treat it like a dimension).

Succes
Hans
 
Hi,

Thanks for the information, I think I understand better. I come from a relational background and my head tends to make alarm sounds at denormalization of attributes but from a conceptual point of view, it's really about the equipment's status when the fact is logged so I'm really leaning toward the added dimension for the fact table.

Thanks,

Greg
 
A good point for education here. There are fact tables (with metrics and other measures) and there are "factless" fact tables, a name I detest using, but one of the DW pundits continues to use it. I call them "event" tables. The changes in status are "events" and there should be a fact table record for each event. If the status is so unrelated to the rest of the facts, make 2 fact tables with conforming dimensions.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Hi Johnherman,

It's a nice point you have here.
what would be your term to properly identify 'factless facts'?

Hans
 
'factless fact' covers the type better than 'events' in my opinion. A change in status may be considered an 'event', but what about a student signing up for a number of classes? 'Event' is too limited in the scope of possible relationships as it induces a supposed time-component..

Ties Blom

 
I would argue that a student signing up for classes is an event. And it could occur more than once. Event does not imply time as in Mon, Day, Yr. It could be a status or lifecycle type field.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Perhaps I'm being too academic here, but I translate it as taking place in time. Perhaps the more neutral 'transaction' would apply?
The benefit of the 'factless fact' moniker is that it possitively indicates that no measure exists in the table as opposed to a 'real' fact.
When defining 'fact' as an object to perform calculations with, the factless fact would cover those tables that store non-facts and no dimensional data..

Ties Blom

 
Hmmm, Factless facts and Events.

An event is something that happens at some time. Which is just what we try to capture in transaction facttables. In that sense An event-fact would be equal (or at least similar) to transaction fact.
You can argue whether events CAN have measures. The event would be that I bought a MP3-player. Is the price I paid relevant to this event? I would think so, but I can probably come up with arguments why it would not.

Difficulty appears to be how to name a fact that has no measures. (which is something different from non-facts!)
Personally I feel that factless fact comes very close to drescribing that situation. But that me be due to the fact that I am familiar with the term.
Measureless facts could be another term, but it somehow doesn't sound entirely right.

Non-facts or non-events are those events that did NOT take place. e.g. What article did not sell during a given period in time.
 
Factless Fact is about as meaningless a description as you can get. Like a measureless measure. I'll stick with my verbiage and you can have yours. :)

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
This discussion took on a life of its own!

I would like to try my hand at answering the original question about the changing statuses in a dimension that need to be recorded. It seems that Greg is reluctant to use the status on the "equipment" dimension as a type 2 change. I think this would solve his problem but perhaps it's an issue (perhaps the status changes are very frequent and could cause the size of the dimension to blow out). Then I would suggest that he creates a static dimension that is separate from the "equipment" dimension that holds all possible statuses along with any additional info that may be relevant to the status. This dimension is then linked into the fact and so the status at the time that the transaction occurs is known. The remainder of the "equipment" dimension info is known from its own link to the fact table. The date of the transaction should also be known based on the link to a separate date dimension.

In this way all information is known and the dimension complexity is kept to a minimum.
 
Hello bootsminimus,

thanks for the reply. I think that what you are describing is actually what I am going for. Since I need the equipment's status information at the time the transaction happens, it'll be done that way. Thing is, i also think I might have two fact tables for that process: one to treat it as a pipeline that evolves with time going through several stages and one that is factless but describes that evolution. The reason I would do two facts is that I don't want to 'spoil' one because of what I want to achieve with the other. This way I can get the whole snapshot of the transaction while also having access to all its details.

Greg
 
Hi Greg,

I may have misunderstood the requirements and I'm not certain of the contents of your 'transaction' fact table. From your last entry it does seem that you need to be able to view the dates that the status of the 'equipment' dimension changes and the 'transaction' fact doesn't (easily) show you this. If that is true then I guess you do need two fact tables since you have two distinct requirements.
1. The measures related to the 'transaction' fact.
2. Dates that the status of the 'equipment' changes.

Cheers,
Nigel
 
Hi Nigel,

I realized that only recently I needed to have these two facts separated. It's part of the same pipeline in the operational system but they really are (would be) two different processes in the warehouse.

Thanks,

Greg
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top