Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Table acting as dimension and fact - ugly.

Status
Not open for further replies.

d000hg

Programmer
Jun 2, 2007
6
0
0
GB
Imagine you want to load to a DW something like the number of minutes spent by a person on a project. The obvious interpretation is that the measurable fact is the number of minutes, and the person and the project should be the dimensions.
But what if you also need to load the projects into the DW, as they are added often? Is it still correct to have the project as a dimension when new projects are added, or does this require it to be a fact too? A dependency between fact tables seems like a really bad idea, and I see no reason why new dimension data shouldn't be loaded, but is this considered normal?

Thanks.
 
This is not unlike Order and Order Item. Order Item is the fact table and Order is a dimension, even though orders are often added (if this is unclear, substitute Customer, as new Customers are often acquired). Just because records are being added to a dimension does not make it a fact. There could be facts associated with the number of Orders, Customers, or Projects added, but those Entities themselves are not facts.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
That makes sense, great.

Taking it a bit further, what if you have the concept of a task having several phases? I think again a phase would come under the dimension category. So with a task you might have "not started"/"in progress"/"complete" phases, with all kinds of other information. Obviously, tasks and phases are tightly coupled but such coupling again complicates things. I am wondering if it would be better to have a task_phase dimension instead of the two being separate.
 
Just as orders can move from ordered, to awaiting fulfillment, to shipped, to invoiced, to paid, your porjects can move in different statuses. The change in status can be considered a separate fact table. Event tables are fact tables. So you would have a project dimension and a project evolution (status) fact table.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
OK, so I want to keep my DW schemas as simlpe as possible. But say I have my project dimension. A project has a person as owner/leader. I could just merge person fields into project table, but person is also a dimension. Is it considered bad for there to be relationships between dimensions? How hard should I try to have a strict STAR schema, where the only relationships are 1:1 between a fact table and each of it's dimensions?
 
For your owner/leader of a project you have to ask: does it represent a FACT, and if not (so it's a dimension, how often does it change.
For a project owner/leader I probably would add this attribute to the project dimension.

The relationships between fact and dimension is usually n:1 as a dimension normally has several facts associated with it.

If you have a dimension where you would have several dimension records related to the same fact, you would need a construction that is usually called a bridgetable. for instance: in mortgages you can have several customers (usually spouses). So you would need 2 customer keys in the fact table. Or 3 or 4??
No, you create a bridge table which represents a group of customers. So you need only one key to the bridge table in your fact. The bridge table will have several records ,one for each member of a group, with this same key. The other field in the bridgetable refers to the customer dimension.

Hope this helps.
 
That is all very useful (I have bridging tables elsewhere to consider) but I think I phrased my question badly. The manager of a project is definitely dimensional data not fact data. What I meant to ask is if it is normal for there to be relationships between different dimensions. In this case, a 1:1 relationship between a project dimension row and a person dimension row. If I have my terminology correct, this is no longer a star schema but a snowflake? It seems fairly unavoidable but I would like to check...
 
In my experience, it is not usual to have a defined relation between dimensions (that is, no foreign key constraints on the database). But it does occur from time to time that you can link dimensions (like you described).

The solution is dependend on whether or not you want to use the additional attributes of the owner/manager for reporting. If this is not (or rarely) the case, I would in those rare occasion join the 2 dimensions.
If this will be happening on a regular base, you might want to consider moving the additional attributes from person into the project dimension.

In your case: is there REALLY a 1:1 relation between project and person, or is it 1:n (one person can own/manage multiple projects)?

If you define the relation between project and person, it is no loner a true star-schema. It is indeed a (peculiar shaped) snowflake.
Snowflaking is used in cases of aggregation within a dimension. Is your person dimension a aggregate of project?

JohnHerman: what are your experiences in this?
 
Snowflakes can hurt performance. An alternative would be to merge the info into the person record with nulls (or surrogate keys pointing to N/A) for persons who do not do projects. A similar scenario would be with employee and salesperson tables. All salespeople are employees, but not all employees are salespeople, so it's a Subtype, and you can denormalize the subtype into the master record if performance is lost due to the snowflaking.

I think you need to take a good hard look at the granularity of your fact table. You want to be as granular as possible, and after rereading the thread, it looks like the fact should be time spent on a task, with task, project, phase, person as dimensions.



-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top