Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Attributes that don't relate to all of the dimension records.

Status
Not open for further replies.

kiwilamb

Technical User
Nov 17, 2005
4
CA
Hi there

I have a customer dimension table and I have attributes that are only relevant to some of the customers. If I included these attributes in the table it would mean that I have null (or a key that equals null) for many of the records. I may have a few of these types of attributes therefore possibly having many null values for each attribute added.

I wasn't sure if I should include these types of attributes in my main dimension table and have many fields with a null value or create a separate attribute table with a structure of customer_key, attribute_type, attribute_value where there may be many records for each product. This second option would obviously create a snow flake schema.

Thanks
 
To keep your model simple, it is desirable to maintain a single dimension with logical null attribute (by logical null, I mean to populate a value like -1 that notifies a not available/applicable). However, your design decision will also depend on the number of records you anticipate in the dimension (i.e. how many customers are you talking about), the anticipated growth of the dimension over time, and finally if it is a type two dimension that captures history. If the dimension classifies as a large dimension with potentially millions of records that grow at a comparable rate over time, and if a majority of the customers will have logically null attributes, then a snowflake might be justified. You have to weigh the tradeoff of space justification over design simplicity in your decision. Hope this helps.

Amol
Infonitive Inc.
Data Warehousing & Business Intelligence Strategies
 
If you really have many records with a lot of (different) null values, you might want to reconsider using one dimension as it may be possible that you inadvertently combined attributes which really are 2 dimensions.

I disagree with infonitive on the issue of snowflaking when having a really big dimension (lots of records). I would first research the option of moving most of the null-valued attributes to a separate dimension. This is different from snowflaking (imo).
 
Thanks for the replies based on the two answers I need to do some experimenting to see what wil offer us the best solution.

We aren't capturing history with this dimension but it is a slowly changing dimension, we have approximately 900k records in this table and adding 50k or so a year. Currently have 37 fields in this table.

One particular attribute that I'm having trouble with is if a customer has purchased a particular product or not. I would then want to group customer by whether they have bought this or not and see total sales of all products etc for each group so it can't be a simple filter.
I wasn't sure if this type of attribute should exist in the database or in the reporting product. We are about to purchase a product so I'm not sure what the reporting tool will offer.
 
kiwilamb said:
One particular attribute that I'm having trouble with is if a customer has purchased a particular product or not. I would then want to group customer by whether they have bought this or not and see total sales of all products etc for each group so it can't be a simple filter.
I wasn't sure if this type of attribute should exist in the database or in the reporting product. We are about to purchase a product so I'm not sure what the reporting tool will offer.

I'm not sure what you mean here. Take the example of a cable tv company. Many of them also have VOIP service. If you had an attribute called VOIP Customer with possible values of VOIP Customer and Non-VOIP Customer, then I don't see a problem with reporting on those groups.
 
Sorry about that, but this type of product would only ever be bought by a subset of the customers (different business)so the possibilities would be Yes, No, n/a, I would want to differentiate the n/a and the no's. Which I could do by Customer type.
 
I see exactly what you are saying now. I had a similar situation before, but it was in a dimension more related to Product than Customer. In that instance, I had a few possible values it could be for the records to which it applied, and then a lot of dimension records where it did not apply.

Whether or not the attributes applied were also based on a higher-level "type." What I ended up doing in this situation was to carry over information from the type name instead of using a value of "n/a." This was to give us a visual cue that this was from a different type. However it was never really an issue because the type would always be filtered anyways or the queries would include the type.
 
The question you have here is to find something that did NOT happen (if I understand it right). This has been adressed by kimball in the datawarehouse toolkit. I do not have it available right here, but I think it is called factless fact.

I don't exactly know how this was solved, sorry. Anybody else?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top