Managing Granularity

ClevelandSteve · May 26, 2010

I have a data warehouse that stores sales data sent to us by our customers. Managing the time/date dimension has become a challenge and I'm looking for some advice. Some customers will send daily sales, most send sales by week, and two send sales by month. How do I store these different grains in the data warehouse so that products can be analyzed across customers without losing the lowest grain per customer if it is ever needed? Do I use different fact tables for the different grains? Multiple time/date dimension tables? Thanks.

RiverGuy · May 26, 2010

For the monthly datasets, load them with either the first or last date of the month. Do the same thing for the week. I will hesitate to split up the data into multiple fact tables when you have the same subject matter data.

ClevelandSteve · May 27, 2010

That’s actually how I have it set up now. I’ve always viewed that solution as shoehorning the data into a model that doesn’t quite fit. The problem with it is if the analyst doesn’t filter out the customers who send their data aggregated at the higher grain, they see large spikes on the dates where I shoehorned it. My other thought was to just leave the date field null for the customers who use the week or month grain. At least that way the higher grain sales get grouped together in a null bucket and don’t throw off the daily numbers. I just thought someone may have a better idea.

MDXer · May 27, 2010

I am one who would place the various data into sperate fact tables which are at the garin that is common among that data. As you have seen if you lump them into a first or last day then you will see data spikes. A fact table and the records that are loaded into it are defined by a set grain. in your case the problem is in the grain of the date. If the records don't fit the grain then you adjust the fact table to the common grain and handle your facts at this level, but mixing grains in a single fact table gets tricky and can lead to false reporting in the data.

ClevelandSteve · May 27, 2010

Thanks for your input MDX. I agree with your statement and was considering that type of solution in my first post. My hesitation is that I would need to create views with all the fact tables of some sort so my users can perform cross customer analysis. Once I aggregate the low grains in my views the keys to the date dimension table are broken so I then must deal with that somehow. It just seems like I’m getting into a maintenance nightmare with this approach.

RiverGuy · May 27, 2010

Then it's a business decision: live with the spikes and generally do not run reports at the day grain for your non-daily-dataset customers, or separate the fact tables and struggle to report across all customers simultaneously.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Managing Granularity

ClevelandSteve

Programmer

RiverGuy

Programmer

ClevelandSteve

Programmer

MDXer

Technical User

ClevelandSteve

Programmer

RiverGuy

Programmer

Similar threads

Part and Inventory Search

Sponsor