Storing Null Facts


eshippey

IS-IT--Management
Jul 18, 2003
1
US
We're migrating one of our datasets to a new warehouse. In our data, I have a fact table with 350,000 rows in it. The new warehouse designer has migrated the data, and now we have 12.5 million rows in his new fact table: he made sure that every combination of dimension keys (hospitalid, measureid, reportingperiodid) from our lookup tables got a row. As a result, 12.3 million of the 12.5 million rows have nulls in all fact columns.
We're trying to run benchmarks and percentiles against the whole fact table, and it's taking 3-4 minutes to run, whereas on our old system the same queries take about 20 seconds.
He's pretty adamant about not changing this model.
I don't understand his reasoning; could someone enlighten me as to why loading every possible combination would be efficient?
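
For concreteness, a minimal sketch of the two table shapes being described, using hypothetical names (fact_measure_sparse, fact_measure_dense, fact_value), since the actual schema is not given in the thread:

    -- Sparse design: only dimension combinations that actually have a measured
    -- value get a row (roughly the 350,000 rows in the original warehouse).
    CREATE TABLE fact_measure_sparse (
        hospitalid        INT NOT NULL,
        measureid         INT NOT NULL,
        reportingperiodid INT NOT NULL,
        fact_value        DECIMAL(12,4) NOT NULL,
        PRIMARY KEY (hospitalid, measureid, reportingperiodid)
    );

    -- Dense design: one row for every combination of the three dimension keys,
    -- about 12.5 million rows, 12.3 million of them with NULL facts.
    CREATE TABLE fact_measure_dense (
        hospitalid        INT NOT NULL,
        measureid         INT NOT NULL,
        reportingperiodid INT NOT NULL,
        fact_value        DECIMAL(12,4) NULL,
        PRIMARY KEY (hospitalid, measureid, reportingperiodid)
    );

Any benchmark or percentile query against the dense table has to read past the 12.3 million all-NULL rows, which is consistent with the 20-second vs. 3-4 minute difference described above.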
 
This does not make sense if the database behind the warehouse is a relational database (Oracle, DB2, SQL Server). I am not sure whether that technique adds value or performance if the DW database is a multidimensional database like Essbase or Cache.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
One reason I can think of for loading so many NULL values is a set of queries about so-called non-events, like how many (and which) hospitals did not have a certain measure in a certain period.
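
To make the non-event idea concrete, a sketch of both ways of answering "which hospitals did not report measure X in period Y", reusing the hypothetical names from the sketch above plus a hypothetical hospital_dim dimension table; the measureid and reportingperiodid values are made up:

    -- Against the dense table the non-events are physically present,
    -- so the question is a simple filter.
    SELECT hospitalid
    FROM fact_measure_dense
    WHERE measureid = 42
      AND reportingperiodid = 200306
      AND fact_value IS NULL;

    -- Against the sparse table the same question needs the hospital dimension
    -- and an anti-join to find the combinations that are NOT there.
    SELECT h.hospitalid
    FROM hospital_dim h
    WHERE NOT EXISTS (SELECT 1
                      FROM fact_measure_sparse f
                      WHERE f.hospitalid = h.hospitalid
                        AND f.measureid = 42
                        AND f.reportingperiodid = 200306);

So the dense design buys simpler non-event queries at the cost of carrying millions of NULL rows in every other query.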

Many design decisions about a data warehouse are driven by the business needs you want to address with it. So although I did raise my eyebrows when reading this, it may be a valid design. The designer should be able to (HAS TO) explain why he made this design choice.

Furthermore:
Loading every possible combination is not efficient, at least not at a density of about 2.8% (350,000 populated rows against a 12.5 million row Cartesian product). It may be efficient at a much higher density, say more than 11 million rows with data against the 12.5 million row Cartesian product.
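
If you have access to the new warehouse, a quick sanity check along these lines (again using the hypothetical dense table name from above) would confirm that density:

    -- COUNT(fact_value) skips NULLs, so this gives populated vs. total rows.
    SELECT COUNT(fact_value)                    AS populated_rows,
           COUNT(*)                             AS total_rows,
           100.0 * COUNT(fact_value) / COUNT(*) AS density_pct
    FROM fact_measure_dense;
    -- Expected here: 350,000 / 12,500,000 = 2.8 (percent).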


If all reporting periods for the upcoming years are already present, then loading future facts will require updating existing rows in the fact table rather than simply inserting new ones, which is something you want to avoid.
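
A small sketch of that difference, again with the hypothetical names used above (the key and fact values are made up):

    -- Sparse design: facts for a new period arrive as plain inserts.
    INSERT INTO fact_measure_sparse
        (hospitalid, measureid, reportingperiodid, fact_value)
    VALUES (1, 42, 200401, 0.87);

    -- Dense design with pre-loaded NULL rows: the same fact has to be applied
    -- as an update to a row that already exists.
    UPDATE fact_measure_dense
    SET fact_value = 0.87
    WHERE hospitalid = 1
      AND measureid = 42
      AND reportingperiodid = 200401;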
 