Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need help with modeling my DW

Status
Not open for further replies.

TimothyMDean

Programmer
Aug 3, 2006
1
US
I'm new to data warehouses and multidimensional data modeling, and I could use some suggestions from people who have done this before.

Some background on the domain: Documents come into our system in large numbers (millions to billions). Documents have some basic metadata associated with them identifying their source, serial number, etc. Automated systems process these documents, creating relationships between them and defining certain attributes of those relationships. Automated systems also flag certain documents for review by human analysts. Human analysts flag documents in certain ways, override automatically-generated attributes of relationships, etc.

The data warehouse we're creating is intended to report various aspects of the automated and manual processes performed on these documents and the relationships between those documents. Some of the reports I need to create include:

[ul]
[li]Show all the documents, including document metadata, that were reviewed within a certain time period[/li]
[li]Show all relationships that have had a particular attribute manually overridden[/li]
[/ul]

Of course, I also have simpler reports that simply sum up the number of relationships created of specific type and the number of documents processed from a particular source location.

My initial (probably naive) analysis suggested that I should have a fact table for documents being processed by the system, a fact table for relationships created between documents, and a fact table for actions that occur on documents and relationships such as flagging or attribute overrides. Each of these fact tables would have a variety of dimensions providing additional info about the facts.

However, I suspect that approach is not viable given that my reports would then need to join across multiple fact tables. All fact tables could potentially grow very large, so I suspect that joins across multiple fact tables would be a bad thing.

I've been trying to think about things differently to see how I might manage things better, but no matter how I break it down I always come down to the same thing: Several tables that seem like fact tables (large row count, representing actions recorded within our business processes), with reports that seem to cross over these multiple fact tables.

Can anyone offer advice on how to best model this kind of situation?

Thanks,

- Tim
 
I think your basic star schema is correct. I would consider building small summary tables (or data marts depending on your DW philosophy) to support most of the common reports which cross fact tables. Ad-hoc activities would still need to join the fact tables, but whenever an ad-hoc report becomes "standard", you would build a summary or mart to support it. The summary could also be a view, materialized view, extract, etc. Close coordination with the user community to validate (before running) and monitor ad-hoc queries will be important for performance reasons. Remember, though, that if you have datamarts (or summary tables) for most of the standard reports, those standard reports will be mostly unaffected by cumbersome ad-hoc queries against the basic star structure.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top