Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Documenting data warehouse

Status
Not open for further replies.

tannu65

Instructor
Feb 9, 2007
1
US
Hi:

I am a technical instructur, and I have been given the task of documenting the datawarehouse. it is on ER modelling and has over 240 tables and each table has two indexes associated with it.

My question is: as a user of the data warehouse - what aspects will be useful in documenting? Like table descritptions? relationship between the tables etc etc?

How can you document a database schema that would be useful and users would find the information useful? the document should not become a shelfpiece.


Any advice will be useful. Thanks in Advance.

Tanu
 
My first thought would be to divide the 240 tables into logical blocks, so you can tell which information goes where. After that, you can describe in more detail how each block is constructed. Here table/attribute descriptions would be nice. And, not to forget, the reason why the datawarehouse was modelled the way it is (that is: if you know or can get it from the people who designed it)

Maybe you can make something like a metadata dictionary. A list of business terms and the way it can be derived from the datawarehouse.
 
Hans63 has a great first step - segregate the tables.
First, there are reference tables. These tables are used by the other tables for lookups. Examples are city, state, province, country, and code tables such as those used for product, sales, customer, or vendor groupings. These are often shared across subject areas.

Next, split out the tables into subject areas. Some tables will cross subject areas. For instance, the Supplier Subject ARea could consist of Vendor, Material, Receipt, Purchase Order, etc. while the Customer Subject Area contains Customer, Product, Order, Invoice, etc. Then there might also be subject areas for Accounting, Inventory, Sales, Marketing, Operations, Manufacturing, etc.

Organizing this way allows the ability to provide a "mini-model" of the DW to selected subject areas or individuals without overwhelming them with the complexity of the 200+ tables.

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Along with the points above, you can use datamodelling tools to create a datamodel out of the database tables. This will further help in understanding the relation between the data in various tables
 
tannu65,

Barring obtaining the rationale and/or design documentation from the original data warehouse architects (there was at least 1 somewhere, right!) as Hans63 suggested, I would recommend constructing a little data profiling information for each column in each table. This can be automated by querying the database metadata tables or using one or more professional tools designed to do the job such as Informatica, DataStage, etc..., but I'm guessing if you had access to those tools and the people trained in using them, you wouldn't be posting here ;o) SO... Let me make a few assumptions and go from there:
With 240 tables in the data warehouse, this MAY not even be a data warehouse, OR it is one of the very LARGE ERP monstrosities (SAP, LAWSON, etc...) OR it is an Inmon-esque 3NF enterprise data warehouse that somehow made itself manifest. In any case if you can lay your hands on the ETL (Extract-Transform-Load) code that feeds the data warehouse, this may provide some good metadata from which you can extrapolate useful information. In particular, I would seek out what is known as the "data mapping" documentation. This describes which source data is used to populate which data in the data warehouse. Get this published to the business community in as palatable (business-like) nomenclature as possible to confirm you are on the right track. NEXT, rent yourself a good ETL architect who can generate the SQL necessary to produce a "poor man's" data profiling repository for all of your tables. This will contain at a minimum the following information: Table name; Column name; a column description if populated in the database; data type: date, number, string will be concise enough; min/max column values, min/max column lengths for string data; total number of rows in the table, total number of null or missing values for each column. Have the ETL architect also produce a list of any/all primary keys for each table and any unique check constraints, since these will typically represent highly relevant join points in 3NF models, from which relationships will fall out as suggested elsewhere; Get all of this stuff together in a metadata repository and then analyze the results to derive secondary and tertiary facts of importance about the data warehouse that will allow you to answer important questions: 1. Is the EDW being loaded successfully when it should be? 2. Is the data in the EDW of high enough quality, accuracy and completeness to use for making "bet your biz" decisions? 3. Is the EDW data integrity (relationships between parent-child tables) in tact?
4. Given current growth factors (past 3 months of loads?), do you have enough capacity to continue for the next 18 months?

Let me suggest some information that is ultimately useful to a savvy SQL power user:
1. List of tables, with a best guess of the contents of the table. Start with the fact tables first...this will be where the additive numeric values are found.
2. Next, document the tables with foreign keys tied to these fact tables; these will be your next level of pertinent business information / reference.
3. Finally, document the tables with foreign key references found in the first round of reference tables;
4. At this point construct a visual model of the database and present it to your business users; They "should" be able to identify with the relationships if the EDW was properly modeled.

I hope this helps get you started.

Best of luck,
Don McMunn
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top