Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

DW QA testing

Status
Not open for further replies.

MsHart2u

Programmer
Jul 16, 2003
30
US
Hi,

We hve been asked to research and advise on the best QA practices for Data Warehouses, with the main focus on the number of years of data commonly used when testing. We are using MS SQL Server.

If anyone can help or advise on a formula, etc to help with ourdecisions or some websites offering suggestions, then we would appreciate any insight.

Thank you in advance.


[blue]Kathy,
Bus Sys Analyst[/blue]
"I am always doing that which I can not do, in order that I may learn how to do it."– Pablo Picasso
The person who says it can't be done shouldn't interrupt the person who is doing it.—Chinese Proverb
 
The load process that I have developed does QA by doing very elementary data-checking:

1) Data-type mismatch: For each field, it verifies that the field-value can fit into the data-type. For example, an entire input record would be rejected, if the length of a field-value is longer than the declared length or if a character field-value is mapped to a numeric declaration.

2) Field-Value limits: A field value must be within certain limits. For example, course marks can only be between 0 and 100. Anything outside that range is invalid. Sometimes limits are dependent on the status of other fields. For example, marks may be NULL if the course has not concluded yet.

3) Parent/Child Relationship between records: A course code referenced in one table must be listed with its descriptive information in another table. Tables that make reference to missing parent records are problematic. My load process will create a dummy record in the parent so that queries do not loose any records when doing INNER JOIN operations. The dummy records are usually overwritten when the real data arrives. Often referential integrity problems are caused by bad data in the source system, buggy ETL conditions to extract all the necessary data, or timing issues that resolve themselves by the next ETL run.

Once all errors have been resolved from these 3 types of checks, further QA can be accomplished by aggregating the data and looking at trends. For example, if sales are $100,000 in year 1 and $1,000 in year 2, and $110,000 in year 3, that would be an indicator that much of the data of year 2 was not picked up in the ETL.

In most cases data violations generate warnings with the record being accepted. Rejecting records is not a good idea because it means that you loose other important information unrelated to the field causing the error.


I hope this helps for starters
 
As there are generally no user interfaces for Data Warehouses, the main activity is Validation. Record and row counts, totals for fact variables and aggregates, alone or across attributes, ALL should match the source system. Cubes should match source tables. Summary tables should match their sources.

Some DW implementations use tables in the DW which store the valdiation values and verification results. And others have batch processes to search for errant records, maybe those which are using invalid domain values or nulls, etc.

The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
The number of years you need to use depends on various factors, including the "stability" of your transaction systems. For instance, if you had a merger with another company in 2004, I would certainly include some of the merged data. If you completely revamped your salespeople and territories in 2002, then strong consideration should be given to using some of that data. Or maybe not. It depends on the requirements of the user community and long term strategy. For instance, in a quickly evolving market, like tablets, Apple might not be interested in sales of their Newton product from a decade ago.

The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top