Tek-Tips is the largest IT community on the Internet today!


etl testing tools 1

Status
Not open for further replies.

Guest_imported

New member
Hi,
I'm on a project where I've developed some ETL packages. I want some kind of tool to test whether my developed package is running correctly or not. Please help me find tools for testing ETL packages.

Thanks in advance.
 
I want to know the built-in tools available in DB2 for ETL, OLAP, and query & reporting, and also the third-party tools for those same tasks.
 
You can test an ETL tool with a business rules engine. Just build a rule asserting that the data found in the source will be present in the target with the transformations applied. You run the ETL job, then you run the test to verify that everything worked.
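To make the idea concrete, here is a minimal Python sketch of that kind of rule, not a real rules engine. The field names and the transformation (uppercase the name, convert cents to dollars) are hypothetical stand-ins for whatever your package actually does:

```python
def transform(row):
    """The transformation the ETL job is supposed to apply
    (assumed here: uppercase the name, convert cents to dollars)."""
    return {"name": row["name"].upper(), "amount": row["amount_cents"] / 100}

def missing_rows(source_rows, target_rows):
    """The rule: every transformed source row must be present in the target.
    Returns the rows the rule says should exist in the target but don't."""
    return [transform(r) for r in source_rows if transform(r) not in target_rows]

source = [{"name": "alice", "amount_cents": 1250},
          {"name": "bob", "amount_cents": 300}]
target = [{"name": "ALICE", "amount": 12.5},
          {"name": "BOB", "amount": 3.0}]

print(missing_rows(source, target))  # [] -> the rule holds
```

You run the ETL job to populate the target, then run the rule; an empty result means the assertion held for every source row.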

Greg Leman
Metagenix, Inc.
 
Testing an ETL package is like testing any software. You create a controlled input and the expected output, then you run the ETL on the controlled input and find the divergence between the actual output and the expected output.

There is third-party testing software, but if the package is in PL/SQL then the traditional route mentioned above should be sufficient.
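A small Python sketch of that divergence step (the row shape and the `id` key are assumptions, chosen just for illustration): build the expected output by hand, run the ETL on the controlled input, then diff actual against expected.

```python
def divergence(expected_rows, actual_rows):
    """Return (expected, actual) pairs for every key where the two sides
    differ. A row missing from one side shows up as None."""
    expected = {r["id"]: r for r in expected_rows}
    actual = {r["id"]: r for r in actual_rows}
    return [(expected.get(k), actual.get(k))
            for k in sorted(expected.keys() | actual.keys())
            if expected.get(k) != actual.get(k)]

expected_out = [{"id": 1, "total": 100}, {"id": 2, "total": 250}]
actual_out   = [{"id": 1, "total": 100}, {"id": 2, "total": 999}]

# Shows exactly which rows diverged and how:
print(divergence(expected_out, actual_out))
```

An empty result means the run matched the expected output exactly; anything else pinpoints the rows to investigate.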
 
The problem with that approach is that when you're done, all you've proven is that your ETL job handles the input you've put together the way you expect it to. With ETL, it's the boundary cases that often cause problems. No matter how well you design your tests, you'll always be missing something in a situation as complex as ETL.

The redundant programming approach has a much better chance of catching the boundary cases. Obviously, one should test as you've laid out, but in something as mission critical as ETL it's a good idea to wear both a belt and suspenders.

A 1-2% error rate in most software applications is OK -- just consider how much poor-quality data comes from the operational systems you work with. But in a data warehouse load, a 1-2% loss (especially in the boundary cases) could skew the results significantly, essentially nullifying the entire investment. Worse yet, the business people who rely upon the DW to make decisions won't trust it after they find out that the transfer process doesn't work as expected. What they're willing to put up with inside their own operational systems is vastly different from what they'll accept in the DW.

I saw a situation as a consultant where a Fortune 10 company had a DW load that took place every day from mainframe extract files. They had tested their ETL process (a set of loader scripts) by hand, and had a series of scripts that they ran to massage the data in the staging area before they loaded it into the DW. To make a long story short, one employee got sick while another was on vacation, and the files in the staging area weren't being refreshed properly. They loaded Tuesday's data five times, and didn't find out about it for six months, until a C-level manager couldn't get some reports to add up correctly. A simple pre/post process check would have saved their entire department a lot of heartache. Instead, their group and their DW project got a reputation for inaccurate results. BTW, that group was cut in the next round of layoffs.
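A pre/post check of that kind can be very simple. Here is a hypothetical Python sketch (the counts and dates would come from your staging area and load log; the function name and arguments are made up for illustration) that would have flagged the repeated-Tuesday file:

```python
import datetime

def check_load(staged_count, loaded_count, batch_date, last_loaded_date):
    """Pre/post sanity check: the row count must survive the load, and the
    batch date in the staging file must actually advance since the last load."""
    errors = []
    if loaded_count != staged_count:
        errors.append(f"count mismatch: staged {staged_count}, loaded {loaded_count}")
    if last_loaded_date is not None and batch_date <= last_loaded_date:
        errors.append(f"stale batch: {batch_date} was already loaded "
                      f"(last load was {last_loaded_date})")
    return errors

tue = datetime.date(2002, 1, 8)
wed = datetime.date(2002, 1, 9)

print(check_load(1000, 1000, wed, tue))  # [] -> load looks sane
print(check_load(1000, 1000, tue, tue))  # flags the repeated Tuesday file
```

Run before committing the load and fail the job on any error, rather than discovering the problem in a report six months later.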

Greg Leman
Metagenix, Inc.
 