Data Cleansing

Status
Not open for further replies.
Nov 10, 2000
9
US
Data Cleansing/Scrubbing

I have just been given the task of cleansing an MS SQL Server 7 data warehouse.

Question: Are there any good books, white papers, docs, etc. that can help me figure out where to start? Any other resources?

I have found info on 'why' you should, but not on 'how'.


Thanks,
Eric
 
A couple of resources....



A couple of tips....

* Run reports showing values and counts for each field (e.g. Sex: M=10,000 F=9,945 U=1,000 Q=20 N=35).
* Identify the set of allowable values (e.g. Sex = M, F, U).
* Compare the two and give the field a data quality score (e.g. 55 invalid values out of 21,000 rows: 55/21,000 * 100 = 0.26% invalid).
* Rank the fields by importance.
* Identify any important fields that have bad quality scores.
* Track the errors back to the original source and have them fix their system.
* Identify correctable errors (e.g. N is probably a miskey of M).
* Look for linked fields (e.g. number of credit cards and age. We had a few children with high incomes, cars and credit cards; they may be a bit older than the db thinks!)
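
The first few tips can be sketched in a few lines of Python. This is only an illustration, not a finished tool: the function names (`profile_field`, `error_rate`, `apply_corrections`) are made up for this example, the sample column simply reproduces the Sex counts quoted above, and in practice the values would come from a query against the warehouse rather than a hard-coded list.

```python
from collections import Counter


def profile_field(values):
    """Count how often each distinct value occurs in one field."""
    return Counter(values)


def error_rate(counts, allowed):
    """Percentage of rows whose value is outside the allowed set."""
    total = sum(counts.values())
    bad = sum(n for value, n in counts.items() if value not in allowed)
    return 100.0 * bad / total if total else 0.0


def apply_corrections(values, corrections):
    """Replace known miskeys; leave every other value untouched."""
    return [corrections.get(v, v) for v in values]


# Hypothetical sample data matching the counts in the tips above.
sex_column = (
    ["M"] * 10000 + ["F"] * 9945 + ["U"] * 1000 + ["Q"] * 20 + ["N"] * 35
)
allowed_sex = {"M", "F", "U"}

counts = profile_field(sex_column)
rate = error_rate(counts, allowed_sex)
print(dict(counts))
print(f"{rate:.2f}% invalid")

# Assumption for illustration: N is treated as a miskey of M.
fixed = apply_corrections(sex_column, {"N": "M"})
rate_after = error_rate(profile_field(fixed), allowed_sex)
print(f"{rate_after:.2f}% invalid after correction")
```

Running the same profile over every field gives you the scores to rank. Values like Q, which have no obvious correction, stay flagged so they can be traced back to the source system.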
 
Vality Technology has been re-engineering data for over 13 years and INTEGRITY is ranked as the industry leader in data quality by The Gartner Group.

Vality's online media kit has white papers, customer profiles and industry-specific information. I would point you to the white paper "The 5 Data Contaminants You will Encounter". It will not tell you how, but it will give you a roadmap of what to look out for.

Matthew Dowd
Vality Technology
mdowd@vality.com
 