Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

cleansing data won't work...

Status
Not open for further replies.

sweetleaf

Programmer
Jan 16, 2001
439
CA
My company maintains 3 large tables containing customer addresses. These tables are fed weekly batch data by thousands of partnering companies. None of these 3 tables store history (1 of the tables is a type1 update, the other 2 are full table refreshes).

Now, i'd like to apply address cleansing routines on these 3 tables w/out changing the table structures.
Here are the problems I see. In the case of the table that is type 1 updated ("t1"), the legacy load program applies updates to T1 if there are differences b/w it and the source file. So, if John Smith has the following address today in T1 "15 Maple" and my cleansing routine deems it to be wrong so revises it (still in T1) to "15 Maple Rd" then the next day - as the source file is not a delta but a snapshot of the whole db - when "15 Maple" comes across again for John Smith then it will overwrite the correctly cleansed "15 Maple Rd" value as the values are indeed different. Flipping it back to "15 Maple". So my routine would waste processing power to clean this over and over.
It's pretty much the same in the case of the other tables which are fully refreshed (T2 and T3).

Any help would be greatly appreciated!
 
I guess my question is, how would you(given the above parameters) set up a way to identify truly 'new' addresses and apply DQ just to these?

 
I think you may want to consider adding a flag or two to each field in the record. That field could indicate the degree of confidence, or truth, to the data within that field. This would require you to have varying degrees of faith in the incoming data and be able to differentiate between good data and bad data. Certainly, if you used an address cleansing tool such as D&B, Quality Stage, DataFlux, etc. you could have more confidence in their address than an incoming one. Not sure if this suggested process has any value in your situation.

You could also consider at type-3 solution, where you store the old and current values of each field, and then compare the new (incoming) value to the old and current to see if there is a significant change (other than, similar to your example, the change of Rd to Rd. or Rd to Road).

-------------------------
The trouble with doing something right the first time is that nobody appreciates how difficult it was - Steven Wright
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top