Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Table cleaning routine help wanted...

Status
Not open for further replies.

VincentMartin

Programmer
May 17, 2004
6
IE
Hi Folks,

I'm not a SQL guru: my problem is as follows:

I'll be drawing down a table of (let's say customers) from a legacy system. The table contains five free-text address fields, any of which may contain the state name. I have a separate State table, and would like to compare each StateName in the State table with each of the 5 address fields in the Customers table, and then enter the state, if found, into an appended column in the Customers table.

I'm stumped on this, and would appreciate any help,

Thanks in advance,

Vincent
 
update customers
set statename = statetbl.statename
from customers
join states
on customers.address like '%' + states.statename + '%'

But this will cause problems.
What if someone lives in Florida Road?

As you have 5 address fields you could assume always a state and it will be towards the end so

update customers
set statename = statetbl.statename
from customers
join states
on customers.address5 like '%' + states.statename + '%'
where customeers.statename is null

update customers
set statename = statetbl.statename
from customers
join states
on customers.address4 like '%' + states.statename + '%'
where customeers.statename is null

...

But this will probably still give some thong sthat are incorrect but will be better than the previous method.

Unfortunately this sort of thing always causes problems and you will need to define a fixed format for addresses.
In the UK we have PAF issued by the post office which gives fixed address formats (which you can get from the post code) and most systems follow that eventually.

======================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
 
In order to minimize the number of errors, I would start with the most restrictive search. Make those updates and work on the next group by filtering out the ones that you have already done (so you don't reupdate completed ones).
First search: Look for ", "+FullStateName in line 5
Second search: Same as 1 but look for ", "+StateAbrev.
Third search: repeat above two on line 4
After that I'd examine the contents of the rows that haven't been updated. See what pattern there is and design a few more updates.
I've used this sort of technique to clean data. It takes more time but you'll uncover (and presumably fix) tons of original mistakes without making too many of your own mistakes. :)
When the data is really bad, I've written a program to make it easy to update the offending fields.
-Karl
 
As I am currently working with some similar data cleanup, here's a couple of things to take into consideration.

First: Do you have a clean zip code? If you do, (especially if it is separate from the 5 address lines) you can derive the state easily from a lookup table.

Second: If you are trying to clean the ENTIRE address (street #, street, apt #, etc) then I would start looking around at address scrubing apps. I'd suggest one, but I'm still in the process of evaluating.

Hope this helps,
John
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top