Duplicate Elimination

Status
Not open for further replies.

Swi

Programmer
Feb 4, 2002
US
Has anyone ever taken a prospect mailing list (for example) and matched it against a master mailing list, suppressing duplicates based on address? The addresses are not formatted the same way in both lists, so I am looking for any insight into pattern matching, or any other information that might help. Thanks.

Swi
 
I would also find any information on this type of thing very useful.

Specifically, I need to match a list of employee names from one spreadsheet against another spreadsheet of employee names. The two were created by different people, and therefore the names can have different spellings (e.g. Bob Jones = Robert Jones) or different formats (e.g. Bob Jones = Jones, Bob).

Swi: Sorry if this has nothing to do with the question above (I think it does, but who am I?)
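
One way to approach the name problem described above is to reduce every name to a comparable key before matching. The following is only a minimal sketch: the `NICKNAMES` table and the `name_key` function are hypothetical names made up for illustration, the nickname list is deliberately tiny, and it assumes each name has at least a first and a last part.

```python
# Hypothetical, deliberately tiny nickname table (an assumption for this
# sketch); a real list would need to be much longer.
NICKNAMES = {
    "BOB": "ROBERT",
    "ROB": "ROBERT",
    "BILL": "WILLIAM",
    "LIZ": "ELIZABETH",
}


def name_key(raw):
    """Return a (first, last) key for 'First Last' or 'Last, First'.

    Assumes the name has at least a first and a last part.
    """
    name = raw.strip().upper()
    if "," in name:
        # 'Jones, Bob' style: the last name comes before the comma.
        last, first = [part.strip() for part in name.split(",", 1)]
        first = first.split()[0]  # ignore any middle name or initial
    else:
        # 'Bob Jones' style: first word is the first name, last word the surname.
        parts = name.split()
        first, last = parts[0], parts[-1]
    # Map known nicknames onto one canonical first name.
    return NICKNAMES.get(first, first), last


print(name_key("Bob Jones") == name_key("Jones, Bob"))    # True
print(name_key("Bob Jones") == name_key("Robert Jones"))  # True
```

Two rows then count as the same person when their keys are equal. Spelling mistakes in the surname are not handled by this sketch.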
 
Good luck to both of you; there isn't really a great way to do this. Some minor tips for getting started, though:

I would consider upper-casing or lower-casing everything, so that you get rid of case discrepancies. For the address problem, you could try setting up a series of replacements based on the USPS list of proper spellings and abbreviations (e.g. suite = STE; ste. = STE; etc.). You may also have some success with breaking each address apart into an address 1 and an address 2 by splitting the string on SUITE, BUILDING, APARTMENT, or anything else that you would typically put on its own line, so that you have just the street address to compare. You may also consider collapsing extra white space and always removing certain punctuation marks, so that you are left with as clean a string as you can get. Taking all of this together, an address may go from:

Address: 123 Fake street suite a

to:

Address 1: 123 FAKE ST
Address 2: STE A
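
The clean-up steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ABBREVIATIONS`, `UNIT_MARKERS`, and `normalize_address` are hypothetical names, and the abbreviation table holds just a handful of entries rather than the full USPS list.

```python
import re

# Hypothetical, partial abbreviation table (an assumption for this sketch);
# a real one would be built from the USPS list of abbreviations.
ABBREVIATIONS = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "SUITE": "STE",
    "APARTMENT": "APT",
    "BUILDING": "BLDG",
}

# Abbreviated tokens that start the secondary (address 2) part.
UNIT_MARKERS = {"STE", "APT", "BLDG"}


def normalize_address(raw):
    """Return (address1, address2) in a canonical upper-case form."""
    # Upper-case everything and turn punctuation into spaces.
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())
    # split() with no argument also collapses runs of white space.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]
    # Split at the first unit marker so only the street part is compared.
    for i, tok in enumerate(tokens):
        if i > 0 and tok in UNIT_MARKERS:
            return " ".join(tokens[:i]), " ".join(tokens[i:])
    return " ".join(tokens), ""


print(normalize_address("123 Fake street suite a"))  # ('123 FAKE ST', 'STE A')
```

Both lists would be run through the same function, and the duplicate check would then compare the normalized values instead of the raw strings.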

This, of course, doesn't eliminate other human errors, such as:

123 Fak street
123 Fakee street
123 Fake strreet
123 Fake street

To a human, the above look like silly typos that are easy to reconcile, but a machine doing an exact comparison can only say that the strings are different and pass over them. Getting past that takes some fairly insane logic that handles common issues, maps potential mistakes, and makes intuitive guesses.
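
A cheap approximation of that kind of guessing is to score how similar two strings are and treat anything above a cutoff as a likely duplicate. The sketch below uses `difflib.SequenceMatcher` from the Python standard library, which is a swapped-in technique and not something proposed in this thread. The function names and the 0.9 threshold are assumptions chosen for illustration, and the threshold would need tuning against real data.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio, ignoring case."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def find_likely_duplicates(prospects, master, threshold=0.9):
    """Pair each prospect with its closest master entry, if close enough.

    The 0.9 default is an arbitrary assumption, not a recommended value.
    This compares every prospect with every master entry, so it is slow
    on large lists.
    """
    matches = []
    for prospect in prospects:
        best = max(master, key=lambda m: similarity(prospect, m), default=None)
        if best is not None and similarity(prospect, best) >= threshold:
            matches.append((prospect, best))
    return matches


master = ["123 Fake street"]
prospects = ["123 Fak street", "123 Fakee street", "456 Oak avenue"]
for prospect, match in find_likely_duplicates(prospects, master):
    print(prospect, "->", match)
```

Here the two misspelled variants are paired with the master entry and the unrelated address is not. A score like this produces false positives as well (for example, neighbouring house numbers on the same street), so matches near the threshold are best reviewed by a person.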

I hope this gives you some ideas...

good luck,
Kevin

- "The truth hurts, maybe not as much as jumping on a bicycle with no seat, but it hurts."
 