Fixing terrible data in a mailing list

Status
Not open for further replies.

SoClueless (Technical User, Nov 28, 2000)
Hello again ....

We have a problem... and since I am the most familiar with Access in our organization (I have not TOUCHED Access in over 2 years), I get to work on it. By the end of the week our organization would like to have a mailing list ready. The data is in Excel. The name column has names in different formats, i.e. John Doe; Doe, John; and there are even a few Mr. John Doe. My mission is to eliminate duplicate name entries and also to see that only one mailing is sent per household. So, if there are a John Doe and a Doe, Jane living at the same address, they should get one postcard for "Doe Family". I'm feeling like this is impossible. Almost as bad as eating worms! Any takers for this one? Assume that I'm basically illiterate in the code department. Thanks if you can help.
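[Editor's note: a rough sketch of the name problem in Python, just to show the shape of the parsing involved; the title list and the helper name `normalize_name` are illustrative, not from any library, and real data will have formats this does not cover.]

```python
# Common courtesy titles to strip; extend as needed for your data.
TITLES = {"mr", "mrs", "ms", "dr", "miss"}

def normalize_name(raw):
    """Return (first, last) from 'John Doe', 'Doe, John', or 'Mr. John Doe'."""
    name = raw.strip()
    if "," in name:
        # 'Doe, John' -> the last name is before the comma
        last, first = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        # Drop a leading title such as 'Mr.' before splitting
        if parts and parts[0].rstrip(".").lower() in TITLES:
            parts = parts[1:]
        first, last = " ".join(parts[:-1]), parts[-1]
    return first, last
```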

S.C.
 



Hi,

With any data cleanup task, it will be a combination of code and sweat.

Start by sorting and grouping by address. See where there may be address DUPLICATES before you go trying to find name duplicates.

How many rows in Excel?

Skip,
 
Hi Skip,

Either 16,448 rows or 19,448. Aarrgh, I don't even have them all; they have just given me a few rows to figure out what to do!

S.C.
 
First of all, you're going to want to get ALL the data. Cleaning up bad mailing-list data is one of the hardest things to do, and you're only going to want to do it ONCE!

Unfortunately, it's not something that can be fully automated because there are so many variations. A procedure can't tell that these are all the same address:

123 First St.
123 1st St.
123 1 St.

Only someone looking at the data can tell they're the same.

Additionally, misspellings can't be easily caught, and EVERYONE misspells Albuquerque... (can you tell I've cleaned up mailing lists before???)

I would sort the Excel list by city and state first, then by address, and then by name. Fix any misspelled cities and states. Sort again. Depending on the data, you may be able to run some queries against the spreadsheet and see what needs to be fixed.

SELECT DISTINCT CITY, STATE FROM spreadsheet

will give you a list of all the cities and states. You could easily search that for anything blatantly incorrect.

Unfortunately, you're not going to be able to get a quick fix. It's a lot of work.

Leslie

Anything worth doing is a lot more difficult than it's worth - Unknown Induhvidual

Essential reading for anyone working with databases:
The Fundamentals of Relational Database Design
Understanding SQL Joins
 
You don't happen to have any address standardization software in house do you? That could help tremendously.

You are going to need a fairly serious function for this if the goal is to automate it. Otherwise, follow Les's advice. For someone who doesn't use Access frequently, coming up with a working (and validated) function for this is going to take quite a bit of time.

I wish I could offer more, but I mostly do this kind of work in SQL server.

Good Luck,

Alex



Professor: But what about your superintelligence?
Gunther: When I had that there was too much pressure to use it. All I want out of life is to be a monkey of moderate intelligence who wears a suit. That's why I've decided to transfer to Business School.
Professor: NOOOOOOOOOOOO.
 
Thank you, Leslie and Alex. I'm encouraged to keep at it... it will be a miracle if we can do this by the end of the week! ... S.C.
 
You may also find it useful to break down the street part of the address into 'words'. If you have a table of ID and Word, it will be easier to change all the 'Aves.', 'Avs', 'Aevs', etc. to 'Avenue' and so on.
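[Editor's note: a minimal sketch of this word-by-word standardization in Python; in Access the lookup would be a two-column table rather than the dict used here, and the abbreviation list shown is illustrative only.]

```python
# Hypothetical abbreviation table; extend with whatever variants your data holds.
STREET_WORDS = {
    "ave": "Avenue", "aves": "Avenue", "aevs": "Avenue",
    "st": "Street", "rd": "Road",
}

def standardize_street(street):
    """Split an address into words and map known abbreviations to full forms."""
    out = []
    for word in street.split():
        # Strip trailing punctuation so 'Aves.' and 'Aves' hit the same entry
        key = word.rstrip(".,").lower()
        out.append(STREET_WORDS.get(key, word))
    return " ".join(out)
```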
 
I have just remembered that you may find a table of locations (states, counties, whatever) useful. You can then matched against address lines and mark lines that match as correct.

PS It should be possible to do this in a week, but your sanity may suffer. [dazed]
 



To continue on Remou's suggestion, consider establishing conventions for street addresses. But beware of doing global Edit/Replace changes, for instance from 'Ave' to 'Avenue', or you might end up with 'Avenuenue'.
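[Editor's note: the 'Avenuenue' trap can be avoided by replacing only whole words. A sketch in Python using regex word boundaries; Access's Edit/Replace has a "Match Whole Field/Word" option that serves the same purpose.]

```python
import re

def replace_word(text, old, new):
    """Replace only whole words, so 'Ave' -> 'Avenue' leaves 'Avenue' alone."""
    return re.sub(r"\b%s\b" % re.escape(old), new, text)
```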

Skip,
 
So, everyone.....

We are working on fixing the addresses so they are in a uniform format and will eliminate the name problem by using "library patron" in the name field for all postcards.

Once I get the Excel sheet, assuming all the addresses are done correctly, can you help me with the steps to take to eliminate duplicate addresses? I remember vaguely something about grouping and select distinct, but not exactly what to do with them.

Thanks again for hanging in there for me .....

S.C. :) [ponder]

 
Is this a continuation of your post:
Select Distinct for mailing list?
thread701-1290023
?
 
Yes Remou, that was me [blush] before I realized what the data looked like... I am still waiting to get the mega Excel sheet...

S.C.
 
What I normally do for something like this is first create a key field. This field is a concatenation of the fields you want to dedupe on. I usually use the first 5 characters of the first name, the first 8 of the last name, the first 10 of the address; you get the picture. It all depends how tight or loose you want your matching to be.
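[Editor's note: a sketch of that key construction in Python; the exact lengths and the lowercasing/space-stripping are one reasonable choice, not the only one, and the function name is illustrative.]

```python
def match_key(first, last, address):
    """Build a loose dedupe key: first 5 chars of first name, first 8 of
    last name, first 10 of address, lowercased with spaces removed."""
    def chunk(value, n):
        # Normalize case and spacing so trivial differences don't split a match
        return value.lower().replace(" ", "")[:n]
    return chunk(first, 5) + chunk(last, 8) + chunk(address, 10)
```

Rows that produce the same key are treated as the same person/household; tighter keys (longer chunks) mean fewer false matches but more missed duplicates.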

I'd also create an ID field (autonumber).

Then to get the 'clean' list data you'd try a query like this:

Code:
SELECT [col1], [col2]...
FROM [TABLENAME]
WHERE [ID] IN
(
    SELECT Min([ID]) FROM [TABLENAME] GROUP BY [KEY]
)

Hope this gets you on the right track,

Alex

 
Thanks, Alex. I think I'm going to try a totals query. If I group by the field that has duplicates I think it will work. Thank you for the code, that will help. :)
 
You may run into problems with a totals query, because you need to group by everything that you select. If you want to do an individual mailing that is fine, but if you only want to mail one per household, that could get tricky. I'll let you focus on the address standardization first.



 
Hi AC,

They have split the addresses, separating the number from the street names, and are now fixing the street names to be consistent. They are doing this in Access and are planning to give the list to me when it is done.

Can you explain a little more about how your code suggestion works? I'm not sure how that works. If you have time... thanks!! [ponder]
 
All you will need to do is add an identity field to your table (Access Data Type = "AutoNumber"). For the sake of my query this will be called "ID".

If you don't want to modify the SQL that I posted above, and are more comfortable with query designer then you can do this using two queries.

For the example, table name will be [AddTable].
Your query to join to will be [IDUniqueQuery].

[IDUniqueQuery] will be a totals query. Select ID and everything you want to group on. Then click your totals button. You will leave all fields as 'Group By' except for "ID" field, where you will change the drop down to "Min". Then deselect the 'show' box for every field except "ID". This query will give you the lowest ID number for each grouping.

You then set up another query, selecting everything from [AddTable] and joining to [IDUniqueQuery] on the [ID] field, using an inner join (this will be done for you automatically).
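[Editor's note: the shape of those two queries can be demonstrated end to end. This sketch uses Python's sqlite3 rather than Access (the SQL dialects differ slightly), and the table/column names `AddTable`, `MatchKey`, etc. just mirror the examples above.]

```python
import sqlite3

# Toy table standing in for [AddTable]; MatchKey plays the role of [KEY].
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE AddTable (ID INTEGER PRIMARY KEY, Name TEXT, MatchKey TEXT)")
con.executemany(
    "INSERT INTO AddTable (Name, MatchKey) VALUES (?, ?)",
    [("John Doe", "doe123first"),
     ("Doe, John", "doe123first"),   # same household, duplicate key
     ("Jane Roe", "roe45oak")],
)

# Same shape as the two Access queries: the subquery is [IDUniqueQuery]
# (lowest ID per key); the outer inner join keeps one row per key.
rows = con.execute(
    """SELECT a.ID, a.Name
       FROM AddTable AS a
       INNER JOIN (SELECT MIN(ID) AS ID FROM AddTable GROUP BY MatchKey) AS u
       ON a.ID = u.ID
       ORDER BY a.ID"""
).fetchall()
```

Running this keeps one of the two "doe123first" rows (the one with the lowest ID) plus the unique "roe45oak" row.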

I hope I explained it adequately; I don't really use the query designer that much, though.

Hope it helps,

Alex

 
Hi Alex,

Thanks again for the help. I will be working with this code. We had a crisis here which kind of upended priorities for a bit... thanks again!! S.C.
 
Let me know how you get on with it. These types of projects are often the most frustrating.

Good Luck (I can't say it enough ;-) )

Alex

 