Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Excel 2010 dups

Status
Not open for further replies.

OCM

MIS
Sep 12, 2002
220
US
Greetings,

I receive daily student registration reports in Excel 2010 format. Every now and then I’ll notice the same student is entered multiple times because of the following reasons:

Name: Joe L. Madison vs. Joseph Lance Madison
Address: Main Street vs. Main Str.
Phone #: home # vs. mobile #
E-mail address: JLM@msn.com vs. JLM@aol.com

Following are my column names: Name, Address, CSZ, ID, Birthdate, SSN, Phone, Term, Program, Email

What would be the best way to catch this dups? I was thinking, may be a macro to check combination of birth date + SSN + ID or any other field and if there is a match then flag as “…match found…”

Any ideas?

By the way, can you attach a word/excel document in this forum?

Regards,

OCM
 
hi,

This is a typical nasty dirty problem.

To clean up is a very manual process AND necessary!

You need to analyse various parts of your data, caregorize the data by cleanup similarities and then design a cleanup process for each category.

VERY time consuming!

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
There may be a cleaner solution, but I'd do the following.

Assuming SSN is column F.
1. Sort by SSN.
2. Add a "DUPE" column (K).
3. In the top data row of the DUPE column, enter
Code:
=if(F2=F1,"DUPE","")
4. Copy this code to every row.
5. Copy/paste values the entire DUPE column
6. Sort on DUPE column.
7. Delete your duplicte rows.


Randy
 
Thank you both for your replies,

Randy, can you please explain the logic behind the formula? It worked partially, except that if student #1 and student #2 have blank or n/a in their SSN field, it returns as DUPE (even if they are two diffrent students.

Can the formula be modified to check multiple columns (ID + SSN + Birth date)?

Regards,


OCM
 
just build a concatenated key in a column for lookup

Skip,

[glasses]Just traded in my old subtlety...
for a NUANCE![tongue]
 
Can you please give some exampl?

Thanks,

OCM
 

except that if student #1 and student #2 have blank or n/a in their SSN field, it returns as DUPE (even if they are two diffrent students

Getting this result indicates incomplete data.
If sorted properly, these rows will all be together.

You may have to modify the formula to exclude certain rows based on missing SSN values.
Perhaps something like:

Code:
=if(or(F2="",F2="N/A"),"",if(F2=F1,"DUPE",""))

Randy
 
Thanks Randy for the helpful reply!

Regards,


OCM
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top