Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Merge duplicate records based on certain criteria 1

Status
Not open for further replies.

RobHat

Technical User
Nov 30, 2007
91
GB
Hi, not posted in a while. Could be seen as a good thing I suppose. :)

I am hoping someone can help me with an SQL query I am trying to write to be able to merge duplicate records. Firstly I would just like to add that the duplicates are not of my doing and that I have only recently took control of the database.

I have a Table called person which holds all the demographics of clients. However I have multiple clients that have 2,3 or in some cases 4 duplicates. All have different info and very few have an ID unique to them. I have managed to pull a list of people with duplicate records based on Forename, Surname and DOB. An example of a duplicated record is:

ClientID, Forename, Surname, DOB, Address1, Address2, Postcode, etc
1, John, Smith, 01/01/1990, 1 Test Street, Test Area, AB24CD
2, John, Smith, 01/01/1990, 2 Test Street, Test Area, EF56GH
3, John, Smith, 01/01/1990, 1 Test Street, , AB24CD

So the ID is different as it auto increments on insert, Names are the same as are DOB, but address's could be incomplete, duplicate or different.
I also have an appointments table which has the appointment info (ClientID, Date, etc) The ClientID is the Key between the 2 tables. In my head i think the query needs to work in the following:

Find Duplicates, Look for the record with most appointments associated with it and use it as the primary record, Merge all other data into the primary record (where address is the same fine, where address is missing fill the gap, where address is different check which has the most recent appointment and use that one) and then update the ClientID field in the appointment table, from any appointments with the old ID change to the primary ID.

Does that make sense. Sorry for the long winded post but I cant explain it any simpler really. I know in my head what I need it to do but have no idea about writing a query this complex. Any help or pointers would be appreciated.

Thanks Rob.
 
Thanks George,
So am I right in thinking that the following should be the overall plan:

Run the previous query you gave me into a new table: DupeIdentify

Write a new query which does the following:

Selects the data from DupeIdentify, loops through the results and for each row returned:

uses the above Coalesce and NullIf to merge the necessary data??

In my head I am then left with a table of cleaned records? All I have to do following this is figure out how to merge that data back into the original person table and update all contract and activity info to have the correct person id assigned? Should I be able to do this at the same time as the merge?

(Where there are more than 2 records for each person will I need to do multiple passes at this?)

Sorry to keep going on with this but I am stuck with finding an answer. I just wish my SQL knowledge was advanced enough to get this sorted. I am trying :)




 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top