Hi, not posted in a while. Could be seen as a good thing I suppose.
I am hoping someone can help me with an SQL query I am trying to write to be able to merge duplicate records. Firstly I would just like to add that the duplicates are not of my doing and that I have only recently took control of the database.
I have a Table called person which holds all the demographics of clients. However I have multiple clients that have 2,3 or in some cases 4 duplicates. All have different info and very few have an ID unique to them. I have managed to pull a list of people with duplicate records based on Forename, Surname and DOB. An example of a duplicated record is:
ClientID, Forename, Surname, DOB, Address1, Address2, Postcode, etc
1, John, Smith, 01/01/1990, 1 Test Street, Test Area, AB24CD
2, John, Smith, 01/01/1990, 2 Test Street, Test Area, EF56GH
3, John, Smith, 01/01/1990, 1 Test Street, , AB24CD
So the ID is different as it auto increments on insert, Names are the same as are DOB, but address's could be incomplete, duplicate or different.
I also have an appointments table which has the appointment info (ClientID, Date, etc) The ClientID is the Key between the 2 tables. In my head i think the query needs to work in the following:
Find Duplicates, Look for the record with most appointments associated with it and use it as the primary record, Merge all other data into the primary record (where address is the same fine, where address is missing fill the gap, where address is different check which has the most recent appointment and use that one) and then update the ClientID field in the appointment table, from any appointments with the old ID change to the primary ID.
Does that make sense. Sorry for the long winded post but I cant explain it any simpler really. I know in my head what I need it to do but have no idea about writing a query this complex. Any help or pointers would be appreciated.
Thanks Rob.
I am hoping someone can help me with an SQL query I am trying to write to be able to merge duplicate records. Firstly I would just like to add that the duplicates are not of my doing and that I have only recently took control of the database.
I have a Table called person which holds all the demographics of clients. However I have multiple clients that have 2,3 or in some cases 4 duplicates. All have different info and very few have an ID unique to them. I have managed to pull a list of people with duplicate records based on Forename, Surname and DOB. An example of a duplicated record is:
ClientID, Forename, Surname, DOB, Address1, Address2, Postcode, etc
1, John, Smith, 01/01/1990, 1 Test Street, Test Area, AB24CD
2, John, Smith, 01/01/1990, 2 Test Street, Test Area, EF56GH
3, John, Smith, 01/01/1990, 1 Test Street, , AB24CD
So the ID is different as it auto increments on insert, Names are the same as are DOB, but address's could be incomplete, duplicate or different.
I also have an appointments table which has the appointment info (ClientID, Date, etc) The ClientID is the Key between the 2 tables. In my head i think the query needs to work in the following:
Find Duplicates, Look for the record with most appointments associated with it and use it as the primary record, Merge all other data into the primary record (where address is the same fine, where address is missing fill the gap, where address is different check which has the most recent appointment and use that one) and then update the ClientID field in the appointment table, from any appointments with the old ID change to the primary ID.
Does that make sense. Sorry for the long winded post but I cant explain it any simpler really. I know in my head what I need it to do but have no idea about writing a query this complex. Any help or pointers would be appreciated.
Thanks Rob.