Hi all,
Been quite a few years since I've had to deal with duplicates in MySQL and have a large database that I need to remove duplicate records from, so looking for the least resource intensive/easiest way to do it.
I've got the following setup:
id, postcode, street_address, postal_town
Now, I'm looking to remove any duplicates where all 3 of the above (other than id) are duplicated in more than one row, so for example:
In the above, rows 1 and 2 and 5 are duplicates and I want to remove all but one of those.
Things I need to take into account:
1) Street names can exist in different towns so I cannot just use something as simple as pulling out unique street names or unique postcodes, it needs to check if all 3 fields are identical before removing anything.
2) This table has just under 30 million records.
3) Any number of records could be duplicated and there could be any number of duplicates on each.
Thanks in advance,
Wullie
The pessimist complains about the wind. The optimist expects it to change. The leader adjusts the sails. - John Maxwell
Story Choices - Your Story, Your Way
Been quite a few years since I've had to deal with duplicates in MySQL and have a large database that I need to remove duplicate records from, so looking for the least resource intensive/easiest way to do it.
I've got the following setup:
id, postcode, street_address, postal_town
Now, I'm looking to remove any duplicates where all 3 of the above (other than id) are duplicated in more than one row, so for example:
1,EH4 2BS,example terrace,edinburgh
2,EH4 2BS,example terrace,edinburgh
3,EH52 3SD,example street,broxburn
4,EH52 3SW,example road,broxburn
5,EH4 2BS,example terrace,edinburgh
In the above, rows 1 and 2 and 5 are duplicates and I want to remove all but one of those.
Things I need to take into account:
1) Street names can exist in different towns so I cannot just use something as simple as pulling out unique street names or unique postcodes, it needs to check if all 3 fields are identical before removing anything.
2) This table has just under 30 million records.
3) Any number of records could be duplicated and there could be any number of duplicates on each.
Thanks in advance,
Wullie
The pessimist complains about the wind. The optimist expects it to change. The leader adjusts the sails. - John Maxwell
Story Choices - Your Story, Your Way