PHP 4 / MySql cleaning 100K records for dupes

Kadanis · Feb 16, 2006

Hi

I need to clean a MySql database table of duplicated emails.

I've written a php script that pulls in a SELECT query of the table containing 1 of each of the dupes then parses over the array to remove all items that match the Email but not the ID of this query

The problem is that it takes too long. Our dedicated server is set to timeout after 120 seconds (standard is 30) so I need this to take place as fast as possible.

Realistically the table will only contain a few duplicates but potentially the list could be all dupes so it needs to be able to handle all 100'000 rows.

If anyone can help optimise this process, your help would be greatly appreicated.

Thanks in advance

Code:

$dupeList= mysql_query("SELECT ID, Emails, Count(Emails ) FROM Contacts GROUP BY Emails HAVING (Count(Emails ) > 1)",$con);
	
while ($row = mysql_fetch_array($dupeList)){
mysql_db_query("dblists","DELETE FROM Contacts WHERE Emails ='".$row['Emails']."' AND ID <> '".$row['ID']."'",$con);
}

Am I jumping the gun, Baldrick, or are the words 'I have a cunning plan' marching with ill-deserved confidence in the direction of this conversation?

sleipnir214 · Feb 16, 2006

Do you have an auto_increment column in your table?

If so, you can do a query of the table against itself:

SELECT
*
FROM
<tablename> t1 JOIN <tablename> t2 ON t1.email = t2.email
WHERE
t1.id != t2.Id

Notice you're looking for records where the emails match but the IDs don't. That at least will give you a list of only the duplicated emails.

Want the best answers? Ask the best questions! TANSTAAFL!

Kadanis · Feb 16, 2006

Thanks for the help.

I've managed to do it a somewhat round-about route,but it completes in about 2 seconds so does the job

Obviously the below is only the SQL, this was put in the usual PHP / MYSQL commands.

Code:

//gets all the unique records by Email address into a temp table

CREATE TEMPORARY TABLE no_dupes SELECT * FROM Contacts GROUP BY  Email HAVING Count(Email) >= 1

//delete the table
DROP TABLE Contacts

//create new table with clean data
CREATE TABLE Contacts SELECT * FROM no_dupes

Am I jumping the gun, Baldrick, or are the words 'I have a cunning plan' marching with ill-deserved confidence in the direction of this conversation?

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

PHP 4 / MySql cleaning 100K records for dupes

Kadanis

Programmer

sleipnir214

Programmer

Kadanis

Programmer

Similar threads

Part and Inventory Search

Sponsor