Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

PHP 4 / MySql cleaning 100K records for dupes

Status
Not open for further replies.

Kadanis

Programmer
Mar 21, 2002
108
0
0
GB
Hi

I need to clean a MySql database table of duplicated emails.

I've written a php script that pulls in a SELECT query of the table containing 1 of each of the dupes then parses over the array to remove all items that match the Email but not the ID of this query

The problem is that it takes too long. Our dedicated server is set to timeout after 120 seconds (standard is 30) so I need this to take place as fast as possible.

Realistically the table will only contain a few duplicates but potentially the list could be all dupes so it needs to be able to handle all 100'000 rows.

If anyone can help optimise this process, your help would be greatly appreicated.

Thanks in advance

Code:
$dupeList= mysql_query("SELECT ID, Emails, Count(Emails ) FROM Contacts GROUP BY Emails HAVING (Count(Emails ) > 1)",$con);
	
while ($row = mysql_fetch_array($dupeList)){
mysql_db_query("dblists","DELETE FROM Contacts WHERE Emails ='".$row['Emails']."' AND ID <> '".$row['ID']."'",$con);
}

Am I jumping the gun, Baldrick, or are the words 'I have a cunning plan' marching with ill-deserved confidence in the direction of this conversation?
 
Do you have an auto_increment column in your table?

If so, you can do a query of the table against itself:

SELECT
*
FROM
<tablename> t1 JOIN <tablename> t2 ON t1.email = t2.email
WHERE
t1.id != t2.Id


Notice you're looking for records where the emails match but the IDs don't. That at least will give you a list of only the duplicated emails.



Want the best answers? Ask the best questions! TANSTAAFL!
 
Thanks for the help.

I've managed to do it a somewhat round-about route,but it completes in about 2 seconds so does the job

Obviously the below is only the SQL, this was put in the usual PHP / MYSQL commands.

Code:
//gets all the unique records by Email address into a temp table

CREATE TEMPORARY TABLE no_dupes SELECT * FROM Contacts GROUP BY  Email HAVING Count(Email) >= 1

//delete the table
DROP TABLE Contacts

//create new table with clean data
CREATE TABLE Contacts SELECT * FROM no_dupes

Am I jumping the gun, Baldrick, or are the words 'I have a cunning plan' marching with ill-deserved confidence in the direction of this conversation?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top