Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Remove duplicates 1

Status
Not open for further replies.

dbalpine

Programmer
Sep 17, 2003
1
US
This should be simple for most of you...I want to remove duplicate e-mail addresses in my email field of my table called email_pft. so...

DELETE FROM 'email_pft' WHERE email= ????

that if an address exists more than once delete all other instances but the first one??
 
It's not as simple as that.

To do the deletion in a single query would require the use of a subquery. MySQL only supports subqueries in versions 4.1.x and newer. However, those versions of MySQL are alpha-revision code.

You can do it in two stages pretty simply with older versions of MySQL if you have a numerical ID column for each record. First, you get a list of IDs of rows for duplicated entries by joining the table to itself. Something like:

Code:
select t2.id_column from
		email_pft as t1
	left join
		email_pft as t2
	on t1.email = t2.email
WHERE
	t2.email is not null and
	t1.id_column < t2.id_column

If you have a programming languate (PHP, perl, etc) you could then loop through that return, deleting rows by IDs.

If you don't have a programming language, you should be able to copy-and-paste that return into a text editor and transform it into a list of IDs to be deleted in a query like:

Code:
DELETE FROM email_pft WHERE id_column in (<list of IDs>)

For that matter, you could also use a programming language to perform the transformation on the return.



If you are running MySQL 4.1.x, you can probably issue a single query something like:

Code:
delete from email_pft where id_column in
(
	select t2.id_column from
			email_pft as t1
		left join
			email_pft as t2
		on t1.email = t2.email
	WHERE
		t2.email is not null and
		t1.id_column < t2.id_column
)

I say &quot;probably&quot; because I don't have a version of MySQL 4.1.x to test the query on.


If you don't have an id column, it'll be pretty tough finding duplicates.


Want the best answers? Ask the best questions: TANSTAAFL!!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top