Removing Duplicates

jez · Feb 7, 2007

Hi everyone,

I am trying to remove all the duplicates in a large table.
I found the post on here regarding finding duplicates which gave the following SQL query

Code:

SELECT Domain_Name, Count(Someotherfield) AS CountOfEntry
FROM Domain
GROUP BY Domain_name
HAVING (((Count(Someotherfield))>1));

And for my database it is

Code:

SELECT item_id, item_name, COUNT(item_name) as NumNames
FROM items
GROUP BY item_name
HAVING (COUNT(item_name) > 1);

This tells which are duplicates but it only give me one row per duplicate, so if there are two items called "fish" then i only see one of the IDs.

My aim is to make this into a DELETE query so that i can clean all the duplicates in one hit.

Can someone kind give me some advice on how to modify my query to do this.

Thanks,

JEz

r937 · Feb 7, 2007

if you have two "duplicate" rows, which one do you want to keep? what if there are three of them?

could you do a SHOW CREATE TABLE please

r937.com | rudy.ca

jez · Feb 7, 2007

Ok i have had a thought about this problem..

The results that are coming back have an "item_id" in the fieldlist and then the count column called "NumNames" shows a 2 or more (depending on how many times the name has been found).

So.. does this mean that the item_id that i am seeing is the item_id of the duplicate item?

If so the DELETE query would be ....

Code:

DELETE FROM `items` WHERE
  (SELECT item_id, item_name, COUNT(item_name) as NumNames
    FROM items
GROUP BY item_name
HAVING (COUNT(item_name) > 1));

Is this close to correct? I am asking because it throws an error when i run this,

Code:

Operand should contain 1 column(s)

Jez

jez · Feb 7, 2007

Hi r937,

I missed your post while i was replying, do i need to do the SHOW CREATE TABLE on the items table?

Jez

r937 · Feb 7, 2007

yes, please SHOW CREATE TABLE

also, if you have two "duplicate" rows, which one do you want to keep? what if there are three of them?

note that your SELECT query was showing only one of the ids of the "duplicate" rows -- and you have no way to know which one

r937.com | rudy.ca

jez · Feb 8, 2007

Code:

'items', 'CREATE TABLE `items` (
  `item_id` int(10) unsigned NOT NULL auto_increment,
  `item_city_id` varchar(10) NOT NULL,
  `item_name` varchar(200) NOT NULL,
  `item_bookstandfirst` varchar(150) default NULL,
  `item_url` varchar(300) NOT NULL,
  `item_price_cat_id` int(10) default NULL,
  `item_AZcaption` varchar(200) NOT NULL,
  `item_AreaCaption` varchar(200) default NULL,
  `item_TypeCaption` varchar(200) default NULL,
  `item_chapter_id` int(10) NOT NULL,
  `item_cat2_id` int(10) default NULL,
  `item_area2_id` int(10) default NULL,
  `item_goodValue` enum('y','n') default 'n',
  `item_mapRef` varchar(45) default NULL,
  `Modified` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='venues, activities, places''

That is the table create statement.
Also in answer to your question, i am trying to find all the item_id s for any item where the name is the same as another item.
Then i guess i want to remove the items with those IDs to remove all but the first item with that name.

Jez

r937 · Feb 8, 2007

remove all but the first of duplicates --

Code:

delete items
  from items
left outer
  join (
       select min(item_id) as keep_id  
         from items                    
       group by item_name               
       ) as keepitems
    on keepitems.keep_id = items.item_id   
 where keepitems.keep_id is null

r937.com | rudy.ca

jez · Feb 8, 2007

Thats excellent thankyou, works like a charm!

Kozusnik · Feb 15, 2007

Also you may want to consider changing item_name to unique.

Mark

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Removing Duplicates

jez

Programmer

r937

Technical User

jez

Programmer

jez

Programmer

r937

Technical User

jez

Programmer

r937

Technical User

jez

Programmer

Kozusnik

Programmer

Similar threads

Part and Inventory Search

Sponsor