Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Removing Duplicates

Status
Not open for further replies.

jez

Programmer
Apr 24, 2001
370
VN
Hi everyone,

I am trying to remove all the duplicates in a large table.
I found the post on here regarding finding duplicates which gave the following SQL query
Code:
SELECT Domain_Name, Count(Someotherfield) AS CountOfEntry
FROM Domain
GROUP BY Domain_name
HAVING (((Count(Someotherfield))>1));

And for my database it is
Code:
SELECT item_id, item_name, COUNT(item_name) as NumNames
FROM items
GROUP BY item_name
HAVING (COUNT(item_name) > 1);

This tells which are duplicates but it only give me one row per duplicate, so if there are two items called "fish" then i only see one of the IDs.

My aim is to make this into a DELETE query so that i can clean all the duplicates in one hit.

Can someone kind give me some advice on how to modify my query to do this.

Thanks,

JEz
 
if you have two "duplicate" rows, which one do you want to keep? what if there are three of them?

could you do a SHOW CREATE TABLE please

r937.com | rudy.ca
 
Ok i have had a thought about this problem..

The results that are coming back have an "item_id" in the fieldlist and then the count column called "NumNames" shows a 2 or more (depending on how many times the name has been found).

So.. does this mean that the item_id that i am seeing is the item_id of the duplicate item?

If so the DELETE query would be ....

Code:
DELETE FROM `items` WHERE
  (SELECT item_id, item_name, COUNT(item_name) as NumNames
    FROM items
GROUP BY item_name
HAVING (COUNT(item_name) > 1));

Is this close to correct? I am asking because it throws an error when i run this,

Code:
Operand should contain 1 column(s)

Jez
 
Hi r937,

I missed your post while i was replying, do i need to do the SHOW CREATE TABLE on the items table?

Jez
 
yes, please SHOW CREATE TABLE

also, if you have two "duplicate" rows, which one do you want to keep? what if there are three of them?

note that your SELECT query was showing only one of the ids of the "duplicate" rows -- and you have no way to know which one

r937.com | rudy.ca
 
Code:
'items', 'CREATE TABLE `items` (
  `item_id` int(10) unsigned NOT NULL auto_increment,
  `item_city_id` varchar(10) NOT NULL,
  `item_name` varchar(200) NOT NULL,
  `item_bookstandfirst` varchar(150) default NULL,
  `item_url` varchar(300) NOT NULL,
  `item_price_cat_id` int(10) default NULL,
  `item_AZcaption` varchar(200) NOT NULL,
  `item_AreaCaption` varchar(200) default NULL,
  `item_TypeCaption` varchar(200) default NULL,
  `item_chapter_id` int(10) NOT NULL,
  `item_cat2_id` int(10) default NULL,
  `item_area2_id` int(10) default NULL,
  `item_goodValue` enum('y','n') default 'n',
  `item_mapRef` varchar(45) default NULL,
  `Modified` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY  (`item_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='venues, activities, places''


That is the table create statement.
Also in answer to your question, i am trying to find all the item_id s for any item where the name is the same as another item.
Then i guess i want to remove the items with those IDs to remove all but the first item with that name.

Jez
 
remove all but the first of duplicates --
Code:
delete items
  from items
left outer
  join (
       select min(item_id) as keep_id  
         from items                    
       group by item_name               
       ) as keepitems
    on keepitems.keep_id = items.item_id   
 where keepitems.keep_id is null

r937.com | rudy.ca
 
Thats excellent thankyou, works like a charm!
 
Also you may want to consider changing item_name to unique.

Mark
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top