delete duplicate rows from large table

MikeM2468 · Dec 6, 2012

I have two tables table1 and table2. Each is over 1 million rows. Table2 has some records that are also in table1. I need to delete the rows from table2 that are also in table1. I've tried this:

Code:

delete * from table2 where record in (select * from table1 where record like "1%");

Both tables only have one column called record. If I run the above command it just chugs for hours and never finishes.

feherke · Dec 6, 2012

Hi

Avoid [tt]in[/tt] with sub-[tt]select[/tt]. Take advantage of MySQL's [tt]delete[/tt] syntax which allows [tt]join[/tt] with other tables :

Code:

[b]delete[/b] [b]from[/b] table2

[b]using[/b] table1
[b]inner[/b] [b]join[/b] table2

[b]where[/b] table1[teal].[/teal]record[teal]=[/teal]table2[teal].[/teal]record
[b]and[/b] table1[teal].[/teal]record [b]like[/b] [green][i]'1%'[/i][/green][teal];[/teal]

Feherke.
[link feherke.github.com/]

http://feherke.github.com/

[/url]

MikeM2468 · Dec 7, 2012

I get the same results with this query. Just runs forever and never does anything.

feherke · Dec 7, 2012

Hi

There is another way, in two steps. Should also be faster, if the previous one not helped, probably neither this one will. This supposes that table2.record has no primary or unique key set :

Code:

[b]update[/b] table2[teal],[/teal]table1

[b]set[/b] table2[teal].[/teal]record[teal]=[/teal][b]null[/b]

[b]where[/b] table1[teal].[/teal]record[teal]=[/teal]table2[teal].[/teal]record
[b]and[/b] table1[teal].[/teal]record [b]like[/b] [green][i]"1%"[/i][/green][teal];[/teal]

[b]delete[/b] [b]from[/b] table2

[b]where[/b] record [b]is[/b] [b]null[/b][teal];[/teal]

Feherke.
[link feherke.github.com/]

http://feherke.github.com/

[/url]

MikeM2468 · Dec 7, 2012

I think it's doing the same thing. What can I do with my data to make it work? Any tuning I can do the the server? Indexing? I'm running this in dev on Windows 7 64-bit 8GB RAM, MySQL 5.5.12. The two tables have just one column containing a list of words with a maximum length of 20 characters. Table2 has a bunch of entries that should already be in table1. I need to delete the duplicates from table2. I'm running the queries from the mysql CLI.

feherke · Dec 7, 2012

Hi

MySQL knowledge is mostly theoretical as I use it rarely. So this may be plain stupid idea, but first I would try to check the database for damages. ( I think [tt]mysqlcheck[/tt] serves that purpose. )

Feherke.
[link feherke.github.com/]

http://feherke.github.com/

[/url]

EdSharp · Dec 14, 2012

Try this:

Code:

delete from table2, table1 where exists 
  (select 1 from table1 where table2.record = table1.record);

Regards,

Ed

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

delete duplicate rows from large table

MikeM2468

IS-IT--Management

feherke

Programmer

MikeM2468

IS-IT--Management

feherke

Programmer

MikeM2468

IS-IT--Management

feherke

Programmer

EdSharp

IS-IT--Management

Similar threads

Part and Inventory Search

Sponsor