Deleting duplicate records from large table

scabral79 · Mar 12, 2008

Hi,

i have a very large table (over 200 million rows) that i need to delete duplicate records from. Here a sample of the table:

ID CustNum QuestionID Response SystemID
--- ------ ---------- -------- --------
1 100 5 Yes 87
2 100 5 Yes 87
3 100 6 Yes 89
4 200 3 No 87
5 200 3 No 87

From the above sample, i would need to delete record ID's 2 and 5, since these are duplicate records of ID's 1 and 4. Since this is a very large table, anything i try to do with it takes forever and eats up transaction log space.

Does anyone have an idea on the most efficient way to find and remove these duplicate records?

Thanks
scott

Crowley16 · Mar 12, 2008

partition your deletes, e.g. delete all duplicates where id between 1 and 1000000, then 1000001 and 2000000...

at the end of each delete, you can clear down the log, or set your log level to simple or bulk...

--------------------
Procrastinate Now!

SQLSister · Mar 12, 2008

Here's an FAQ to help

http://www.tek-tips.com/faqs.cfm?fid=3141

It is most important that you set a unique index on those fields once you are done cleaning the data, so that the problem will not re-occur.

"NOTHING is more important in a database than integrity." ESquared

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Deleting duplicate records from large table

scabral79

MIS

Crowley16

Technical User

SQLSister

Programmer

Similar threads

Part and Inventory Search

Sponsor