Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Improve performance of delete

Status
Not open for further replies.

sparkbyte

Technical User
Sep 20, 2002
879
US
I have a delete query that strips duplicates from a table with about 6M rows.

Code:
Delete t1
from dbo.tblTrackingTable t1, dbo.tblTrackingTable t2
Where 
	t1.BoxNumber = t2.BoxNumber And
	t1.FileNumber = t2.FileNumber And
	t1.TrackingDate = t2.TrackingDate And
	t1.Tracking_ID > t2.Tracking_ID

As you can see this is very simplistic. The problen is that when I ran it the SQL server jumped to 95% utiliztion and stayed there for over 3 minutes.

Here is the table structure.
Code:
CREATE TABLE [dbo].[tblTrackingTable](
	[Tracking_ID] [int] IDENTITY(1,1) NOT NULL,
	[EmployeeID] [varchar](50) NULL,
	[MachineName] [varchar](20) NULL,
	[BoxNumber] [varchar](45) NOT NULL,
	[FileNumber] [varchar](25) NOT NULL,
	[TrackingDate] [datetime] NULL,
	[Reship] [bit] NULL,
	[BoxNumberOriginal] [varchar](50) NULL,
 CONSTRAINT [PK_tblTrackingTable] PRIMARY KEY CLUSTERED

What happened was that I imported about 3 years archive data from Access DB's and one of the files had about 500,000 rows of overlap.(duplicates) For whatever reason the SSIS import did not srtip these out. It was supposed to and had been doing an excellent job previously but for some reason did not on this particular file.

I still have about 4 other files to import but don't want to disrupt the mailroom just to do a quick cleanup process on the table.

Thanks!!!!!


Thanks

John Fuhrman
 
Firstly, you should use the new join syntax. i.e.
Code:
Delete t1
from dbo.tblTrackingTable t1 inner join 
   dbo.tblTrackingTable t2 on t1.BoxNumber = t2.BoxNumber And
    t1.FileNumber = t2.FileNumber And
    t1.TrackingDate = t2.TrackingDate And
Where
    t1.Tracking_ID > t2.Tracking_ID
[code]

I suspect the problem is "t1.Tracking_ID > t2.Tracking_ID"

instead of doing the delete this way, it may be better to fist extract the duplicates, and then delete using the id, so something like:

with dup as
(
   select max(id) id
   from tbl1
   group by boxNum, fileNum, TrackDate
   having count(id) > 1
)
delete from table where id in (select id from dup)

this is not tested though...

--------------------
Procrastinate Now!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top