help with dupe query

pandatime · Jul 8, 2010

Hi,

I have a query I'm using to eliminate duplicate rows from a table with about 5 million rows. The PK is a clustered index.

The problem is that it's extremely slow. The execution plan is very complex, and most of the cost is on "sort".

Is there a more efficient way to do this?

Thanks

Code:

	DELETE
	FROM myTable
	WHERE pk NOT IN
	(
	SELECT MAX(pk)
	FROM myTable
	GROUP BY
	col1,
	col2,
	col3,
	col4,
	col5,
	col6,
	col7,
	col8,
	col9
	)

markros · Jul 8, 2010

For SQL Server 2005 and up:

Code:

;WITH cte AS
(
	SELECT pk,
		ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4, col5, col6, col7, col8, col9 ORDER BY pk DESC) AS Row
	FROM myTable
)
DELETE FROM cte WHERE Row > 1
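
To check what this keeps before running it against 5 million rows, here is a minimal sketch on a throwaway temp table. The #dupeTest table, its two columns, and the sample values are made up for illustration; the real table would list all nine columns in the PARTITION BY. It assumes pk is an identity column, which the thread does not say.

```sql
-- Hypothetical scratch table, not from the original post.
CREATE TABLE #dupeTest
(
    pk   INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    col1 INT,
    col2 VARCHAR(10)
);

-- Three copies of (1, 'a') and one unique row (2, 'b').
-- Single-row inserts so this also runs on SQL Server 2005.
INSERT INTO #dupeTest (col1, col2) VALUES (1, 'a');
INSERT INTO #dupeTest (col1, col2) VALUES (1, 'a');
INSERT INTO #dupeTest (col1, col2) VALUES (2, 'b');
INSERT INTO #dupeTest (col1, col2) VALUES (1, 'a');

-- Number the rows inside each duplicate group, highest pk first,
-- then delete everything except the first row of each group.
;WITH cte AS
(
    SELECT pk,
           ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY pk DESC) AS rn
    FROM #dupeTest
)
DELETE FROM cte WHERE rn > 1;

-- Remaining rows: pk 3 (2, 'b') and pk 4 (1, 'a').
SELECT pk, col1, col2 FROM #dupeTest ORDER BY pk;

DROP TABLE #dupeTest;
```

Like the MAX(pk) version, this keeps the row with the highest pk in each group, so the two approaches should leave the same rows behind.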
