Removing duplicate records

sparked · Aug 27, 2004

Hi,

Anyone know of an efficient way of writing a stored procedure that removes certain types of duplicate records?

The source table I'm looking at contains records that are tagged with a status field containing one of 4 values. In some circumstances, a transaction exists in more than one status resulting in (e.g.) 2 rows in the table, both identical aside from the content of the status field.
Both records share the same key value.

What I want to do is exclude rows, but only where the status of that row is x, and the status of another row is either A,B, or C (not x) where both rows share the same key value:

Key status
12345 X - exclude this one
12345 Y - include
23456 X - include

I've tried sub queries (where not in (select...) pointing at both the base data (500k records), or temp tables (2k records)which contain result sets (all with indexed fields) but so far it either blows the transaction log, or takes too long.

Any suggestions?

mukund · Sep 9, 2004

Hi Sparked,
Try the query given below

/* Query */
Select key, status from tablename
Group By key, status
/* Query */

Hope this helps.
-Mukund.

sparked · Sep 11, 2004

Thanks mukund,

Wouldn't that just remove absolute duplicates though ?

What I was after was an efficient method of removing records where the key value is the same but the status is different. (See above example)

Someone else suggested doing a select max first on the status field and then grouping, as long as the results of the max give the correct status field.

Using a cursor seems to be the most effective method though as I can insert a commit point at the end of each loop, breaking down the whole transaction into smaller chunks.

killerkoolkris · Nov 21, 2004

Hi sparked,

Your problem is very interesting. I think you will have to use a combination of union and temp tables.
Here's what you can try.
1) first get all the non duplicate records using a group by key, count(*) having count(*) = 1 into a temp table.
2) get all the duplicate records into another temp table using the same concept. group by key, count(*) having count(*) = 2.
3) Now using 2 unions with the keys being selected from the temp tables you can get the required resultset.

Hope this makes sense ....

Cheers ;-),
KK

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Removing duplicate records

sparked

MIS

mukund

Programmer

sparked

MIS

killerkoolkris

Programmer

Similar threads

Part and Inventory Search

Sponsor