Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Removing duplicate records

Status
Not open for further replies.

sparked

MIS
May 27, 2003
24
GB
Hi,

Anyone know of an efficient way of writing a stored procedure that removes certain types of duplicate records?

The source table I'm looking at contains records that are tagged with a status field containing one of 4 values. In some circumstances, a transaction exists in more than one status resulting in (e.g.) 2 rows in the table, both identical aside from the content of the status field.
Both records share the same key value.

What I want to do is exclude rows, but only where the status of that row is x, and the status of another row is either A,B, or C (not x) where both rows share the same key value:

Key status
12345 X - exclude this one
12345 Y - include
23456 X - include

I've tried sub queries (where not in (select...) pointing at both the base data (500k records), or temp tables (2k records)which contain result sets (all with indexed fields) but so far it either blows the transaction log, or takes too long.


Any suggestions?








 
Hi Sparked,
Try the query given below

/* Query */
Select key, status from tablename
Group By key, status
/* Query */


Hope this helps.
-Mukund.
 
Thanks mukund,

Wouldn't that just remove absolute duplicates though ?

What I was after was an efficient method of removing records where the key value is the same but the status is different. (See above example)

Someone else suggested doing a select max first on the status field and then grouping, as long as the results of the max give the correct status field.

Using a cursor seems to be the most effective method though as I can insert a commit point at the end of each loop, breaking down the whole transaction into smaller chunks.




 
Hi sparked,

Your problem is very interesting. I think you will have to use a combination of union and temp tables.
Here's what you can try.
1) first get all the non duplicate records using a group by key, count(*) having count(*) = 1 into a temp table.
2) get all the duplicate records into another temp table using the same concept. group by key, count(*) having count(*) = 2.
3) Now using 2 unions with the keys being selected from the temp tables you can get the required resultset.

Hope this makes sense ....

Cheers ;-),
KK
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top