Hi,
Anyone know of an efficient way of writing a stored procedure that removes certain types of duplicate records?
The source table I'm looking at contains records that are tagged with a status field containing one of 4 values. In some circumstances, a transaction exists in more than one status resulting in (e.g.) 2 rows in the table, both identical aside from the content of the status field.
Both records share the same key value.
What I want to do is exclude rows, but only where the status of that row is x, and the status of another row is either A,B, or C (not x) where both rows share the same key value:
Key status
12345 X - exclude this one
12345 Y - include
23456 X - include
I've tried sub queries (where not in (select...) pointing at both the base data (500k records), or temp tables (2k records)which contain result sets (all with indexed fields) but so far it either blows the transaction log, or takes too long.
Any suggestions?
Anyone know of an efficient way of writing a stored procedure that removes certain types of duplicate records?
The source table I'm looking at contains records that are tagged with a status field containing one of 4 values. In some circumstances, a transaction exists in more than one status resulting in (e.g.) 2 rows in the table, both identical aside from the content of the status field.
Both records share the same key value.
What I want to do is exclude rows, but only where the status of that row is x, and the status of another row is either A,B, or C (not x) where both rows share the same key value:
Key status
12345 X - exclude this one
12345 Y - include
23456 X - include
I've tried sub queries (where not in (select...) pointing at both the base data (500k records), or temp tables (2k records)which contain result sets (all with indexed fields) but so far it either blows the transaction log, or takes too long.
Any suggestions?