We have a table into which data is loaded from an external source. The problem is that it contains duplicate, triplicate, etc primary key values, of which we want to delete invalid records and leave just one valid record, i.e with a unique primary key (id). We cannot do this while the data is being loaded because we may end up removing valid records.
Now this is the trick - we determine the records we want to retain by looking at four other fields (disdate, nhsnum, nhsstat and diag), i.e. where there are records with the same id and:
1) disdate is null or disdate is not null, keep record where disdate is not null.
2) nhsnum is null or nhsnum is not null, keep record where nhsnum is not null.
3) nhsstat = 02 or nhsstat = 03 or nhsstat = 06, keep record where nhsstat = 06.
4) diag codes, keep record where the length of diag is the greatest.
How do we write a query to delete duplicate, triplicate, etc records and leave just one valid record?
NB: We are using MS SQL 2000.
Thank you in advance for your help.
JCAD1
Now this is the trick - we determine the records we want to retain by looking at four other fields (disdate, nhsnum, nhsstat and diag), i.e. where there are records with the same id and:
1) disdate is null or disdate is not null, keep record where disdate is not null.
2) nhsnum is null or nhsnum is not null, keep record where nhsnum is not null.
3) nhsstat = 02 or nhsstat = 03 or nhsstat = 06, keep record where nhsstat = 06.
4) diag codes, keep record where the length of diag is the greatest.
How do we write a query to delete duplicate, triplicate, etc records and leave just one valid record?
NB: We are using MS SQL 2000.
Thank you in advance for your help.
JCAD1