Below is the code I am currently using to dedupe my data. While this works fine, I am not able to protect my client data and update the query_id on the data I want (purchased data) because SQL has to use the uniqueid field. My data comes in at different times. Purchased lists come in days before the client data, but since they have been imported first, they have the min(uniqueid) instead of the client data.
I have a field called 'filecode' which separates client from purchased data.
Is there any way other than creating a new uniqueid field based on an index that sorts by filecode to achieve the goal I need? My only reason for not wanting another field is for space reasons. Most files are over 200,000+ records
Update [ClientData_temp]
set Query_ID = 'DUPES'
from [ClientData_temp]a
where uniqueid not in(select MIN(uniqueid) from [ClientData_temp] B WHERE UPPER(B.LASTNAME) = UPPER(a.LASTNAME)
AND UPPER(B.ADDRESS1) = UPPER(a.ADDRESS1)
AND B.ZIP = a.ZIP
AND B.NTRNLKYCD = A.NTRNLKYCD
AND QUERY_ID IS NULL)
Thanks in advance,
Elena
I have a field called 'filecode' which separates client from purchased data.
Is there any way other than creating a new uniqueid field based on an index that sorts by filecode to achieve the goal I need? My only reason for not wanting another field is for space reasons. Most files are over 200,000+ records
Update [ClientData_temp]
set Query_ID = 'DUPES'
from [ClientData_temp]a
where uniqueid not in(select MIN(uniqueid) from [ClientData_temp] B WHERE UPPER(B.LASTNAME) = UPPER(a.LASTNAME)
AND UPPER(B.ADDRESS1) = UPPER(a.ADDRESS1)
AND B.ZIP = a.ZIP
AND B.NTRNLKYCD = A.NTRNLKYCD
AND QUERY_ID IS NULL)
Thanks in advance,
Elena