Ferdalizer
Programmer
I have read many of the posts here regarding SQL select statements for de-duplication of tables and I wanted to get everyone's opinion of the method that I use.
The table TMP, has the fields:
name_first, Name_last, Address1, address2, city, state, zipcode
All addresses in the table are standardized to postal specifications.
I create a KEY field of c(80) and I fill it with:
REPLACE ALL KEY WITH UPPER(STRTRAN(ALLTR(NAME_FIRST)+(UPPER(NAME_LAST)+ALLTR(ADDRESS1)+ALLTR(ADDRESS2)+ALLTR(CITY)+ALLTR(STATE)+SUBSTR(ZIPCODE,1,5)),' ','')
Then:
INDEX ON KEY TAG KEY
then I use:
SELECT DISTINCT *, COUNT(*) FROM TMP GROUP BY KEY
This returns a cursor of the de-duped records with a duplicate count appended.
I have been told that using the "GROUP BY KEY" could be returning too many records though all my checks of the de-duped tables turn out fine.
what do you think?
Thanks in advance.
Fred
The table TMP, has the fields:
name_first, Name_last, Address1, address2, city, state, zipcode
All addresses in the table are standardized to postal specifications.
I create a KEY field of c(80) and I fill it with:
REPLACE ALL KEY WITH UPPER(STRTRAN(ALLTR(NAME_FIRST)+(UPPER(NAME_LAST)+ALLTR(ADDRESS1)+ALLTR(ADDRESS2)+ALLTR(CITY)+ALLTR(STATE)+SUBSTR(ZIPCODE,1,5)),' ','')
Then:
INDEX ON KEY TAG KEY
then I use:
SELECT DISTINCT *, COUNT(*) FROM TMP GROUP BY KEY
This returns a cursor of the de-duped records with a duplicate count appended.
I have been told that using the "GROUP BY KEY" could be returning too many records though all my checks of the de-duped tables turn out fine.
what do you think?
Thanks in advance.
Fred