Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Removing Duplicate Records During Query 1

Status
Not open for further replies.

Corr

ISP
Jan 7, 2003
48
US
Hello,

How do I remove duplicate records from a table? I am comparing two tables by customer SSN and the results return several duplicate records. Can some one post a reply that can remove the duplicates prior to dumping my results? Here is a copy of my Query Below:

SELECT *, flhomes.debtor_ssn AS Expr1, SCACCT.GSS# AS Expr2
FROM SCACCT INNER JOIN
flhomes ON SCACCT.GSS# = flhomes.debtor_ssn

Sometimes one person whos SSN matches returns like 20 records. My boss just wants one result per person that matches. Can anyone help?

Thanks In Advance,

John
Boca Raton, FL

John
Systems Manager
Boca Raton, FL
 
I am guessing that one of these tables has SSN as the primary key and the duplicates are being caused by linking to the other table which has multiple instances of each SSN?

Which table has SSN as the PK?
What actual columns do you want to return from each table?
If an SSN has multiple instances which one do you want to show?

--James
 
The SCACCT Table has the Pri Key for SSN. I want all the columns however I want it to retain only one record per duplicate SSN or after getting my results dumped into a seperate table...have it filterout the DUPS then. Either way is fine just not sure what would be easier. I'm getting over 400000 results when one of the tables has only 10000 records in it. SCACCT is the Primary Table with lots of duplicate records matching the SSN column which is probably why im getting so many duplicate results. FLHOMES is a new table with one record per SSN (Or so I think). I need to see what records in SCACCT match those in FLHOMES and remove the duplicates. SSN is my unique identifier between both tables.

John
Systems Manager
Boca Raton, FL
 
You seem to have contradicted yourself here:

Corr said:
The SCACCT Table has the Pri Key for SSN

Corr said:
SCACCT is the Primary Table with lots of duplicate records matching the SSN column

If SSN is the primary key in the SCACCT table then you can't have "lots of duplicates".

Also, the key question remains: for the duplicate rows per each SSN, which one do you want to keep? eg, you might have some king of date column and you want to keep the earliest/latest entry per SSN.

--James
 
My preferred way to remove dups quickly without temp tables, etc, is to add an identity field or some (preferably numeric) random unique number, call it 'recid' for this example:
(This assumes you don't care which ones are kept--if you have criteria for that, it gets more complex)

DELETE a.recid
FROM mytable a
WHERE a.recid
NOT IN (
SELECT max(recid)
FROM mytable b
WHERE b.firstkeyfield = a.firstkeyfield
AND b.secondkeyfield = a.secondkeyfield
AND ...other 'key' fields...
)

'firstkeyfield' would be SSN in your case, and you can remove all sql from 'AND secondkeyfield...' on.
--Jim
 
Yes James I just realized that. I'm not functioning well.

Here is the quick run down. I have two tables SCACCT and FLHOMES, both have at least three fields I'm using(FNAME, LNAME, and SSN)I want those three fields to be displayed however duplicate records with SSN as the duplicate removed. With this perticular query I am not concerned about the other fields that contain any other data. I just want a list of unique people in one database that are in another.

John
Systems Manager
Boca Raton, FL
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top