sql server 2000, locating double apearances of a record 2

pssheba · Jan 4, 2007

Hi !
My "Clients" table contains among others the following fields: "Name", "Address", "ID", "Phone". The Primary key is set to none of these.
I'd like each client to show only once in that table but this doesnt happen. Some clients' names appear twice or more an so the phone, id etc..
i want to "clean" that table wherby locating the "doubles" and erase unwanted record.
My question is: How do i locate those ambigiuos records?
"Name" for example. I could "Order by name" but i cannot make a survey of 10000 records reading it manually. i want SQL to tell me if it finds two identical names or two identical phones, "ID" etc..
Your aid is a relief for me...

Catadmin · Jan 4, 2007

Mind you, the below code does NOT account for misspellings and assumes ID is the same for both records (if not, remove it from the sub-query), but it should help you to some degree.

Code:

Select PK, Name, ID, Address, Phone
from Clients c
join (Select Name, ID, Address, Phone
      from Clients
      Group By ID, Name, Address, Phone
      Having Count(ID) > 2) t1
on c.ID = t1.ID
and c.Name = t1.Name
and c.Address = t1.Address
and c.Phone = t1.Phone

If you remove ID from the sub-query, make sure to replace it with NAME in the Having statement.

Catadmin - MCDBA, MCSA
"No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?"
-- Xena, "Been There, Done That"

ptheriault · Jan 4, 2007

use group by name having count(*) > 1 in your query to locate duplicate records.

- Paul
- If at first you don't succeed, find out if the loser gets anything.

Catadmin · Jan 4, 2007

Paul,

He's going to have to include all the fields in the Group By, not just Name. Unless he nixes all the other fields. And since John Smith is likely to be 50 different people than 50 records of the same person, I think he still needs to search on at least Phone or Address.

Pssheba, I don't like using the Count(*) personally because I keep running into problems where one of the fields has a unique value. Hence my use of an actual field name in the Count statement. But this is just personal preference and entirely up to you.

Catadmin - MCDBA, MCSA
"No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?"
-- Xena, "Been There, Done That"

TysonLPrice · Jan 4, 2007

Expanding ptheriault's reply. Something like:

select description,plant,department,team, count(*) from tyson_rl_HondaLoad
group by description,plant,department,team
having count(*) > 1

Pickig the columns you want.

SQLSister · Jan 4, 2007

Alternative method is to join the table to iself on all the other fields which you need to indicate a unique person (the natural key vice the key field of the table)and then put in a where clause where the id fields do not match.

Going further to the future, you need to come up with a way to prevent the additon of duplicates once you have them scrubbed. Consider a unique index on the natural key field. Of course you can't do this until you have cleaned up the data.

All this assumes that your data is in one table, if address and or phone are in separate tables from your main name table, then the dups may be showing up becasue you have multiple records for the same person in the subordinate table.

Questions about posting. See faq183-874

Catadmin · Jan 4, 2007

Hey, now there's a thought! Defining the data by defining what it's not. That might actually work better, especially as people can spell "John" and "Jon" different but they could be the same person.

Good one, SQLSister! @=)

Catadmin - MCDBA, MCSA
"No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?"
-- Xena, "Been There, Done That"

duchovnick · Jan 4, 2007

hi catadmin
I tried the code you wrote but it gave an error message such as:

ambigous column name id

and for the rest of the column names as well.
Any idea how to deal with that ?
Thanks a lot !

SQLSister · Jan 4, 2007

when you have the same column in multiple tables you must alawys specify which table you mean. So anywhere you have the fieldname ID (a poor choice for a fieldname BTW - change it if you can to one more descriptive) then preface it withthe table name or alias

Questions about posting. See faq183-874

Catadmin · Jan 5, 2007

SQLSister is right and I coded poorly when I wrote my code. You should add a c. before every field in the initial Select statement. My bad. Sorry for the confusion.

Code:

Select c.PK, c.Name, c.ID, c.Address, c.Phone
from Clients c
join (Select Name, ID, Address, Phone
      from Clients
      Group By ID, Name, Address, Phone
      Having Count(ID) > 2) t1
on c.ID = t1.ID
and c.Name = t1.Name
and c.Address = t1.Address
and c.Phone = t1.Phone

Catadmin - MCDBA, MCSA
"No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?"
-- Xena, "Been There, Done That"

duchovnick · Jan 5, 2007

Thanks a lot !

Catadmin · Jan 5, 2007

NP. Glad I could help.

Catadmin - MCDBA, MCSA
"No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?"
-- Xena, "Been There, Done That"

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

sql server 2000, locating double apearances of a record 2

pssheba

Programmer

Catadmin

Programmer

ptheriault

IS-IT--Management

Catadmin

Programmer

TysonLPrice

Programmer

SQLSister

Programmer

Catadmin

Programmer

duchovnick

Programmer

SQLSister

Programmer

Catadmin

Programmer

duchovnick

Programmer

Catadmin

Programmer

Similar threads

Part and Inventory Search

Sponsor