Count occurances of identical data in a field 3

DietSquirt · Jan 28, 2008

(T-SQL 2000)

I'm trying to find duplicate customer records that have been set up in our db. If I concatenate Address+City+State+Zip, Is there a way to count the number of occurrances of that string? The customer numbers will be different if there are multiple identical records.
Thanks for any help that may be offered!

gmmastros · Jan 28, 2008

Does this work for you?

Code:

Select   Address + city + State + zip, 
         Count(*) As DuplicateCount
From     Address
Group By Address1+city+State+zip
Having   Count(*) > 1

If it does, and you want me to explain it, just let me know.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

SQLDenis · Jan 28, 2008

Code:

select Address+City+State+Zip,count(*)
from tablename
group by Address+City+State+Zip
having count(*) > 1
order by 2 desc

you might need to convert to varchar if some of the columns are integers

Denis The SQL Menace
--------------------
SQL Server Code,Tips and Tricks, Performance Tuning
SQLBlog.com, Google Interview Questions

LNBruno · Jan 28, 2008

If you delete my wife's account just because she lives at the same address I do, she is NOT going to be a happy customer!

;-)

< M!ke >
[small]Don't believe everything you think.[/small]

r937 · Jan 28, 2008

you might need to SELECT and GROUP BY the columns individually rather than concatenated, because Address+City+NULL+Zip is not the same as Address+NULL+State+Zip, even though they concatenate to the same value

r937.com | rudy.ca

SQLSister · Jan 28, 2008

And don't forget if you have genuine duplicate records (Two differnt people with the same smae can also live (or work) at the same address, usually a father and son))
that you must merge the records of any child tables before removing the record of the primary table. Otherwise you lose the data associated with that person who was duplicated.

"NOTHING is more important in a database than integrity." ESquared

SQLDenis · Jan 28, 2008

>>you might need to SELECT and GROUP BY the columns individually rather than concatenated, because Address+City+NULL+Zip is not the same as Address+NULL+State+Zip, even though they concatenate to the same value

If there are NULL values then you can run this before the query

SET CONCAT_NULL_YIELDS_NULL OFF

make sure you run this after the query

SET CONCAT_NULL_YIELDS_NULL ON

Denis The SQL Menace
--------------------
SQL Server Code,Tips and Tricks, Performance Tuning
SQLBlog.com, Google Interview Questions

DietSquirt · Jan 30, 2008

Thanks to all who gave me tips, I'm making progress. This (Combined with the set statements):

Select Address + city + State + zip,
Count(*) As DuplicateCount
From Address
Group By Address1+city+State+zip
Having Count(*) > 1

...nearly works, except that gives me all the duplicates in the table without a way to identify them.
I realized one thing: We're trying to clean up drop ship customers of one Distribitor, and he has his own customer record in the same table with a suffix of 0. All his customers have a suffix <> 0,
so I need to add criteria for that
I have to add something like this:
Where custno = '1234' and custsuffix <> 0
No matter what I try either get error messages or no data.
PS I all ready got the list by exporting it to excel and using subtotals, but I'd like to learn how to do this in SQL if you don't mind coaching me.
Thanks!

PS I won't delete your wife's account! [wink]

LNBruno · Jan 31, 2008

If you delete my wife's account just because she lives at the same address I do, [red]SHE[/red] is not going to be a happy customer...

...but [red]I[/red] would be ever grateful!

< M!ke >
[small]Don't believe everything you think.[/small]

SQLSister · Feb 4, 2008

Is this the way you did the where clause?

Code:

Select   Address + city + State + zip, 
         Count(*) As DuplicateCount
From     Address
Where custno = '1234' and custsuffix <> 0
Group By Address1+city+State+zip
Having   Count(*) > 1

"NOTHING is more important in a database than integrity." ESquared

DietSquirt · Feb 7, 2008

Sorry SQLSister, for the delay in replying, I just got back to this. The suggested code returns no records. If I remove the Having count(*) >1 I can see why. Each record returned has count = 1.

PatriciaObreja · Feb 8, 2008

This will give you a way to actually see the duplicates:

Select * From Address Where
Address + City + State + Zip IN
(Select Address + City + State + Zip From Address
GROUP BY Address + City + State + Zip
HAVING COUNT(Address + City + State + Zip) > 1)

-------------------------------------------------------------------------------------------------------------------------
"Now I can look at you in peace; I don't eat you any more." Franz Kafka, while admiring fish in an aquarium

DietSquirt · Feb 14, 2008

Thanks for the suggestion Patricia, but that gets me 3700 records and I know (having done it all ready in Excel) that there are 134 records. I need to be able to limit the return set to one customer, so I added the customer number & sequence. There are about 5500 total records for this customer with 134 duplicates, but the query returns 3700 records. Some of them are duplicates and some are not.
[bummer]

Select a.addr##3+a.city+a.state+a.zip AS Compare, * From dbo.custaddr a
Where
a.addr##3+a.city+a.state+a.zip IN
(Select a.addr##3+a.city+a.state+a.zip From dbo.custaddr a
GROUP BY a.addr##3+a.city+a.state+a.zip

HAVING COUNT(a.addr##3+a.city+a.state+a.zip) > 1)
and a.cust_num = 'B0001' and a.cust_seq <> 0
order by Compare

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Count occurances of identical data in a field 3

DietSquirt

MIS

gmmastros

Programmer

SQLDenis

Programmer

LNBruno

Programmer

r937

Technical User

SQLSister

Programmer

SQLDenis

Programmer

DietSquirt

MIS

LNBruno

Programmer

SQLSister

Programmer

DietSquirt

MIS

PatriciaObreja

Programmer

DietSquirt

MIS

Similar threads

Part and Inventory Search

Sponsor