Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations John Tel on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Count occurances of identical data in a field 3

Status
Not open for further replies.
Dec 27, 2007
56
US
(T-SQL 2000)

I'm trying to find duplicate customer records that have been set up in our db. If I concatenate Address+City+State+Zip, Is there a way to count the number of occurrances of that string? The customer numbers will be different if there are multiple identical records.
Thanks for any help that may be offered!

 
Does this work for you?

Code:
Select   Address + city + State + zip, 
         Count(*) As DuplicateCount
From     Address
Group By Address1+city+State+zip
Having   Count(*) > 1

If it does, and you want me to explain it, just let me know.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
If you delete my wife's account just because she lives at the same address I do, she is NOT going to be a happy customer!

;-)

< M!ke >
[small]Don't believe everything you think.[/small]
 
you might need to SELECT and GROUP BY the columns individually rather than concatenated, because Address+City+NULL+Zip is not the same as Address+NULL+State+Zip, even though they concatenate to the same value

r937.com | rudy.ca
 
And don't forget if you have genuine duplicate records (Two differnt people with the same smae can also live (or work) at the same address, usually a father and son))
that you must merge the records of any child tables before removing the record of the primary table. Otherwise you lose the data associated with that person who was duplicated.

"NOTHING is more important in a database than integrity." ESquared
 
>>you might need to SELECT and GROUP BY the columns individually rather than concatenated, because Address+City+NULL+Zip is not the same as Address+NULL+State+Zip, even though they concatenate to the same value


If there are NULL values then you can run this before the query

SET CONCAT_NULL_YIELDS_NULL OFF

make sure you run this after the query

SET CONCAT_NULL_YIELDS_NULL ON

Denis The SQL Menace
--------------------
SQL Server Code,Tips and Tricks, Performance Tuning
SQLBlog.com, Google Interview Questions
 
Thanks to all who gave me tips, I'm making progress. This (Combined with the set statements):

Select Address + city + State + zip,
Count(*) As DuplicateCount
From Address
Group By Address1+city+State+zip
Having Count(*) > 1

...nearly works, except that gives me all the duplicates in the table without a way to identify them.
I realized one thing: We're trying to clean up drop ship customers of one Distribitor, and he has his own customer record in the same table with a suffix of 0. All his customers have a suffix <> 0,
so I need to add criteria for that
I have to add something like this:
Where custno = '1234' and custsuffix <> 0
No matter what I try either get error messages or no data.
PS I all ready got the list by exporting it to excel and using subtotals, but I'd like to learn how to do this in SQL if you don't mind coaching me.
Thanks!

PS I won't delete your wife's account![wink]
 
If you delete my wife's account just because she lives at the same address I do, [red]SHE[/red] is not going to be a happy customer...

...but [red]I[/red] would be ever grateful! :p

< M!ke >
[small]Don't believe everything you think.[/small]
 
Is this the way you did the where clause?
Code:
Select   Address + city + State + zip, 
         Count(*) As DuplicateCount
From     Address
Where custno = '1234' and custsuffix <> 0
Group By Address1+city+State+zip
Having   Count(*) > 1

"NOTHING is more important in a database than integrity." ESquared
 
Sorry SQLSister, for the delay in replying, I just got back to this. The suggested code returns no records. If I remove the Having count(*) >1 I can see why. Each record returned has count = 1.
 
This will give you a way to actually see the duplicates:

Select * From Address Where
Address + City + State + Zip IN
(Select Address + City + State + Zip From Address
GROUP BY Address + City + State + Zip
HAVING COUNT(Address + City + State + Zip) > 1)

-------------------------------------------------------------------------------------------------------------------------
"Now I can look at you in peace; I don't eat you any more." Franz Kafka, while admiring fish in an aquarium
 
Thanks for the suggestion Patricia, but that gets me 3700 records and I know (having done it all ready in Excel) that there are 134 records. I need to be able to limit the return set to one customer, so I added the customer number & sequence. There are about 5500 total records for this customer with 134 duplicates, but the query returns 3700 records. Some of them are duplicates and some are not.
[bummer]


Select a.addr##3+a.city+a.state+a.zip AS Compare, * From dbo.custaddr a
Where
a.addr##3+a.city+a.state+a.zip IN
(Select a.addr##3+a.city+a.state+a.zip From dbo.custaddr a
GROUP BY a.addr##3+a.city+a.state+a.zip

HAVING COUNT(a.addr##3+a.city+a.state+a.zip) > 1)
and a.cust_num = 'B0001' and a.cust_seq <> 0
order by Compare
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top