Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations biv343 on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

selecting top 3 location zip code counts 3

Status
Not open for further replies.

SQLDog

Programmer
Jun 5, 2007
6
US
I need to generate a report with the top 3 zip codes by Location. My tables are Location, Customer, and Address.

I have a Select statement like this:

Select l.name, a.zip, count(a.zip) as count
From location l
inner join customer c on c.locationid = location.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
order by l.name, count desc

this returns all the zip codes per location (and a count of each) but I would like to just return the top 3 zips per location.

I am looking for the results to look like this:

Boulder 80304 258
Boulder 80301 190
Boulder 80302 172
Bridgewater 08805 205
Bridgewater 07920 99
Bridgewater 08876 98
Broomfield 80020 796
Broomfield 80027 207
Broomfield 80021 144

...

Any ideas?
 
Maybe someone has a better idea (And I'd love to learn it), but in the past when I've had to do this it was very awkward. I created a union where I took the max count of venues in a zip code then the next part I took the max of zipcodes excluding those I already had and then the third union I took the max count of venues excluding both previous ones. I also did this with a temp table and three inserts where I didn't insert into the table if the record already existed (using a left join to the #temp table where the id field was null). Either way was a pain.

Questions about posting. See faq183-874
 
You can join the query to itself, and key off of a count to limit whats' returned. Use an inner join on location name and count <= count. You can then limit your query by only selecting records that have 1, 2, or 3 rows with a count value <= 3 by using a having clause.

Something like this:

Code:
Select a.* from
(
Select l.name, a.zip, count(a.zip) as count
From location l
inner join customer c on c.locationid = location.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
order by l.name, count desc
) a
inner join 
(
Select l.name, a.zip, count(a.zip) as count
From location l
inner join customer c on c.locationid = location.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
order by l.name, count desc
) b
on a.name = b.name
and a.count <= b.count
group by a.name, a.zip, a.count
having count(b.*) <= 3

This is not tested, but I think its' pretty close. I'm not in an environment where I can test atm, but I will check back tomorrow.

Hope it helps,

Alex

Ignorance of certain subjects is a great part of wisdom
 
some clown said:
You can then limit your query by only selecting records that have 1, 2, or 3 rows with a count value <= 3 by using a having clause.

Of course, by count value <= 3, I mean the count of rows on the right side of the join.

Also, I used a <= when I should've used a >= for the join (we want only the rows with 1, 2 or 3 corresponding higher/equal counts, the join in the first query would give you the lowest 3 counts)

So...

Code:
Select a.* from
(
Select l.name, a.zip, count(a.zip) as count
From location l
inner join customer c on c.locationid = location.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
order by l.name, count desc
) a
inner join 
(
Select l.name, a.zip, count(a.zip) as count
From location l
inner join customer c on c.locationid = location.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
order by l.name, count desc
) b
on a.name = b.name
and a.count >= b.count
group by a.name, a.zip, a.count
having count(b.*) <= 3

Next time I'll wait till I can test ;-)

Hope it helps,

Alex

Ignorance of certain subjects is a great part of wisdom
 
one more idiot thing I did, I left the order by in your subqueries...


Last one of the day

Code:
Select a.* from
(
Select l.name, a.zip, count(a.zip) as count
From location l
inner join customer c on c.locationid = location.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
) a
inner join 
(
Select l.name, a.zip, count(a.zip) as count
From location l
inner join customer c on c.locationid = location.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
) b
on a.name = b.name
and a.count >= b.count
group by a.name, a.zip, a.count
having count(b.*) <= 3
order by a.name, a.count desc

Now, if none of these work, I will be very embarassed [blush]


Ignorance of certain subjects is a great part of wisdom
 
Thanks Alex! I will give it a try when I get to work tomorrow.

Mike
 
Well, I am very embarassed. I messed up one other thing, it doesn't like

Code:
having count(b.*) <= 3

so something like this could work:

Code:
having count(b.name) <= 3

So here's the whole query (I also changed that join condition back to <=, I was confused there and didn't have any data to test on):

Code:
[COLOR=blue]Select[/color] a.* [COLOR=blue]from[/color]
(
[COLOR=blue]Select[/color] l.name, a.zip, [COLOR=#FF00FF]count[/color](a.zip) [COLOR=blue]as[/color] [COLOR=#FF00FF]count[/color]
[COLOR=blue]From[/color] location l
[COLOR=blue]inner[/color] [COLOR=blue]join[/color] customer c [COLOR=blue]on[/color] c.locationid = location.id
[COLOR=blue]inner[/color] [COLOR=blue]join[/color] address a [COLOR=blue]on[/color] a.customerid = c.id
[COLOR=blue]group[/color] [COLOR=blue]by[/color] l.name, a.zip
) a
[COLOR=blue]inner[/color] [COLOR=blue]join[/color] 
(
[COLOR=blue]Select[/color] l.name, a.zip, [COLOR=#FF00FF]count[/color](a.zip) [COLOR=blue]as[/color] [COLOR=#FF00FF]count[/color]
[COLOR=blue]From[/color] location l
[COLOR=blue]inner[/color] [COLOR=blue]join[/color] customer c [COLOR=blue]on[/color] c.locationid = location.id
[COLOR=blue]inner[/color] [COLOR=blue]join[/color] address a [COLOR=blue]on[/color] a.customerid = c.id
[COLOR=blue]group[/color] [COLOR=blue]by[/color] l.name, a.zip
) b
[COLOR=blue]on[/color] a.name = b.name
and a.count <= b.count
[COLOR=blue]group[/color] [COLOR=blue]by[/color] a.name, a.zip, a.count
[COLOR=blue]having[/color] [COLOR=#FF00FF]count[/color](b.name) <= 3
[COLOR=blue]order[/color] [COLOR=blue]by[/color] a.name, a.count [COLOR=#FF00FF]desc[/color]

Hope it helps,

Alex

Ignorance of certain subjects is a great part of wisdom
 
Alex,
The latest worked like a charm. Thanks for the help and a fine example of how to write a query of this type.

Mike
 
Hey, glad it worked. Sorry it took 8 billion tries though ;-)

Thanks for the purple thingy :-D

Ignorance of certain subjects is a great part of wisdom
 
Just be aware that for a really large table these kinds of self-joins can become very costly. Those derived tables take up tempdb space.

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
Would you try the following? I'd be very interested to learn the performance differences, not just in reads/writes but also CPU time. (Run them with trace turned on as well as execution plan and give us the highlights!)
Code:
SELECT
   L.name,
   A.zip,
   Count = Count(A.zip)
INTO #Zips
FROM
   location L
   inner join customer C on C.locationid = L.id
   inner join address A on A.customerid = C.id
GROUP BY
   L.name,
   A.zip

-- try Alex's method but replace the derived tables with the temp table. Also try a table variable instead!
-- last, try the correlated subquery method. If the number of unused zip codes is very large, the correlated subquery could outperform the derived table. It is very hard to know for any particular query in advance which will do better.
-- last if you want to be interesting, try the following but replace the temp tables with derived tables again...

SELECT *
FROM #Zips Z
WHERE Zip IN (SELECT TOP 3 Zip FROM #Zips WHERE Name = Z.Name ORDER BY Count DESC)

DROP TABLE #Zips

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
Ooh... here's another one to try:

Code:
SELECT TOP 100 PERCENT
   sequence = identity(int, 1, 1),
   L.name,
   A.zip,
   Count = Count(A.zip)
INTO #Zips
FROM
   location L
   inner join customer C on C.locationid = L.id
   inner join address A on A.customerid = C.id
GROUP BY
   L.name,
   A.zip
ORDER BY
   Name,
   Count DESC

SELECT Z.Name, Z.Zip, Z.Count
FROM
   #Zips Z
   INNER JOIN (
      SELECT Name, Limit = Min(Sequence) + 2 FROM #Zips GROUP BY NAME
   ) X ON Z.Name = X.Name AND Z.Sequence <= Limit

DROP TABLE #Zips
The only thing about this is that something has come to mind that maybe ordering SELECT INTOs is not always reliable. Consarndit. Have to look into that.

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
Thanks for providing the other methods to look at, E2.

With the common table expression in SQL 2005, some of these will become even easier :)

Ignorance of certain subjects is a great part of wisdom
 
And how about the rowindex thingie? that would help a lot.

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
I did some testing and got some interesting results.

The correlated subquery sucks (I generally hate those things but they can perform well when the number of rows is small for the correlation and large for the derived table method).

I tried Alex's method with a temp table and a table variable (substituting them in place of the derived tables).
The table variable without a primary key blew major chunks. I never even let the query finish because it was taking so long (four times + other methods). The table variable with a primary key performed better and close to the original method in CPU but (of course) had 30 times the reads and some writes where his had none.

Surprisingly, the third method with Min(Sequence) + 2 performed the best of all, coming in at about half the cost.

Here are my results:
[tt] CPU Reads Writes Duration
Alex1: 5265 26659 0 8313
Erik1: 12219 315924 510 14720
Erik2: 2031 15060 187 3843[/tt]


If anyone would like to play around with some sample data, here is some code to do that. Note that it might take up to a couple of hours for it to create all the data:

Code:
CREATE DATABASE TopPerGroup
GO
USE TopPerGroup
GO
CREATE VIEW Random -- Circumvent SQL Server's rules about functions being deterministic
AS
SELECT Rand = Rand()
GO
CREATE FUNCTION RandomName(@Len int)
RETURNS varchar(40)
AS
BEGIN
   DECLARE
		@Name varchar(40)
   SELECT @Len = Floor(Rand * (@Len - 4) + 3) FROM Random
	SET @Name = ''
   WHILE @Len > 0 BEGIN
		SELECT @Name = @Name + Char(Rand * 26 + 97) FROM Random
      SET @Len = @Len - 1
   END
	SELECT @Name = Char(Rand * 26 + 65) + @Name FROM Random
   RETURN @Name
END
GO
CREATE FUNCTION RandomFloat()
RETURNS float
AS
BEGIN
	RETURN (SELECT Rand FROM Random)
END
GO
CREATE FUNCTION RandomNumber(@Limit int)
RETURNS int
AS
BEGIN
	RETURN (SELECT Floor(Rand * @Limit) + 1 FROM Random)
END
GO
CREATE TABLE Location (
	ID int identity(1,1) NOT NULL CONSTRAINT PK_Location PRIMARY KEY CLUSTERED,
   Name varchar(40) not null
)

CREATE TABLE Customer (
   ID int identity(1,1) NOT NULL CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED,
   LocationID int NOT NULL CONSTRAINT FK_Customer_LocationID FOREIGN KEY REFERENCES Location(ID)
)

CREATE TABLE Address (
   ID int identity(1,1) NOT NULL CONSTRAINT PK_Address PRIMARY KEY CLUSTERED,
	CustomerID int NOT NULL CONSTRAINT FK_Address_CustomerID FOREIGN KEY REFERENCES Customer(ID),
	Zip char(5) NOT NULL CONSTRAINT CK_ZipFive CHECK (Zip LIKE '[0-9][0-9][0-9][0-9][0-9]')
)
GO
SET NOCOUNT ON
INSERT Location SELECT dbo.RandomName(20)
WHILE Scope_Identity() < 200 INSERT Location SELECT dbo.RandomName(20)
GO
INSERT Customer
SELECT ID
FROM
	(
		SELECT TOP 100 ID, Rnd = dbo.RandomFloat()
		FROM Location
		ORDER BY Rnd
	) X

WHILE Scope_Identity() < 50000 BEGIN
	INSERT Customer
	SELECT ID
	FROM
		(
			SELECT TOP 100 ID, Rnd = dbo.RandomFloat()
			FROM Location
			ORDER BY Rnd
		) X
END

INSERT Address
SELECT ID, Right('0000' + Convert(varchar(5), dbo.RandomNumber(200) + 79999), 5)
FROM
	(
		SELECT TOP 100 ID, Rnd = dbo.RandomFloat()
		FROM Customer
		ORDER BY Rnd
	) X
WHILE Scope_Identity() < 400000 BEGIN
	INSERT Address
	SELECT ID, Right('0000' + Convert(varchar(5), dbo.RandomNumber(200) + 89999), 5)
	FROM
		(
			SELECT TOP 100 ID, Rnd = dbo.RandomFloat()
			FROM Customer
			ORDER BY Rnd
		) X
END
GO
alter table address add Address1 varchar(128)
alter table address add Address2 varchar(128)
alter table address add City varchar(64)
alter table address add State char(2)
alter table address add AddressTypeID int
GO
update address set address1 = '1234 Somewhere Pl'
update address set address2 = 'Suite 200'
update address set city = dbo.RandomName(20)
update address set state = 'CA'
update address set addresstypeid = dbo.RandomNumber(4)
GO


--verify original query
Select l.name, a.zip, count(*) as count
From location l
inner join customer c on c.locationid = l.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
order by name, count desc


--run these with SQL Profiler. The results in Query Analyzer with Trace on aren't giving the right answers.
--batch completed is all you need.
GO
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
GO
Select a.* from
(
Select l.name, a.zip, count(a.zip) as count
From location l
inner join customer c on c.locationid = l.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
) a
inner join 
(
Select l.name, a.zip, count(a.zip) as count
From location l
inner join customer c on c.locationid = l.id
inner join address a on a.customerid = c.id
group by l.name, a.zip
) b
on a.name = b.name
and a.count <= b.count
group by a.name, a.zip, a.count
having count(b.name) <= 3
order by a.name, a.count desc
GO
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
GO
SELECT
   L.name,
   A.zip,
   Count = Count(A.zip)
INTO #Zips
FROM
   location L
   inner join customer C on C.locationid = L.id
   inner join address A on A.customerid = C.id
GROUP BY
   L.name,
   A.zip

SELECT *
FROM #Zips Z
WHERE Zip IN (SELECT TOP 3 Zip FROM #Zips WHERE Name = Z.Name ORDER BY Count DESC)
ORDER BY
	Name, Count DESC

DROP TABLE #Zips
GO
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
GO
SELECT TOP 100 PERCENT
   sequence = identity(int, 1, 1),
   L.name,
   A.zip,
   Count = Count(A.zip)
INTO #Zips
FROM
   location L
   inner join customer C on C.locationid = L.id
   inner join address A on A.customerid = C.id
GROUP BY
   L.name,
   A.zip
ORDER BY
   L.Name,
   Count DESC

SELECT Z.Name, Z.Zip, Z.Count
FROM
   #Zips Z
   INNER JOIN (
      SELECT Name, Limit = Min(Sequence) + 2 FROM #Zips GROUP BY NAME
   ) X ON Z.Name = X.Name AND Z.Sequence <= Limit
ORDER BY
	Z.Name, Z.Count DESC

DROP TABLE #Zips
GO
USE master
GO
DROP DATABASE TopPerGroup
I haven't run the whole thing in one go, as I had to make some modifications so the number of zip codes per location came out okay. If there's a problem let me know please.






[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top