Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Finding TOP values without resorting to cursors! 3

Status
Not open for further replies.

colinb

Programmer
Mar 26, 2001
10
0
0
GB
This may seem a very simple question, but I am trying to find a way to list say the top 5 values (or less if fewer exist) for each person in a table witout having to resort to cursors.

Here is a simplified version of the problem:

Code:
CREATE TABLE tbl1
(
id int IDENTITY NOT NULL PRIMARY KEY,
PersonCode varchar(6) NOT NULL,
DataValue int
)
go

INSERT tbl1 VALUES ('ABC123, 18')
INSERT tbl1 VALUES ('ABC123, 7')
INSERT tbl1 VALUES ('ABC123, 14')
INSERT tbl1 VALUES ('ABC123, 92')
INSERT tbl1 VALUES ('ABC123, 12')
INSERT tbl1 VALUES ('ABC123, 15')
INSERT tbl1 VALUES ('ABC123, 3')
INSERT tbl1 VALUES ('ABC123, 6')
INSERT tbl1 VALUES ('ABC123, 68')
INSERT tbl1 VALUES ('DEF456, 5')
INSERT tbl1 VALUES ('DEF456, 6')
INSERT tbl1 VALUES ('DEF456, 78')
INSERT tbl1 VALUES ('GHI789, 63')
INSERT tbl1 VALUES ('GHI789, 12')
INSERT tbl1 VALUES ('GHI789, 148')
INSERT tbl1 VALUES ('GHI789, 15')
INSERT tbl1 VALUES ('GHI789, 11')
INSERT tbl1 VALUES ('GHI789, 44')
INSERT tbl1 VALUES ('GHI789, 32')
go

Now, the results I want are:
Code:
PersonCode DataValue
---------- ---------
ABC123  92
ABC123  68
ABC123  18
ABC123  15
ABC123  14
DEF456  78
DEF456  6
DEF456  5
GHI789  148
GHI789  63
GHI789  44
GHI789  32
GHI789  15
Two of us have been struggling over this for around 3 hrs and we have the added pressure of a very tight deadline.

Hopefully someone can help before Mr Cursor comes a-callin'!
 
A combined 6 hours later...

This method creates a table variable with an identity column. If you insert the data in to the table variable in the proper order, this query becomes relatively trivial. Like this...

Code:
Declare @Temp Table(Rowid Int Identity(1,1), PersonCode VarChar(6), DataValue Int)

Insert Into @Temp(PersonCode, DataValue)
Select PersonCode, DataValue
From   tbl1
Order By PersonCode, DataValue DESC

Select T.* 
From   @Temp T
       Inner Join (
         Select   Min(RowId) As MinRowId, PersonCode
         From     @Temp
         Group By PersonCode
         ) As A On T.PersonCode = A.PersonCode
                And T.RowId Between A.MinRowId And A.MinRowId + 4

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
I'm not 100% sure how this would impact performance since I don't know how large your actual table is; use a correlated sub-query:
Code:
SELECT a.personcode, a.datavalue
  FROM tbl1 a
 WHERE a.datavalue IN
	(SELECT TOP 5 b.datavalue
           FROM tbl1 b
          WHERE b.personcode = a.personcode
          ORDER BY b.datavalue DESC)
 ORDER BY a.personcode, a.datavalue
 
In SQL Server 2005 use the rowid functions to get what you want! Unfortunately, I don't have that installed right now.

Here's my SQL Server 2000 version to achieve almost the same task:

Code:
SELECT a.personcode, a.datavalue
FROM
   tbl1 A
   INNER JOIN tbl1 B ON A.personcode = B.personcode AND A.datavalue <= B.DataValue
GROUP BY a.personcode, a.datavalue
HAVING
   Count(*) <= 5
Mine has the flaw (or benefit, depending on what you want) of returning more than five rows if there are ties at the value that comes at row five. So if the values are [11 9 7 5 3 3 3 3 1] you'll get 8 rows to include the extra three 3s.

With the small dataset given, here are the tree costs from the execution plan on my computer for the three suggestions so far:

gmmastros: 0.1689
unclerico: 0.271
ESquared: 0.0907

But these numbers really change when there are a lot of rows or a lot of values. I'll post on this later.

 
Thankyou all for your replies.

I have decided to go with gmmastros's suggestion. This is because the other two solutions do not always produce the desired effect:
Code:
set nocount on

declare @tbl table(
tbl_id int IDENTITY NOT NULL PRIMARY KEY,
PersonCode varchar(6) NOT NULL,
DataValue int)

INSERT @tbl(PersonCode, DataValue) VALUES ('ABC123', 18)
INSERT @tbl(PersonCode, DataValue) VALUES ('ABC123', 7)
INSERT @tbl(PersonCode, DataValue) VALUES ('ABC123', 14)
INSERT @tbl(PersonCode, DataValue) VALUES ('ABC123', 92)
INSERT @tbl(PersonCode, DataValue) VALUES ('ABC123', 12)
INSERT @tbl(PersonCode, DataValue) VALUES ('ABC123', 15)
INSERT @tbl(PersonCode, DataValue) VALUES ('ABC123', 3)
INSERT @tbl(PersonCode, DataValue) VALUES ('ABC123', 6)
INSERT @tbl(PersonCode, DataValue) VALUES ('ABC123', 68)
INSERT @tbl(PersonCode, DataValue) VALUES ('DEF456', 5)
INSERT @tbl(PersonCode, DataValue) VALUES ('DEF456', 6)
INSERT @tbl(PersonCode, DataValue) VALUES ('DEF456', 78)
INSERT @tbl(PersonCode, DataValue) VALUES ('GHI789', 63)
INSERT @tbl(PersonCode, DataValue) VALUES ('GHI789', 15)
INSERT @tbl(PersonCode, DataValue) VALUES ('GHI789', 148)
INSERT @tbl(PersonCode, DataValue) VALUES ('GHI789', 32)
INSERT @tbl(PersonCode, DataValue) VALUES ('GHI789', 44)
INSERT @tbl(PersonCode, DataValue) VALUES ('GHI789', 44)
INSERT @tbl(PersonCode, DataValue) VALUES ('GHI789', 32)
suggestion 1:
Code:
Declare @Temp Table(Rowid Int Identity(1,1), PersonCode VarChar(6), DataValue Int)

Insert Into @Temp(PersonCode, DataValue)
Select PersonCode, DataValue
From   @tbl
Order By PersonCode, DataValue DESC

Select T.PersonCode,  T.DataValue
From   @Temp T
   Inner Join
   (    Select   Min(RowId) As MinRowId, PersonCode
        From     @Temp
        Group By PersonCode
   ) As A
   On T.PersonCode = A.PersonCode
   And T.RowId Between A.MinRowId And A.MinRowId + 4
gives:
Code:
PersonCode DataValue   
---------- ----------- 
ABC123     92.00
ABC123     68.00
ABC123     18.00
ABC123     15.00
ABC123     14.00
DEF456     78.00
DEF456     6.00
DEF456     5.00
GHI789     148.00
GHI789     63.00
GHI789     44.00
GHI789     44.00
GHI789     32.00

suggestion 2:
Code:
SELECT a.personcode, a.datavalue
  FROM @tbl a
 WHERE a.datavalue IN
    (SELECT TOP 5 b.datavalue
           FROM @tbl b
          WHERE b.personcode = a.personcode
          ORDER BY b.datavalue desc )
 ORDER BY a.personcode, a.datavalue DESC
gives
Code:
personcode datavalue   
---------- ----------- 
ABC123     92.00
ABC123     68.00
ABC123     18.00
ABC123     15.00
ABC123     14.00
DEF456     78.00
DEF456     6.00
DEF456     5.00
GHI789     148.00
GHI789     63.00
GHI789     44.00
GHI789     44.00
GHI789     32.00
GHI789     32.00
which is not exactly as required.

suggestion 3 also did not always produce the required result:
Code:
SELECT  a.personcode, a.datavalue
FROM
   @tbl A
   INNER JOIN @tbl B ON A.personcode = B.personcode AND A.datavalue <= B.DataValue
GROUP BY a.personcode, a.datavalue
HAVING
   Count(*) <= 5
 ORDER BY a.personcode, a.datavalue DESC
*/
gives:
Code:
personcode datavalue   
---------- ----------- 
ABC123     92.00
ABC123     68.00
ABC123     18.00
ABC123     15.00
ABC123     14.00
DEF456     78.00
DEF456     6.00
DEF456     5.00
GHI789     148.00
GHI789     63.00

Thankyou again for all your help.
 
You're right! I messed up.

unclerico's code can be tweaked as so to give the correct results:

Code:
SELECT a.PersonCode, a.DataValue
   FROM tbl1 a
   WHERE [b][blue]a.id[/blue][/b] IN (
      SELECT TOP 5 [b][blue]b.id[/blue][/b]
      FROM tbl1 b
      WHERE b.PersonCode = a.PersonCode
      ORDER BY b.DataValue desc
   )
ORDER BY a.PersonCode, a.DataValue DESC
And mine can be adjusted as so (my updated suggestion 3):

Code:
SELECT a.PersonCode, a.DataValue
FROM
   tbl1 A
   INNER JOIN tbl1 B ON
      A.PersonCode = B.PersonCode
      AND (
         A.DataValue < B.DataValue
         OR (A.DataValue = B.DataValue AND A.id <= B.id)
      )
GROUP BY a.PersonCode, a.DataValue, a.id
HAVING Count(*) <= 5
ORDER BY a.PersonCode, a.DataValue DESC
Note that if you wanted to include items of equal rank instead of arbitrarily cutting them off (displaying more than 5 when there are ties with the last item), you could do:

Code:
SELECT a.PersonCode, a.DataValue
FROM
   tbl1 A
   INNER JOIN tbl1 B ON A.PersonCode = B.PersonCode AND A.DataValue [b][blue]<[/blue][/b] B.DataValue
GROUP BY a.PersonCode, a.DataValue, a.id
HAVING Count(*) <= [b][blue]4[/blue][/b]
ORDER BY a.PersonCode, a.DataValue DESC
Let's see how they compare performance-wise, in execution plan query cost. Numbers in left column are "people, average number of values per person". Lower numbers are better. Winners in bold.

[tt] Suggestion#: 1 2 3
Given data 0.1698 0.305 0.0905
1000, 5.444 0.285 93.8 0.0519
100, 509.35 4.063 976.0 116.0[/tt]

So you can see that solution 2 scales directly with the size of the table. Solution 3 scales in some way with the average number of values per person. And solution 1 scales based on table size, too.

And if you want code to fill the table with lots of dummy data, here you go:

Code:
DECLARE
   @PersonCount int,
   @MaxValueCount int,
   @valueCount int,
   @PersonCode varchar(6)

SET @PersonCount = 1000 --number of people to add
SET @MaxValueCount = 10 -- maximum number of values per person to add (at least 1)

WHILE @PersonCount > 0 BEGIN
   SET @PersonCode = ''
   WHILE Len(@PersonCode) < 6 BEGIN
      SET @PersonCode = @PersonCode + Char(65 + Rand() * 26)
   END
   WHILE EXISTS (SELECT * FROM tbl1 WHERE PersonCode = @PersonCode) BEGIN
      SET @PersonCode = ''
      WHILE Len(@PersonCode) < 6 SET @PersonCode = Char(65 + Rand() * 26)
   END
   SET @ValueCount = Convert(int, Rand() * @MaxValueCount) + 1
   WHILE @ValueCount > 0 BEGIN
      INSERT tbl1 SELECT @PersonCode, Rand() * 1234567890
      SET @ValueCount = @ValueCount - 1
   END
   SET @PersonCount = @PersonCount - 1
END

SELECT
   Count(*),
   Av = Avg(convert(float, Cnt))
FROM (
   SELECT personcode, Cnt = count(*) FROM tbl2 group by personcode
) X
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top