Returning top n% of each group in a table...

biltonguy · Apr 21, 2006

Hey there

Ive been struggling with this one for a few days now..

Lets say I have one huge employee table (which includes the groupID and salary) containing everyone from managers to engineers to maintenance.

I need a query that will return the records of the top 5% highest salaries of each group. I cannot afford to query each group separately as the table contains +- 30k records, and there are about 27 groups - it simply takes too long to execute.

Example of output im looking for:

GroupID | Salary

Engineer 25000
Engineer 23500
Manager 39999
Manager 30400
Manager 30900
Manager 30050
Accounts 89228
Accounts 89444
Accounts 89999
etc...

If anyone has any idea how to achieve this please let me know.

Thanks a lot

macleod1021 · Apr 21, 2006

SELECT TOP 5 PERCENT GROUPID, SALARY
FROM tblYOURTABLE
ORDER BY SALARY DESC

macleod1021 · Apr 21, 2006

oh wait...just read you want for each group. Give me a min to fix that

vongrunt · Apr 21, 2006

Define top 5%:

a) if there are 100 rows per group, retrieve 5% of 100 = 5 rows per group
b) if highest value in a group is $100,000 retrieve rows with Salary >= $95,000
c) if total sum of salaries is $1,000,000 retrieve only top salaries until sum reaches 5% of that ($50,000)

?

------
[small]select stuff(stuff(replicate(' ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

vongrunt · Apr 21, 2006

And of course ol' question:

SQL2000 or 2005?

------
[small]select stuff(stuff(replicate(' ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

druer · Apr 21, 2006

I cannot afford to query each group separately as the table contains +- 30k records, and there are about 27 groups - it simply takes too long to execute.

Assuming the proper indexes to satisfy the query that should be cake and shouldn't take long at all. If the right query is taking too long then the problem is likely with the lack of indexes to resolve it.

Dalton

biltonguy · Apr 21, 2006

Define top 5%:

a) if there are 100 rows per group, retrieve 5% of 100 = 5 rows per group
b) if highest value in a group is $100,000 retrieve rows with Salary >= $95,000
c) if total sum of salaries is $1,000,000 retrieve only top salaries until sum reaches 5% of that ($50,000)"

Mostly a).. it will retrieve 5 rows per group but those rows will contain the 5 highest salaries in the group.

I'm using 2005

vongrunt · Apr 21, 2006

Then write function a la macleod1021 and use APPLY() thing over distinct list of GroupIDs. Do you have separated table Groups or something?

------
[small]select stuff(stuff(replicate(' ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

macleod1021 · Apr 21, 2006

Well, I haven't used 2005 much, but I did find this website pretty quickly using google. It seems to be waht you need.

http://www.4guysfromrolla.com/webtech/010406-1.shtml

r937 · Apr 21, 2006

dalton, one query is going to be faster than 27 separate queries any day

r937.com | rudy.ca

vongrunt · Apr 21, 2006

Heh... I just wrote this:

Code:

select A.*
from
(	select GroupID, Salary, row_number() over( partition by GroupID order by Salary desc) as GroupRank
	from Salary
) A
inner join
(	select GroupID, count(*) as salaryCount
	from Salary
	group by GroupID
) B 
on A.GroupID = B.GroupID 
where A.GroupRank <= B.salaryCount * 0.05

It should run faster than APPLY() query.

------
[small]select stuff(stuff(replicate(' ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

biltonguy · Apr 24, 2006

Thanks for the responses guys, I tried vongrunt's query out but i get this error "'row_number' is not a recognized function name."

Very strange ive looked row_number up and it is a valid function. Any idea why it might be saying this?

Forgive me if i'm a bit noob. Just out of college...

biltonguy · Apr 24, 2006

Sorry just realised the db is on sql server 2000 (which doesnt support row_number). Is there a simliar function in 2000?

vongrunt · Apr 24, 2006

> Is there a simliar function in 2000?

Nope, but there are other (less efficient) ways to do the same in 2000.

------
[small]select stuff(stuff(replicate(' ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Returning top n% of each group in a table...

biltonguy

Programmer

macleod1021

Programmer

macleod1021

Programmer

vongrunt

Programmer

vongrunt

Programmer

druer

Programmer

biltonguy

Programmer

vongrunt

Programmer

macleod1021

Programmer

r937

Technical User

vongrunt

Programmer

biltonguy

Programmer

biltonguy

Programmer

vongrunt

Programmer

Similar threads

Part and Inventory Search

Sponsor