Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Returning top n% of each group in a table...

Status
Not open for further replies.

biltonguy

Programmer
Apr 21, 2006
4
ZA
Hey there

Ive been struggling with this one for a few days now..

Lets say I have one huge employee table (which includes the groupID and salary) containing everyone from managers to engineers to maintenance.

I need a query that will return the records of the top 5% highest salaries of each group. I cannot afford to query each group separately as the table contains +- 30k records, and there are about 27 groups - it simply takes too long to execute.

Example of output im looking for:

GroupID | Salary

Engineer 25000
Engineer 23500
Manager 39999
Manager 30400
Manager 30900
Manager 30050
Accounts 89228
Accounts 89444
Accounts 89999
etc...

If anyone has any idea how to achieve this please let me know.

Thanks a lot
 
SELECT TOP 5 PERCENT GROUPID, SALARY
FROM tblYOURTABLE
ORDER BY SALARY DESC
 
Define top 5%:

a) if there are 100 rows per group, retrieve 5% of 100 = 5 rows per group
b) if highest value in a group is $100,000 retrieve rows with Salary >= $95,000
c) if total sum of salaries is $1,000,000 retrieve only top salaries until sum reaches 5% of that ($50,000)

?

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
And of course ol' question:

SQL2000 or 2005?

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
I cannot afford to query each group separately as the table contains +- 30k records, and there are about 27 groups - it simply takes too long to execute.

Assuming the proper indexes to satisfy the query that should be cake and shouldn't take long at all. If the right query is taking too long then the problem is likely with the lack of indexes to resolve it.

Dalton
 
Define top 5%:

a) if there are 100 rows per group, retrieve 5% of 100 = 5 rows per group
b) if highest value in a group is $100,000 retrieve rows with Salary >= $95,000
c) if total sum of salaries is $1,000,000 retrieve only top salaries until sum reaches 5% of that ($50,000)"

Mostly a).. it will retrieve 5 rows per group but those rows will contain the 5 highest salaries in the group.

I'm using 2005 :)
 
Then write function a la macleod1021 and use APPLY() thing over distinct list of GroupIDs. Do you have separated table Groups or something?

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Heh... I just wrote this:
Code:
select A.*
from
(	select GroupID, Salary, row_number() over( partition by GroupID order by Salary desc) as GroupRank
	from Salary
) A
inner join
(	select GroupID, count(*) as salaryCount
	from Salary
	group by GroupID
) B 
on A.GroupID = B.GroupID 
where A.GroupRank <= B.salaryCount * 0.05
It should run faster than APPLY() query.

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Thanks for the responses guys, I tried vongrunt's query out but i get this error "'row_number' is not a recognized function name."

Very strange ive looked row_number up and it is a valid function. Any idea why it might be saying this?

Forgive me if i'm a bit noob. Just out of college... :)
 
Sorry just realised the db is on sql server 2000 (which doesnt support row_number). Is there a simliar function in 2000?
 
> Is there a simliar function in 2000?

Nope, but there are other (less efficient) ways to do the same in 2000.

------
[small]select stuff(stuff(replicate('<P> <B> ', 14), 109, 0, '<.'), 112, 0, '/')[/small]
[banghead]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top