Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

top keyword with group by

Status
Not open for further replies.

fikir

Programmer
Jun 25, 2007
86
0
0
One thing is troubling me

I have a table

delcare @tbl table (depid int, empid int)

insert into @tbl
select 1, 11
union all select 1, 12
union all select 1, 13
union all select 1, 14
union all select 2, 20
union all select 2, 23
union all select 3, 30

I want to select any top max 2 empid for for each depid
i.e two record for each depid if it has two or more records otherwise one

in this case the recordset looks like

depid empid
1 11
1 12
2 20
2 23
3 30

can we use top keyword with group by

Thanks,
 
You cannot select "top per group" with any native syntax. There are many threads in this forum dealing with that exact problem, though.

In SQL 2005 it is fairly easy with the new ranking functions such as ROW_NUMBER(), although for extremely large data sets these functions can be poor performers.

In SQL 2000 to calculate which rows belong in your result, you'll need to do some kind of self-join, or correlated subquery, or insert to an identity-column bearing temp table.

Code:
SELECT
   depid,
   empid
FROM
   emp e
WHERE
   empid <= ANY (SELECT TOP 2 x.empid FROM emp x WHERE e.depid = x.depid ORDER BY x.empid)
For large datasets, the temp table with an identity-column will perform MUCH better. For small ones, this should be fine.

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.

[sub]The best part about anything that has cheese is the cheese.[/sub][/color]
 
yes, you can select "top X per Y" easily
Code:
select t1.deptid
     , t1.empid
  from daTable as t1
inner
  join daTable as t2
    on t1.deptid = t2.deptid
   and t1.empid  <= t2.empid
group
    by t1.deptid
     , t1.empid
having count(*) <= 2
order
    by t1.deptid
     , t1.empid desc




r937.com | rudy.ca
 
Rudy, that self-join does work, but with recent experimentation I've found that for even medium-sized result sets, performance-wise it's far superior to use a temp table with an identity column for such calculations.

I realize this uses tempdb and has space usage considerations. But at a certain point, with a not very large resultset, the I/O and CPU usage of the self-join becomes so great that this method just blows that out of the water. It would go something like this:

Code:
SELECT
   seq = identity(int, 1, 1),
   depid,
   empid
INTO #t
FROM emp e
ORDER BY
   depid,
   empid

SELECT
   t.depid,
   t.empid
FROM
   #t t
   INNER JOIN (
      SELECT depid, sstart = Min(seq) FROM #t GROUP BY depid
   ) x on t.depid = x.depid
WHERE
   seq - sstart < 2
This is untested.

All these methods including my original one with the TOP 2 correlated subquery should be tested for which actually is better. It will make a difference what other columns are in the emp table. Fewer columns will magnify the difference between solutions that have ~ one read per row and solutions that have ~ one read per page. Many many columns will reduce the difference between the queries.
 
Or, for sufficiently smaller result sets, you could use a table variable. ;-)

< M!ke >
[small]"Oops" is never a good thing.[/small]
 
True, a table variable works fine. In both cases a primary key on the work table might help (but this is not always true).
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top