Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

finding top n within postgres sql 3

Status
Not open for further replies.
Jun 15, 1999
18
US
I am trying to write a select statement using Postgres to get the top 3 users in each problem category with the most problem incidents.

For example:
email problems:
mary 5
jason 3
amy 2
beth 1
riley 1

network problems:
bob 7
john 6
fred 4
howard 2
amy 1
sue 1

The results would produce:
email problems:
mary 5
jason 3
amy 2

network problems:
bob 7
john 6
fred 4

This contains the subselect that works so far to get me the count of problems by category sorted by count descending within each category. Now I need to extract out the top 3 from each category.

SELECT subtable.type,subtable.requestor,subtable.foo
FROM
(SELECT type as type,requestor as requestor,count(*) as foo
FROM incident
group by type,requestor
ORDER BY type,count(*) DESC) subtable
??????????????????????;

Any help would be much appreciated! Laura

 
can you tell us the table structure ?
I cant guess all of that from the query itself
Postgresql has a limit and offset clause
in 7.2 you have top specify limit 3, offset 0 for the top 3 columns
 
The informationr resides in just one table where the key is
problem id (number) and fields type (varchar) is the problem category (ie. email problem) and the requestor (varchar) is the name (ie. bob).

My undestanding of limit is that in my case I will get just 3 items from the total result set, as opposed to my needing the top 3 within each group.
 
>My undestanding of limit is that in my case I will get just 3 items from the total result set, as opposed to my needing the top 3 within each group.

Break your problem into its pieces:

1. Turn your above SELECT statement into a view, for each problem area in question. Now you have several virtual tables, each with a column counting aggregating the number of problems, and you want the top 3 from each.

2. Now, add the LIMIT, OFFSET method that newbiepg (no longer such a newbie ;-)) suggests. Since you have an ORDER BY clause on the columns in question, then LIMIT,OFFSET will work just fine, since the results will be ordered from highest to lowest. So simply add LIMIT 3 OFFSET 0 to the end of each query. (Read
3. Now you have a view for each table, which results in several virtual tables, each with exactly 3 rows. But you want all this retrieved on one query? Fine:

4. Create a view which joins all those views. Here's the exercise of the day -- make the output look like this:

Code:
+------------+-------------+--------------+----------------+
| email_user | email_probs | network_user | network_probs  |
+------------+-------------+--------------+----------------+
| mary       | 5           | bob          | 7              |
| jason      | 3           | john         | 6              |
| amy        | 2           | fred         | 2              |
+------------+-------------+--------------+----------------+

And then factor it to look like this:
Code:
+------------+-------------+----------------+-------------+
| user       | email_probs | network_probs  | total_probs |
+------------+-------------+----------------+-------------+
| mary       | 5           | 0              | 5           |
| jason      | 3           | 0              | 3           |
| amy        | 2           | 0              | 2           |
| bob        | 0           | 7              | 7           |
| john       | 0           | 6              | 6           |
| fred       | 0           | 2              | 2           |
+------------+-------------+----------------+-------------+
-------------------------------------------

My PostgreSQL FAQ --
 
Thanks for the feedback! I was hoping I would not have to go the route you suggest since I wanted to be able to have the top 'n' be a variable and not a set value (passed from the reporting program). And I also didn't want to have views for each problem category, since new categories come up all the time. But if I have to I will!

We'll I am off for a week of R&R and maybe another bright idea will come to me. Thanks for the detailed feedback. I'll be referring to it as I build the components when I get back!
 
yes as rycamor said you could create some views
if you have 4 types of problems, that would mean 4 views
you could create a view with the following syntax

create view view1 as
subtable.requestor ,subtable.number where subtable.type = 'network' order by subtable.number desc limit 3 offset 0

this will show the top 3 people with network problems

this can be repeated for the other 3 groups

If you need to add all these columns you can also think of a union


for expert counselling from rycamor
go to
 
Also, remember that if you really need to do something that is difficult with standard SQL, it's time to create a stored procedure.

As of version 7.3, PostgreSQL stored procedures can return sets, exactly the same way regular tables do. Thus, you could create a stored procedure which is called as "
Code:
SELECT problem_areas(4);
", and it would return a set of rows containing the top 4 "problem users". -------------------------------------------

My PostgreSQL FAQ --
 
Thanks for everyone's feedback. Unfortunately I am on Postgres 7.2 which limits me regarding the use of a stored procedure, which would have supported my variability of the 'top n' and the number of problem groups, so I am forced to use the view method. I did learn alot based on your direction as well my attempt to get more familiar with 7.2 functions (and their limitations!)

Kudos to all!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top