Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

I need to select the last row per group... please help.

Status
Not open for further replies.

sd0t1

IS-IT--Management
Mar 14, 2007
131
US
Hi. Any help will be much appreciated.

I have a a table with the following fields (actual tabale is much larger but for example sake I shortened it):

offender_id
program_id
counselor_id
case_no
attended
missed
total_attended
total_missed

When the Program link is clicked the table needs to display the last row inserted for each offender_id.

I am not able to figure out how to write the sql statement. I'm pretty new to SQL and just now figured out how to link tables.

Here is a copy of the actual statement that I need to write the statement for.

$sql = "SELECT schedule.oid, offender.f_name, offender.l_name, offender.ssn, schedule.case_no, schedule.attended, schedule.total_attended, schedule.missed, schedule.total_missed, schedule.module_completed, schedule.total_module_completed,schedule.pid, schedule.cid, schedule.poid, schedule.county_id, schedule.completed, schedule.date, schedule.day, schedule.time, program.program_name, program.day, program.time ".
"FROM offender, schedule, program, counselor ".
"WHERE offender.id = schedule.oid AND program.id = schedule.pid AND counselor.id = schedule.cid and schedule.pid = '$pid' ";
 
I had planned on using the highest 'id' for each individual oid (offender id).
Is that not a good idea?
 
you say you want to display "the last row for each offender"

the "for each offender" part would involve only rows with the same offender id

so choosing the highest id out of all the rows with the same offender id doesn't make sense

r937.com | rudy.ca
 
Sorry for not being more clear. Let me try to make an example of the table.

id, offender_id, program_id, attended, total_attended missed
1 3 10 no 0 1
2 4 10 yes 1 0
3 3 10 yes 1 1
4 4 10 yes 2 0
5 3 10 no 1 2
6 4 10 no 2 1

What I need to select is the following.

offender id 3 last row which is row id 5
offender id 4 last row which is row id 6

I have many more fields, but I tried to condense to make the example understandable. Plus I have a joins, and I'm not sure if that makes a difference.

I appreciate your help on this.
 
Ok, I got part of it to work. I was able to make it pull 1 row per offender, but it's the first row, not the last, in spite my DESC tag at the end.
here is the code:
SELECT MAX(schedule.oid), offender.f_name, offender.l_name, offender.ssn, schedule.case_no, schedule.attended, schedule.total_attended, schedule.missed, schedule.total_missed, schedule.module_completed, schedule.total_module_completed,schedule.pid, schedule.cid, schedule.poid, schedule.county_id, schedule.completed, schedule.date, schedule.day, schedule.time, program.program_name, program.day, program.time ".
"FROM offender, schedule, program, counselor ".
"WHERE offender.id = schedule.oid AND program.id = schedule.pid AND counselor.id = schedule.cid and schedule.pid = '$pid' GROUP BY schedule.oid ORDER BY schedule.oid DESC
 
I have offenders with multiple rows in a table (schedule). I need to select the last row entered for each offender. My current query below selects all the fields from the first (not the last)row However it does select the correct last data from the field that I specify "MAX()".

How do I get the MAX function to return the entire last row entered?

Here is my current query: When I run the query in my sql editor

SELECT MAX(schedule.id), schedule.id, schedule.oid, offender.f_name, schedule.cur_date, offender.l_name, offender.ssn, schedule.case_no, schedule.attended, schedule.total_attended, schedule.missed, schedule.total_missed, schedule.module_completed, schedule.total_module_completed,schedule.pid, schedule.cid, schedule.poid, schedule.county_id, schedule.completed, schedule.date, schedule.day, schedule.time, program.program_name, program.day, program.time ".
"FROM offender, schedule, program, counselor ".
"WHERE offender.id = schedule.oid AND program.id = schedule.pid AND counselor.id = schedule.cid and schedule.pid = '$pid' GROUP BY schedule.oid

For an example, I'll use my example table from above.

id, offender_id, program_id, attended, total_attended missed
1 3 10 no 0 1
2 4 10 yes 1 0
3 3 10 yes 1 1
4 4 10 yes 2 0
5 3 10 no 1 2
6 4 10 no 2 1

Instead of returning

5 3 10 no 1 2
6 4 10 no 2 1

I get:

5 3 10 no 0 1
6 4 10 yes 1 0
 
try this:

think about what you want in the GROUP BY...

"i want one row per _____"

so far, this has been schedule.oid

now, go through your SELECT clause and make sure that every mention of any column that ~isn't~ one of the columns in the GROUP BY, has to be

(1) inside an aggregate function, or
(2) fully functionally dependent on the GROUP BY column(s)

for (2), if you have GROUP BY schedule.oid, then any column is okay, provided there is only one row per schedule.oid

since this is mysql, (2) is allowed, but you cannot do that in any other database system


r937.com | rudy.ca
 
I did read that in the articles you recommended to the other guy that had a similar problem ( GROUP BY and HAVING with Hidden Fields and Debunking GROUP BY Myths).
The thing is I don't fully understand what you mean when you say "fully functionally dependent on the Group By columns".

Which ones are the GROUP BY columns?

What does fully functionally dependent on the GROUP BY column?
Do you mean the fields in the row that have calculations rather than a name?
 
fully functionally dependent" means the same thing as it does in data modelling and normalization

the GROUP BY columns in your case are just one: schedule.oid

take each column in your SELECT clause and ask yourself if there is any chance that there can be more than one value for each schedule.oid

r937.com | rudy.ca
 
Ok, there are many.
My offender id, total_attended, total_missed, total_module_completed, day_missed, and completed are all fields that will need the total from the last row inserted and could possibly have a different value each time a new row is inserted.

The other fields, program id, day, time, counselor id etc. will be the same in each row.

Ok, so my "fully functionally dependent" fields are the ones that will possibly get new values each time a new row is inserted.... right?

So to answer your question, yes there are rows that can be more than one value per offender id (schedule.oid).

I'm sorry to say but I'm still extremely lost here.
Now that I've identified those fields, what do I do?
 
Getting a bit confused myself by the post but I think you are after a simple sub suery,,
Code:
use test;
drop table if exists test.offences;
create table offences
	(
		id int(8) unsigned primary key auto_increment,
		offender_id int(8) unsigned not null,
		offence_date date,
		program_id int(8) unsigned not null,
		attendedyn enum('Yes','No'),
		attended int(8) unsigned not null,
		missed int(8) unsigned not null
	);
insert into test.offences (offender_id,offence_date,program_id,attendedyn,attended,missed)
values
	(3,'2001-02-06',10,2,0,1),
	(4,'2001-03-26',10,1,1,0),
	(3,'2001-04-16',10,1,1,1),
	(4,'2001-05-06',10,1,2,0),
	(3,'2001-06-26',10,1,1,2),
	(4,'2002-07-16',10,1,2,1)
	;

select * from test.offences;

select
	*
from test.offences
where id in
(select max(id) from test.offences group by offender_id);
 
I do not know if I missing the boat here but it seem that a simple ORDER BY offender_id, id DESC and read the first row should do it.

The fact that you have focused on the last row is your problem. The way I see it, you have:

id, offender_id, program_id, attended, total_attended missed
1 3 10 no 0 1
2 4 10 yes 1 0
3 3 10 yes 1 1
4 4 10 yes 2 0
5 3 10 no 1 2
6 4 10 no 2 1

and run query ordered by offender_id, id desc you should have results like

id, offender_id, program_id, attended, total_attended missed
5 3 10 no 1 2
3 3 10 yes 1 1
1 3 10 no 0 1
6 4 10 no 2 1
4 4 10 yes 2 0
2 4 10 yes 1 0

Then in your processing loop, check for offender_id and make sure that you process its first instance.

Give that a shot and see what happens.

Hope this helps!



--
SouthBeach
The good thing about not knowing is the opportunity to learn - Yours truly, 2008.
 
southbeach, of course that will work, but it returns the entire table, not just the rows desired

in general the better strategy is to return only the rows required

"latest X per group" is a common requirement and has several SQL solutions

r937.com | rudy.ca
 
Thanks for all the help guys, I'll show you how I accomplished my task. I'm not exactly sure If I used best practices or not. Maybe someone can advise on that.

Basically I just made two queries back to back. The first query grabs the MAX id for each offender.
The next query loops through each MAX id and echo's that individual row as I need it.

But I also put a piece of code at the begining of the loop that says,

IF ($row['total_module_completed'] < $row['complete'])

this only returns the rows that have not completed the set amount of treatment classes.

I think I should be able to do all this in more effecient manor, but couldn't get it to work.

Here it is.


$getid = "Select MAX(schedule.id) AS newid, schedule.oid from schedule where schedule.pid = '$pid' group by schedule.oid";
$getresult = mysql_query($getid, $user_login) or die(mysql_error());
while ($getrow = mysql_fetch_array($getresult))
{


$sql = "SELECT schedule.id, schedule.oid, offender.f_name, schedule.cur_date, offender.l_name, offender.ssn, schedule.case_no, schedule.attended, schedule.total_attended, schedule.missed, schedule.total_missed, schedule.module_completed, schedule.total_module_completed,schedule.pid, schedule.cid, schedule.poid, schedule.county_id, schedule.completed, schedule.date, schedule.day, schedule.time, program.program_name, program.day, program.time, program.complete
FROM offender, schedule, program, counselor
WHERE offender.id = schedule.oid AND program.id = schedule.pid AND counselor.id = schedule.cid and schedule.id = $getrow[newid] ";
$sqlresult = mysql_query($sql, $user_login) or die(mysql_error());

$row = mysql_fetch_assoc($sqlresult);

if ($row['total_module_completed'] < $row['complete'])
{

echo's the row on the screen the way I need it
}
}

 
any time you have a query inside a loop, with a value that comes from another query, you should combine them into a single query

try this and let me know if it produces the same results
Code:
SELECT schedule.id
     , schedule.oid
     , offender.f_name
     , schedule.cur_date
     , offender.l_name
     , offender.ssn
     , schedule.case_no
     , schedule.attended
     , schedule.total_attended
     , schedule.missed
     , schedule.total_missed
     , schedule.module_completed
     , schedule.total_module_completed
     , schedule.pid
     , schedule.cid
     , schedule.poid
     , schedule.county_id
     , schedule.completed
     , schedule.date
     , schedule.day
     , schedule.time
     , program.program_name
     , program.day
     , program.time
     , program.complete
  FROM ( SELECT MAX(id) AS newid
              , oid 
           FROM schedule 
          WHERE pid = '$pid' 
         GROUP 
             BY oid ) AS maxes
INNER
  JOIN schedule
    ON schedule.id = maxes.newid
INNER
  JOIN offender
    ON offender.id = maxes.oid
INNER
  JOIN program
    ON program.id = schedule.pid
INNER
  JOIN counselor
    ON counselor.id = schedule.cid
one thing i want to point out is that the old style joins, where you list the tables in the FROM clause and put the join conditions into the WHERE clause, should be replaced with explicit JOIN syntax, which is much easier to write, understand, and to debug later

:)

r937.com | rudy.ca
 
Hey r937, this worked great Thank You very much.

I don't quite understand it and would be very appreciative if you could give a brief explanation of the code.

I am new to using joins and aliases. I do understand the normal JOIN (JOIN counselor ON counselor.id = schedule.cid)

But the "AS maxes" is confusing. I guess I really don't have a good understanding of aliases. Can you explain the part of the code below please?

FROM ( SELECT MAX(id) AS newid
, oid
FROM schedule
WHERE pid = '$pid'
GROUP
BY oid ) AS maxes
INNER
JOIN schedule
ON schedule.id = maxes.newid
 
the FROM clause specifies the data sources for the query

usually, it is one or more tables, and if more than one, they must be properly joined

as well as a table, the FROM clause can also specify a subquery as a data source, and such a subquery is known either as a derived table or inline view

that's the neat part about SQL -- wherever you can use a table, you can use a subquery, which, as you know, produces a tabular result

the derived table subquery must be given an alias name, in order to allow reference to be made to its columns

run the subquery by itself, and you will see which rows it produces -- now just imagine these rows joined back to the schedule table, using the [tt]newid[/tt] column of the derived table to make the join

in effect, this filters out all the schedule rows except those which match one of the "max" ids

:)

r937.com | rudy.ca
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top