Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Lost records with LEFT JOIN and GROUP BY 1

Status
Not open for further replies.

OsakaWebbie

Programmer
Feb 11, 2003
628
JP
I have a table pw_song with song information (primary key autoincrement SongID) and table pw_usage which has SongID, EventID and UseDate (all NotNull fields). A song may or may not have any pw_usage records associated with it. I use PHP to assemble a query from many different dynamic parts to list songs that fit any of a number of criteria unrelated to usage - that has been working fine. Now I'm trying to add functionality to also tell me the last date a song was used for a given event, if any. Without that, the select statement looks something like this (the where clause and even number of tables may vary, but the behavior is the same):

Code:
SELECT pw_song.SongID,Title,Tagged,OrigTitle,Tempo,SongKey,LEFT(Lyrics,INSTR(Lyrics,' ')-1) AS FirstLine FROM pw_song WHERE Title LIKE '%praise%' OR OrigTitle LIKE '%praise%' ORDER BY OrigTitle

To that, I dynamically add what I thought would just give me the last date used (or null if never used):
Code:
SELECT pw_song.SongID,Title,Tagged,OrigTitle,Tempo,SongKey,LEFT(Lyrics,INSTR(Lyrics,' ')-1) AS FirstLine,MAX(pw_usage.UseDate) AS Last FROM pw_song LEFT JOIN pw_usage ON pw_song.SongID=pw_usage.SongID WHERE Title LIKE '%praise%' OR OrigTitle LIKE '%praise%' AND (pw_usage.EventID=2 OR ISNULL(pw_usage.EventID)) GROUP BY pw_song.SongID ORDER BY OrigTitle
With this query sometimes I get fewer records overall than with the first query. Checking my data in just a few example cases, it looks like the records disappear when there are pw_usage records for that song but none for that event; if there are no pw_usage records at all it works fine, and if there is at least one record for the requested event it works fine. That would seem to point to the "(pw_usage.EventID=2 OR ISNULL(pw_usage.EventID))" as the culprit, but I'm not very experienced at trying to do aggregate functions with a many-many relationship table, so I don't know the right way to limit the pw_usage records to a certain EventID but not limit the pw_song records. I tried using HAVING instead of part of the WHERE, but then it complained that EventID was an unknown column. Can someone help?
 
you have

WHERE x OR y AND ( a OR b )

i'm pretty sure you need to add some more parentheses

because right now, that's being evaluated as

WHERE x OR ( y AND ( a OR b ) )

which is probably not what you want, since x alone will evaluate as true

r937.com | rudy.ca
 
I see what you mean, and for GP I will add parentheses for that particular criteria, as it might be causing me to get the wrong date in some cases. But adding parentheses does not make the missing records appear, so my basic problem is still there. Also, that is only one example of the basic song criteria - others don't have an OR but they act the same when I add the MAX/LEFT JOIN/GROUP BY stuff (losing records that have other event usage data but none for the event in the WHERE clause). Any other thoughts?
 
I have tested pieces of my query and found what is causing the problem, but I still don't know how to build a query that does what I need. I don't think my need is all that rare, so surely someone else has run into it. If I remove the aggregate function, the GROUP BY, and the part of the WHERE clause that restricts the EventID, and just return the EventIDs, I can see what's going on:
Code:
SELECT pw_song.SongID,Title,EventID FROM pw_song LEFT JOIN pw_usage ON pw_song.SongID=pw_usage.SongID WHERE OrigTitle LIKE '%praise%' ORDER BY SongID
If there are no records at all in pw_usage for a given SongID, a single record is returned for the song with the EventID as null, but if there are any pw_usage records at all for that song, there is no null record but only the record(s) for all occurrences in pw_usage. So if there is at least one record with EventID=1, for example, unless there is also a record with EventID=2, my original WHERE clause will eliminate the song from appearing (there is no record with either EventID=2 or null).

So, is there a way to do this inside the MAX function somehow, rather than including my faulty logic in the WHERE clause? If I were to start with the above query and added the GROUP BY, is there a way in the SELECT part of the statement to return the maximum UseDate for EventID=2, or something else (NULL or some text) whenever there are no EventID=2 records, regardless of whether the song was used in any other event?
 
I'll appeal one more time for help with this. I have tried and tried to figure out a way to do this, and it doesn't seem a rare thing to want to do, but I haven't gotten anywhere. I'm starting to think that LEFT JOIN is part of the problem - it seems to be designed mostly for many-one relationships, giving unpredictable results in a many-many situation. So let's start over without any assumptions of how the query will be constructed.

I'll break it down to its simplist elements. I have two tables, "song" and "usage", which have a many-many relationship - a song can be used multiple times in multiple events, but only once per event and date (i.e. the primary key of the usage table is SongID+EventID+UseDate). I want to be able to query for all songs, and as one of the returned fields, I want the last date the song was used for a given event. If it has not been used for that event, I still want the song record to be returned, and I don't care what is returned for last date in that case (null, zero date, or anything else I can recognize - I will ultimately convert it to the string "never" in my PHP code). How can I do this?
 
whenever you have a LEFT OUTER JOIN, any filter conditions on fields from the right table must go into the ON clause, not the WHERE clause

as for your latest date per usage, this can only be done with a subquery in the ON clause, or else with a self-join of the usage table to itself

could you please show your latest query again, and please make sure each column is properly qualified so that it's clear which table it comes from

r937.com | rudy.ca
 
could you please show your latest query again, and please make sure each column is properly qualified so that it's clear which table it comes from
Okay, here it is. As you can see, the only condition in the where clause relates to pw_song (the left table in the join).
Code:
SELECT pw_song.SongID,pw_song.Title,pw_usage.EventID FROM pw_song LEFT JOIN pw_usage ON pw_song.SongID=pw_usage.SongID WHERE pw_song.OrigTitle LIKE '%praise%' ORDER BY pw_song.SongID
The above query returns at least one record for each song, and in the case where there are no pw_usage records, UseDate is null. The problem is finding a method of getting the maximum UseDate for a given EventID that works for all cases:
(1) Newest UseDate overall happens to be for the desired EventID: easy
(2) Newest UseDate for desired EventID is not newest overall: need the correct date returned
(3) No pw_usage records at all: return song record, but with no date
(4) pw_usage records exist only for other events: return song record, but with no date
My original attempt at this failed on point #4.
as for your latest date per usage, this can only be done with a subquery in the ON clause, or else with a self-join of the usage table to itself
The installation of MySQL that this is running on is 4.0.18, so I can't do subqueries. Can you tell me more about what you mean by a "self-join" of the usage table to itself"? I don't see how that will help, but that's beyond my experience.
 
your latest query indeed has only one condition, and it's on the left table

but earlier, you said
I want to be able to query for all songs, and as one of the returned fields, I want the last date the song was used for a given event. If it has not been used for that event, I still want the song record to be returned, and I don't care what is returned for last date in that case
do you still want to specify the event, or do you just want the latest usage over all events for that song?

r937.com | rudy.ca
 
I always want to specify a single event - getting the latest usage regardless of event would be useless to me. The conditions for which songs are returned will change constantly with what the user asks for, but that code is already all working fine, and not the issue here. I said at one point, "I want to query for all songs..." simply because I wanted to get the song conditions out of the discussion - that's not the problem. If you look carefully, you'll see that I have always said "for a given event" in my descriptions, and that condition is the very crux of the issue, the reason this query is giving me fits.
 
If you look carefully, you'll see that I have always said "for a given event" in my descriptions, and that condition is the very crux of the issue, the reason this query is giving me fits.
yes, but if you look closely, you'll see that i asked for your latest query, and there was nothing in it which identified which event you want


:)

okay, try this --
Code:
select pw_song.SongID
     , pw_song.Title
     , usage1.UseDate 
  from pw_song 
left outer
  join pw_usage      as usage1
    on pw_song.SongID
     = usage1.SongID 
   and usage1.Event = 937
left outer
  join pw_usage      as usage2
    on pw_song.SongID
     = usage2.SongID 
   and usage2.Event = 937   
 where pw_song.OrigTitle like '%praise%' 
group
    by pw_song.SongID
     , pw_song.Title
     , usage1.UseDate 
having usage1.UseDate 
     = max(usage2.UseDate)      
order 
    by pw_song.SongID

r937.com | rudy.ca
 
Well, I don't totally follow what the nested JOIN is supposed to accomplish, but I tried it. However, your query loses even more records than my first attempt - I only get songs that have pw_usage records with the designated EventID. I still need it to return the rest of the songs that match the WHERE clause, but with no date.
yes, but if you look closely, you'll see that i asked for your latest query, and there was nothing in it which identified which event you want
That one query was not an attempt at a final solution, but just an interim step to reveal all the related records - it didn't even have the GROUP BY on it. It was that query (without the GROUP BY or any restriction on EventID) that allowed me to realize why my first attempt didn't work, because its results revealed when a null UseDate record was returned for a song and when it wasn't. I was hoping that starting with that query, someone might have an idea of how to use some tricky stuff inside a MAX function to look at just dates for the desired event, because I can't figure out how to write a WHERE or ON clause that still returns at least one record for each song no matter what the situation with usage records.
 
tricky stuff inside a MAX function" = subquery

"returns at least one record for each song" = left outer join

r937.com | rudy.ca
 
I'm hoping that "= subquery" doesn't mean that's the only way, because I can't do subqueries. As for the left outer join, I'm already doing that - the problem is coming up with a way to get the MAX to do the right thing with the records the LEFT JOIN returns.

If there really isn't a way to do what I need in the query itself (which surprises me - it just doesn't seem like that strange a request of a database), I did think of a tacky bandaid solution. My original attempt, which included the condition "AND (pw_usage.EventID=2 OR ISNULL(pw_usage.EventID)" looks like it would work if I alter the pw_usage table to allow nulls in the EventID and UseDate fields, and then insert a dummy record for every song, with a null EventID and UseDate. (Hmm, can a primary key field allow nulls? If not, I guess I would need to use zero or something instead.) I would need to add code to insert the dummy record whenever a new song is added, and change some other queries against the pw_usage table sprinkled throughout my code to exclude the dummy records. It seems like an extremely sloppy way of "solving it", but is it the only way?
 
i've been doing sql for a very long time and i'm pretty confident that my query with the double outer join will do what you want

... except i forgot, you need to add or usage1.UseDate is null to the HAVING clause

:)

what you suggested with dummy rows and zero keys sounds like way too much of a hack/kludge



r937.com | rudy.ca
 
Ah, yes, now I see that your HAVING clause is the key to why the double JOIN. Okay, using your code with the IS NULL added, I seem to get the results I need when I just stick it in a free-form query page I have, but when I incorporated it into my PHP code, I must have done something wrong, because now it complains thus:
SQL Error 1054: Unknown column 'usage1.UseDate' in 'having clause'
The SQL statement that my code assembled (based on a simple request for all songs, with last use info for Event #1) is this:
Code:
SELECT pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, LEFT(Lyrics,INSTR(Lyrics,'\n')-1) AS FirstLine, usage1.UseDate AS Last FROM pw_song LEFT OUTER JOIN pw_usage AS usage1 ON pw_song.SongID=usage1.SongID AND usage1.EventID=1 LEFT OUTER JOIN pw_usage as usage2 ON pw_song.SongID=usage2.SongID AND usage2.EventID=1 WHERE 1 GROUP BY pw_song.SongID,usage1.UseDate HAVING usage1.UseDate=MAX(usage2.UseDate) OR usage1.UseDate IS NULL ORDER BY OrigTitle
I noticed that you included in the GROUP BY all the fields in the SELECT, but I had simplified the fields list for the forum discussion, so I hoped I didn't need to include all the fields in the real case, but do I? Leaving some out doesn't seem likely to cause an error about usage1.UseDate, but I'm in a little over my head with this, so... Do you see what the problem is?
 
yes, you must put all non-aggregate columns that are in the SELECT list (i.e. all of them) into the GROUP BY

the reason for the error on the HAVING clause is because you have aliased the column name in the SELECT (mysql gets easily confused)

r937.com | rudy.ca
 
Cool, it works! I didn't realize that once you alias a field MySQL no longer recognizes the original flavor; but I also didn't even notice the alias to consider if that might be the problem. Thanks!

The requirement of all non-aggregate columns in the GROUP BY is a little mysterious. It seems that although it is required in SQL generally, in some cases in MySQL you can get away with not listing them all (hints I gathered from comments in the MySQL docs, although I found no direct explanation one way or the other in the actual docs). In my version I don't get an error message, but in the case of this query (I tested this just to learn), a few records don't get returned, and it's a different few than I was missing by my first faulty query, and I can't seem to find a pattern as to why those certain records would get withheld. Oh well, including them all just makes the statement a little longer; it's not hard to do.

Thanks again for your help!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top