Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Combining 2 complex joins into one tapestry 1

Status
Not open for further replies.

OsakaWebbie

Programmer
Feb 11, 2003
628
JP
Thanks to r937's tireless help on two occasions, I have two pieces of amazing code that appear to work well on their own. For reference, the two related threads are and But when I need to combine them in the same query, records start disappearing again. Pardon this long first post, but I feel the need to provide four complete queries straight from my code, for clarity. Note: all unqualified fields are from the pw_song table. Also note that in some cases you may see what looks like silly extra code (like "WHERE 1", GROUP BYs with no purpose, and SUMs of CASES with only one case), but these queries are assembled by PHP code based on a multitude of possible criteria, and this method keeps that code as simple as possible. As far as I can tell, the extra SQL is harmless (maybe slightly slower, but this is not a large DB).

If I ask for all records, with no "filters" (see thread 1107522) and no request for the last usage date (see thread 1130521), it looks like this:
Code:
SELECT pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, LEFT(Lyrics,INSTR(Lyrics,'\n')-1) AS FirstLine FROM pw_song LEFT OUTER JOIN pw_songkey ON pw_song.SongID=pw_songkey.SongID WHERE 1 GROUP BY pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, FirstLine ORDER BY OrigTitle
If I add the code for getting the last usage date, it looks like this:
Code:
SELECT pw_song.SongID, Title, Tagged, OrigTitle, Tempo,SongKey, LEFT(Lyrics,INSTR(Lyrics,'\n')-1) AS FirstLine, usage1.UseDate AS Last FROM pw_song LEFT OUTER JOIN pw_songkey ON pw_song.SongID=pw_songkey.SongID LEFT OUTER JOIN pw_usage AS usage1 ON pw_song.SongID=usage1.SongID AND usage1.EventID=2 LEFT OUTER JOIN pw_usage as usage2 ON pw_song.SongID=usage2.SongID AND usage2.EventID=2 WHERE 1 GROUP BY pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, FirstLine, Last HAVING Last=MAX(usage2.UseDate) OR Last IS NULL ORDER BY OrigTitle
...and I still get all the records in the database, as I ought.

If I use only the filter function, with no "last usage" code, it looks like this:
Code:
SELECT pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, LEFT(Lyrics,INSTR(Lyrics,'\n')-1) AS FirstLine FROM pw_song LEFT OUTER JOIN pw_songkey ON pw_song.SongID=pw_songkey.SongID WHERE 1 GROUP BY pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, FirstLine HAVING SUM(CASE WHEN pw_songkey.KeywordID IN (37) then 1 else 0 end) = 1 ORDER BY OrigTitle
... and I get the correct number of records, all the ones that have a records for KeywordID=37. (The reason I use the "IN" syntax is that I can add other numbers to the list, and even another clause for exclusions. For this example I just had one for simplicity; there's no point in testing more complex cases until I get the simple one working).

But here's where the problem happens: when I try to combine the two functions together, I lose some records, a different number depending on which EventID I try to get the last date for. The query I attempted is this:
Code:
SELECT pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, LEFT(Lyrics,INSTR(Lyrics,'\n')-1) AS FirstLine, usage1.UseDate AS Last FROM pw_song LEFT OUTER JOIN pw_songkey ON pw_song.SongID=pw_songkey.SongID LEFT OUTER JOIN pw_usage AS usage1 ON pw_song.SongID=usage1.SongID AND usage1.EventID=2 LEFT OUTER JOIN pw_usage as usage2 ON pw_song.SongID=usage2.SongID AND usage2.EventID=2 WHERE 1 GROUP BY pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, FirstLine, Last HAVING SUM(CASE WHEN pw_songkey.KeywordID IN (37) then 1 else 0 end) = 1 AND (Last=MAX(usage2.UseDate) OR Last IS NULL) ORDER BY OrigTitle
I'm concerned about how my joins are connecting - both the pw_songkey and usage1 joins work when directly after pw_song, but when I'm doing both of them, they end up kinda daisychained, if you know what I mean. But I don't know if that's a problem or not, or if it is, how to fix it.

If the "join wizard" r937 is listening, do you see how to combine the two ideas you gave me? Or does someone else see what I'm doing wrong and how to fix it?
 
i'm having trouble figuring out what you're trying to do

my honest suggestion here is to pull some related rows out of your tables, and start to lay them out in excel, so that you can see exactly which rows of the 4 tables are connected to which other rows by your joins

start with a few rows, and beside each one, lay out all its keywords, and beside each keyword, lay out all usages

when you have all the detail rows laid out, they should look like this --
Code:
sss kkk uuu uuu
        uuu uuu
        uuu uuu
    kkk uuu uuu
        uuu uuu
        uuu uuu
sss kkk
sss kkk uuu uuu
        uuu uuu
        uuu uuu
    kkk uuu uuu
        uuu uuu
        uuu uuu
then you can mark off the groups, and do the grouping yourself, to see how it works

make sure you include a song with a 37 keyword and no usages, and one with a 37 keyword and multiple usages

your query looks fine, and any further detective work has to be done by analyzing your actual data

that is why you always see in these forums people asking the posters to please show some sample data

in this case, with 4 tables, the sample data will be pretty comprehensive, so you may have to do this on your own

r937.com | rudy.ca
 
Okay, I'll try that, although I think I already know which records I want and which I don't. My problem is that I am not very clear on how the grouping functionality of MySQL works, having not used it extensively, so once I lay out the data and "mark off the groups", I may have trouble figuring out how to get joins and groupings to do what I need. The examples in the documentation are all very simple - there are no examples that resemble my kind of complexity and how one would tie it together. (Plus, lots of examples I find on forums use subqueries, but I'm restricted to a web hoster that has not yet upgraded to 4.1.) But I'll give it a try. Thanks for looking over my query for obvious mistakes or logic gaps - that's what I was expecting that I had, but I guess it's harder than that.
 
Well, I did some charting out of the data, but as I thought, I know what the data is doing but I don't know what SQL to use to deliver the parts I want. There are actually only 3 tables involved, but one of them was joined to itself, making it look like a fourth. The pw_song table has several columns, but only the primary key SongID is related to this issue. The other two tables have the following structures:
Code:
CREATE TABLE pw_usage (
  SongID int(11) NOT NULL default '0',
  EventID int(11) NOT NULL default '0',
  UseDate date NOT NULL default '0000-00-00',
  UseOrder tinyint(4) NOT NULL default '0',
  PRIMARY KEY  (EventID,UseDate,UseOrder)
) TYPE=MyISAM COMMENT='To record when a song was used';

CREATE TABLE pw_songkey (
  SongID int(11) NOT NULL default '0',
  KeywordID int(11) NOT NULL default '0',
  PRIMARY KEY  (SongID,KeywordID)
) TYPE=MyISAM COMMENT='Many-many link table';
For my test I chose a criteria on pw_song that produces records with the following seven SongIDs: 60,122,137,148,269,379,380. The records in pw_usage that relate to those songs are:
Code:
SongID EventID UseDate   UseOrder
60 	1	2005-07-16	3
60 	1	2005-08-13	4
60 	1	2005-09-10	4
60 	2	2005-08-07	8
122	2	2005-08-14	2
269	1	2005-07-16	4
269	1	2005-08-13	3
269	1	2005-09-10	3
379	1	2005-08-13	5
379	2	2005-08-28	4
380	1	2005-08-13	6
And the records in pw_songkey that relate to these songs are:
Code:
SongID KeywordID
60 	11
60 	18
60 	19
60 	24
60 	29
60 	37
60 	38
122	13
122	25
122	37
137	13
137	37
148	13
148	28
148	37
269	16
269	18
269	24
269	29
269	37
269	38
379	24
379	31
379	38
380	16
380	24
380	31
380	38
For the four queries I listed in my first post, I get 7 records using the first and second queries, and 5 for the third (songs 379 and 380 don't have keyword 37), but when I use the fourth query with EventID=1, the two records with multiple usage records for the event (songs 60 & 269) disappear. When I look at the results for the query with all the joins but not the groupings, I see why: the multiple joins produce a result row for every usage and keyword combination, so "SUM(CASE WHEN pw_songkey.KeywordID IN (37) then 1 else 0 end)" equals 3, not 1, disqualifying those two songs. But I'm at a loss as to what SQL code would make it work right.

I know that with the simple example of one KeywordID in the HAVING clause, my code looks like massive overkill, but the same filter capability could be used to include and exclude any number of keywords - a typical one might be "HAVING SUM(CASE WHEN pw_songkey.KeywordID IN (37) then 1 else 0 end) = 1 AND SUM(CASE WHEN pw_songkey.KeywordID IN (9,16) then 1 else 0 end) = 0", which allows only songs that have keyword 37 and don't have either 9 or 16 (in this case, that means "songs that are in the official list to use for Sunday worship, but not Christmas songs and not songs in Japanese"), which for the data listed above would produce 4 songs. Read thread #1107522 for more info if necessary - that thread used examples from a different database (people and categories instead of songs and keywords), but the relational structure is the same.

So if I were to chart out the data the way r937 described in his last post, it would look like this:
Code:
SongID KeyID EventID UseDate
60 	11	1	2005-07-16,2005-08-13,2005-09-10
   	  	2	2005-08-07
60 	18	1	2005-07-16,2005-08-13,2005-09-10
   	  	2	2005-08-07
60 	19	1	2005-07-16,2005-08-13,2005-09-10
   	  	2	2005-08-07
60 	24	1	2005-07-16,2005-08-13,2005-09-10
   	  	2	2005-08-07
60 	29	1	2005-07-16,2005-08-13,2005-09-10
   	  	2	2005-08-07
60 	37	1	2005-07-16,2005-08-13,2005-09-10
   	  	2	2005-08-07
60 	38	1	2005-07-16,2005-08-13,2005-09-10
   	  	2	2005-08-07
122	13	2	2005-08-14
122	25	2	2005-08-14
122	37	2	2005-08-14
137	13
137	37
148	13
148	28
148	37
269	16	1	2005-07-16,2005-08-13,2005-09-10
269	18	1	2005-07-16,2005-08-13,2005-09-10
269	24	1	2005-07-16,2005-08-13,2005-09-10
269	29	1	2005-07-16,2005-08-13,2005-09-10
269	37	1	2005-07-16,2005-08-13,2005-09-10
269	38	1	2005-07-16,2005-08-13,2005-09-10
379	24	1	2005-08-13
   	  	2	2005-08-28
379	31	1	2005-08-13
   	  	2	2005-08-28
379	38	1	2005-08-13
   	  	2	2005-08-28
380	16	1	2005-08-13
380	24	1	2005-08-13
380	31	1	2005-08-13
I'm not sure how I am to use such a chart, but there it is. Any thoughts?
 
congrats on doing this extensive debugging

i'm confident you learned a fair bit about your queries, yes?

"SUM(CASE WHEN pw_songkey.KeywordID IN (37) then 1 else 0 end)" equals 3, not 1, disqualifying those two songs. But I'm at a loss as to what SQL code would make it work right.
could you please recap what you were trying to achieve with this?

r937.com | rudy.ca
 
Yes, I did learn a fair bit, especially how the result set (before any grouping) gets bigger and bigger as you add joins.

Recap: I want to see a record for every song that passes my "filter" criteria based on keywords that a song must have and/or ones that it mustn't have (that's currently accomplished with the code with the one or two SUM-CASE statements). There may also be other criteria in the pw_song table, but that can be in an ordinary WHERE clause (I'm not having trouble with that). Then, when the user requests it, one of the columns returned should be the last date that a song was used for a particular event, or null if it has never been used for that event (that's currently accomplished by the two joins to the pw_usage table and HAVING one of them = MAX of the other).

What seems to break is that the SUM-CASE stuff is dependent on there not being multiple pre-grouping results for a given song and keyword combination, but when I add in the join to the pw_usage table, I will get multiples. I really liked how you used the double join trick (usage1 and usage2) to isolate things that were conflicting with each other in the quest for the last date - now I'm wondering if the same idea might help isolate the keyword stuff from the rest so that the number of matching keywords can be properly counted. But although I can kinda see how the concept works, I can't quite get my brain wrapped around it well enough to apply it to the keyword part of the query.
 
so by joining each song to its (possibly multiple) keywords, you want to filter out songs which have -- to make a realistic example -- both keywords 9 and 37

as you have no doubt seen, this can only be done in a HAVING clause, since the presence of two rows (two keywords) can only be detected at the GROUP BY level, not at the individual keyword level

so the HAVING part is correct :)

next step is simply to find an expression that tells you that the two keywords were both found

the trick here is to make the expression "multiple-proof" which means that if a song is related to two keywords, and then you join to something else in the same query, you will see both the 9 and the 37 several times

so how about counting how many distinct keywords you found?

change your join to this --
Code:
  FROM pw_song
LEFT OUTER 
  JOIN pw_songkey 
   ON pw_song.SongID
     =pw_songkey.SongID 
  [b]AND pw_songkey.KeywordID IN (9,37)[/b]
(aside: if you want only songs with these two keywords, this should really be an INNER join, not LEFT OUTER)

then you can do this --
Code:
 HAVING count(distinct pw_songkey.KeywordID) = 2
remember, by joining each song to both its keywords and usages, you are getting cross-join effects (your discovery that it returned 3, not 1)

the cross-join effects are collapsed if you use DISTINCT to count

note that the combined query (both keywords and usages) can potentially return many many rows, which the GROUP BY needs to collapse

depending on the data involved, it is sometimes more efficient to split the combined query into two -- one to vet the songs by keyword, which will return a list of song ids, and then a second query for latest usages, using the list of song ids as a driver

in 4.1 you can of course combine them way more easily and understandably, using subqueries

please do not get the impression that complex sql queries are difficult -- the only difficulty you seem to have encountered throughout this entire situation is that without subqueries, you are severly limited in what you can do, and the workarounds are not always elegant

r937.com | rudy.ca
 
Thanks! There is one more aspect to the keyword filter feature that originally made that difficult: there can potentially be keywords specified to exclude - for example, I might want to see only records that have keywords 9 and 37 (like you said in your example), but don't have keyword 16 or 24. However, your new way of doing it (putting the list of keywords in the JOIN ON section rather than a CASE in the HAVING section) simplified it enough that I think I figured out how to add the double join idea to it. I haven't tried putting it into my PHP code yet in order to try lots of cases, but for my single test case (song data from my previous post, and keyword 37 but not 9 or 16) this appears to work, giving me the four records I expect:
Code:
SELECT pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, LEFT(Lyrics,INSTR(Lyrics,'\n')-1) AS FirstLine, usage1.UseDate AS Last
FROM pw_song
INNER JOIN pw_songkey AS inkeys
   ON pw_song.SongID=inkeys.SongID
   AND inkeys.KeywordID IN (37)
LEFT OUTER JOIN pw_songkey AS exkeys
   ON pw_song.SongID=exkeys.SongID
   AND exkeys.KeywordID IN (9,16)
LEFT OUTER JOIN pw_usage AS usage1
   ON pw_song.SongID=usage1.SongID
   AND usage1.EventID=1
LEFT OUTER JOIN pw_usage as usage2
   ON pw_song.SongID=usage2.SongID
   AND usage2.EventID=1
WHERE Title LIKE '%praise%' OR OrigTitle LIKE '%praise%' GROUP BY pw_song.SongID, Title, Tagged, OrigTitle, Tempo, SongKey, FirstLine, Last
HAVING count(distinct inkeys.KeywordID) = 1
   AND count(distinct exkeys.KeywordID) = 0
   AND (Last=MAX(usage2.UseDate) OR Last IS NULL)
ORDER BY OrigTitle
I'll give you your star now, and tomorrow I'll try incorporating this in my interactive code that builds a query based on lots of different scenarios. If you don't hear from me about it again, then it worked!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top