Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

MySQL 4.0 substitute for multiple subqueries 1

Status
Not open for further replies.

OsakaWebbie

Programmer
Feb 11, 2003
628
JP
I have a table called "person" with a primary key PersonID. I also have a table "category" with a primary key CategoryID, and a table "percat" (PersonID, CategoryID) to form a many-many relationship between people and categories. I want to be able to add "filters" to only show people who belong to certain categories and/or don't belong to others. There could be any number of filters, set by the user to either require membership in one or more categories, exclude members of one or more categories, or both.

This looks like a job for multiple subqueries (one subquery of percat for each filter), but my hoster is still using MySQL 4.0.18, so I have to come up with another way. The web page only shows very simple examples with either existence or absence in a second table - that doesn't address my situation, because I need to examine multiple percat records before determining whether a person record should be returned. Let's say, for example, that I want to select all person records that "belong" to categories 1 and 3, but don't belong to 2 and 4, how can I make a single query to do it? All the ideas I have come up with only look at one percat record at a time ("SELECT person.* FROM person, percat WHERE person.PersonID=percat.PersonID AND [some logic statement about percat.CategoryID]", or a join that does about the same thing), but I don't know how to meld it together. Can someone help?
 
One approach to substituting a subquery is to use a temporary table. For example, you could say:
[tt]
CREATE TEMPORARY TABLE t
SELECT personid
FROM percat
WHERE ...

SELECT person.*
FROM person JOIN t USING (personid)
WHERE ...
[/tt]
The temporary table is private to the session where it's created, so it won't clash with other tables, and will be automatically dropped at the end of the session.
 
Thanks for the reply, Tony. You say the table is private to the "session" - how long does that last, just the time it takes for the page to load (the PHP file executing, starting with the mysql_connect and ending with the end of the file) or something longer? Also, this application does lots and lots of queries, all of which would have this filter functionality somehow added to it - would recreating big temp tables over and over be slow?

Meanwhile, I saw an answer by r937 to a different thread ( that might relate to my issue. It utilizes a multiple WHERE (resulting in multiple records) and then groups them together "HAVING count(*) >= [the number of conditions]". Clever! But how would I do that if some of the conditions are the absence of a record rather than only presence? I remember seeing something about "WHERE table2.id is null" as part of some complex join that seemed to return an instance only if something in a second table didn't exist, but I don't know where I saw it, nor do I remember enough to apply it...
 
Let's say, for example, that I want to select all person records that "belong" to categories 1 and 3, but don't belong to 2 and 4, how can I make a single query to do it?
temporary tables are not required for this class of problem


the solution is really quite simple -- just count the rows you're interested in, using GROUP BY and CASE expressions

note that since we're going to use GROUP BY, we cannot use SELECT *
Code:
select person.PersonID 
  from person
inner
  join percat 
    on person.PersonID = percat.PersonID 
group
    by person.PersonID 
having sum(case 
        when percat.CategoryID in (1,3)
        then 1 else 0 end) = 2    
   and sum(case 
        when percat.CategoryID in (2,4)
        then 1 else 0 end) < 2
why less than 2 for categories 2,4?

because you said "don't belong to 2 and 4"

if you had said "don't belong to 2 or 4" then the last sum would have to be equal to 0

:)

r937.com | rudy.ca
 
You say the table is private to the "session" - how long does that last ... ?
Until the current connection is ended - the end of the PHP script. You can also drop the table during the script if you've no more use for it.

[qoute]would recreating big temp tables over and over be slow?[/quote]
The temporary tables would be stored in memory, so I don't think creating and populating a temporary table would be any slower than processing the equivalent subquery.
 
thank you :) :)


are you okay with the answer i gave in this thread?

did you understand the part about the difference between "don't belong to 2 and 4" and "don't belong to 2 or 4"?

r937.com | rudy.ca
 
Yes, I'm okay with your answer and understand what you meant. The reason I didn't respond immediately is that I was asleep while you guys were conversing about this, as I live in Japan.

I have now tried it, and gotten a few test cases to work correctly with two small changes to your solution:
1) I changed the inner join to a left join, because otherwise, even if the only filters being used are the "not member" type, a person in no categories at all will not show up.
2) In the second sum-case (for exclusions), the sum needs to be =0 rather than <2, in order to exclude all undesired categories rather than just one of them.

I haven't tested all possible combinations, but it looks like I can make this technique work. If you see any logic gaps in the changes I made, let me know.

Next I have the challenge of adding this to my PHP code, because it will be a function that will add the filter constrainsts to any select statement that includes person data, and that could be a variety of things - cases that already involve percat in some way and cases where the join will need to be added. By the way, is there any functional difference between an inner join and a basic "FROM t1,t2 WHERE t1.id=t2.id" type of thing? If not, then if percat already appears in the statement either as a join or in the table list, I should just be able to add the "GROUP BY..." section at the end of the whole thing. (Fortunately I don't yet use GROUP BY anywhere in the queries I have to add this to.)

I'll start in on this code and let you know how it goes, but if you have any additional insights/cautions, I'm all ears.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top