most efficient query for finding if variable NOT in either table?

shadedecho · Apr 8, 2003

Code:

mytable1:             mytable2:
+-----+-----+         +-----+-----+
|  A  |  B  |         |  A  |  B  |
+-----+-----+         +-----+-----+
|  1  |  2  |         |  5  |  6  |
|  3  |  4  |         |  6  |  7  | 
|  5  |  6  |         |  3  |  3  |
|  7  |  8  |         +-----+-----+
+-----+-----+

I need to construct a SQL query that basically tells me whether a test value I am putting into the query is (or is not) in the `B` column of either table.

My first thought is to do a cross join on the two tables, and a WHERE clause which checks mytable1.B or mytable2.B... in fact, this works.

SELECT myt1.`B`,myt2.`B` FROM `mytable1` myt1, `mytable2` myt2 WHERE myt1.`B` = '$$test$$' OR myt2.`B` = '$$test$$'

the cross join (before the WHERE clause) would obviously look like:

Code:

+-----+-----+
|  2  |  6  |
|  4  |  6  |
|  6  |  6  |
|  8  |  6  |
|  2  |  7  |
|  4  |  7  |
|  6  |  7  |
|  8  |  7  |
|  2  |  3  |
|  4  |  3  |
|  6  |  3  |
|  8  |  3  |
+-----+-----+

But I am concerned that as my tables grow, the cross join's memory consumption will grow as the product of the two row counts.

If I understand the process of JOINs correctly, it creates that temporary JOIN'd table, THEN applies the WHERE clause to each row to see if it's included in the result set. This means that as the data grows, the temp table grows BIG as well.

However, I think the ON clause is applied row by row in the creation of the JOIN'd temp table. {please someone correct me if i am wrong, here}

Which leads to my first idea: If I moved the conditions that are in the WHERE clause to an ON condition in the INNER (cross) JOIN, like

Code:

SELECT myt1.`B`,
       myt2.`B` 
FROM `mytable1` myt1 
     INNER JOIN `mytable2` myt2 
       ON (myt1.`B` = '$$test$$' OR myt2.`B` = '$$test$$')

Would that limit the JOIN table that is created to only rows that satisfy the ON condition? Or is ON applied AFTER the creation of the JOIN'd temp table, as WHERE is? If so, does anyone have ideas on how to construct a query whose memory consumption/complexity doesn't grow so rapidly as the data set(s) grow?
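One way to check the ON-versus-WHERE question is to run both forms side by side. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so it runs without a server), and the test value 6 is hypothetical. It only shows that, for an inner join, the two forms return the same rows; it says nothing about how MySQL 3.23 builds the join internally.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable1 (A INTEGER, B INTEGER);
    CREATE TABLE mytable2 (A INTEGER, B INTEGER);
    INSERT INTO mytable1 VALUES (1, 2), (3, 4), (5, 6), (7, 8);
    INSERT INTO mytable2 VALUES (5, 6), (6, 7), (3, 3);
""")

test = 6  # hypothetical test value

# Condition in the WHERE clause of a comma (cross) join.
where_rows = conn.execute(
    "SELECT myt1.B, myt2.B FROM mytable1 myt1, mytable2 myt2 "
    "WHERE myt1.B = ? OR myt2.B = ?",
    (test, test),
).fetchall()

# Same condition moved into the ON clause of an INNER JOIN.
on_rows = conn.execute(
    "SELECT myt1.B, myt2.B FROM mytable1 myt1 "
    "INNER JOIN mytable2 myt2 ON (myt1.B = ? OR myt2.B = ?)",
    (test, test),
).fetchall()

print(sorted(where_rows) == sorted(on_rows))
print(len(where_rows))
```

Note that either form returns nothing at all if one of the tables is empty, even when the value is present in the other table, which is another reason to be wary of the join approach.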

sleipnir214 · Apr 8, 2003

What is it with you and atomic operations?

In either case, MySQL is going to have to produce (length of table1) * (length of table2) combinations. Also, you can't insert into both tables at the same time anyway.

If you perform the check on your example tables using a cross join, MySQL has to perform 12 tests against column values. If you do it as two separate queries, MySQL only needs to test 7 values.
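The 12-versus-7 arithmetic can be sketched for larger tables too. The helper name `rows_examined` and the larger table sizes are made up for illustration, and the figures assume no index is used, so treat them as a rough upper bound rather than what MySQL actually does.

```python
def rows_examined(rows1: int, rows2: int) -> tuple[int, int]:
    """Return (cross join combinations, rows checked by two separate queries)."""
    return rows1 * rows2, rows1 + rows2


# (4, 3) matches the example tables; the other sizes are hypothetical.
for rows1, rows2 in [(4, 3), (1_000, 1_000), (100_000, 50_000)]:
    cross, separate = rows_examined(rows1, rows2)
    print(f"{rows1} x {rows2}: cross join {cross:,} vs separate queries {separate:,}")
```

The cross join cost is the product of the two row counts, while the separate queries cost at most their sum.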

And unless I am mistaken, with the separate queries, MySQL can make use of table indexes to make the query more efficient. With the cross join it cannot.

Want the best answers? Ask the best questions:

http://www.catb.org/~esr/faqs/smart-questions.html

TANSTAAFL!

swampBoogie · Apr 8, 2003

Using a join for this query can be devastating. You will not get the result you are expecting as the tables are not related.

What is needed is a union query

select B from mytable1
union all
select B from mytable2
limit 1
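As posted, the union returns one B value but does not filter on the test value. A minimal sketch of how the check might look with a filter in each branch, again using Python's sqlite3 as a stand-in for a server that supports UNION; the function name `value_exists` is hypothetical:

```python
import sqlite3

# In-memory SQLite database standing in for a server with UNION support.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable1 (A INTEGER, B INTEGER);
    CREATE TABLE mytable2 (A INTEGER, B INTEGER);
    INSERT INTO mytable1 VALUES (1, 2), (3, 4), (5, 6), (7, 8);
    INSERT INTO mytable2 VALUES (5, 6), (6, 7), (3, 3);
""")


def value_exists(conn: sqlite3.Connection, value: int) -> bool:
    """Return True if value appears in column B of either table."""
    row = conn.execute(
        "SELECT B FROM mytable1 WHERE B = ? "
        "UNION ALL "
        "SELECT B FROM mytable2 WHERE B = ? "
        "LIMIT 1",  # one matching row is enough to answer the question
        (value, value),
    ).fetchone()
    return row is not None


print(value_exists(conn, 7))   # 7 is in mytable2.B
print(value_exists(conn, 5))   # 5 appears only in column A
```

Each branch touches only its own table, so no combinations of unrelated rows are ever built.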

shadedecho · Apr 8, 2003

I am having to cycle through both tables to find a value that is unique across both, as swampBoogie suggested. The problem is that I am on v3.23.x, so I don't have UNION available to me.

maybe I should just select distinct twice, once from each table, into a temp table, then do a select "search" on the temp table? is that more efficient?
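For comparison, the temp table idea could look roughly like the sketch below, using Python's sqlite3 as a stand-in for MySQL. The table name `all_b` and the function name are hypothetical. Since it copies every B value from both tables on each check, it is probably more work than two direct lookups, though that is an expectation rather than a measurement.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable1 (A INTEGER, B INTEGER);
    CREATE TABLE mytable2 (A INTEGER, B INTEGER);
    INSERT INTO mytable1 VALUES (1, 2), (3, 4), (5, 6), (7, 8);
    INSERT INTO mytable2 VALUES (5, 6), (6, 7), (3, 3);
""")


def exists_via_temp_table(conn: sqlite3.Connection, value: int) -> bool:
    """Copy the distinct B values of both tables into a temp table, then search it."""
    conn.execute("CREATE TEMP TABLE all_b (B INTEGER)")
    try:
        conn.execute("INSERT INTO all_b SELECT DISTINCT B FROM mytable1")
        conn.execute("INSERT INTO all_b SELECT DISTINCT B FROM mytable2")
        # fetchall() finishes the query so the temp table can be dropped afterwards.
        rows = conn.execute(
            "SELECT 1 FROM all_b WHERE B = ?", (value,)
        ).fetchall()
        return len(rows) > 0
    finally:
        conn.execute("DROP TABLE all_b")


print(exists_via_temp_table(conn, 7))
print(exists_via_temp_table(conn, 5))
```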

sleipnir, the reason I think two separate queries is a problem relates to the nature of how I am scripting this... I need to cycle through in a while loop, generating values using rand(), until I generate a value that does not yet exist in EITHER table, not just until the first time I find one that doesn't exist in one or the other. I thought this could best be done using a single (atomic) SQL result query, rather than some complicated PHP scripting comparing two result sets simultaneously.

sleipnir214 · Apr 8, 2003

You don't have to cycle through two result sets.

Something like:

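<connect to database>

// Start as "found" so the loop body runs at least once.
$found1 = 1;
$found2 = 1;

// Keep generating candidates while the value exists in either table.
while ($found1 != 0 || $found2 != 0)
{
    $number = <your random function>;

    // count(*) returns 0 when nothing matches; sum(1) would return NULL.
    $query1 = "SELECT count(*) FROM mytable1 WHERE B = " . $number;
    $query2 = "SELECT count(*) FROM mytable2 WHERE B = " . $number;

    $rh = mysql_query ($query1);
    list ($found1) = mysql_fetch_row($rh);

    $rh = mysql_query ($query2);
    list ($found2) = mysql_fetch_row($rh);
}

// $number is now absent from column B of both tables.
print $number;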
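The same loop can be tried out without a MySQL server. Below is a minimal sketch in Python with the built-in sqlite3 module standing in for MySQL; the helper names, the 1 to 10 range, and the `max_tries` cap are all assumptions added for illustration.

```python
import random
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable1 (A INTEGER, B INTEGER);
    CREATE TABLE mytable2 (A INTEGER, B INTEGER);
    INSERT INTO mytable1 VALUES (1, 2), (3, 4), (5, 6), (7, 8);
    INSERT INTO mytable2 VALUES (5, 6), (6, 7), (3, 3);
""")

TABLES = ("mytable1", "mytable2")


def exists_in(conn: sqlite3.Connection, table: str, value: int) -> bool:
    """Return True if value appears in column B of the given table."""
    if table not in TABLES:  # table names cannot be bound as parameters
        raise ValueError(f"unexpected table: {table}")
    row = conn.execute(
        f"SELECT 1 FROM {table} WHERE B = ? LIMIT 1", (value,)
    ).fetchone()
    return row is not None


def unused_value(conn, rng, low=1, high=10, max_tries=1000):
    """Draw random candidates until one is absent from both tables."""
    for _ in range(max_tries):
        candidate = rng.randint(low, high)
        # Accept the candidate only if neither table contains it.
        if not any(exists_in(conn, table, candidate) for table in TABLES):
            return candidate
    raise RuntimeError("no unused value found; the range may be exhausted")


value = unused_value(conn, random.Random())
print(value)
```

The cap on attempts keeps the loop from spinning forever once every value in the range is taken. Another client could still insert the same value between the check and a later insert, so the check alone does not make the operation atomic.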
