Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

NOT EXISTS vs NOT IN

Status
Not open for further replies.

stoggers

Vendor
May 30, 2001
93
US
Hi,

Could someone please explain the difference between two semantically equivalent SQL statements.

I had a delete using NOT IN that ran for 7 hours and when I ran it using NOT EXISTS it run in 20 minutes! I explained the two statements and the NOT IN used an index and a table scan while the NOT EXISTS used two indexes.

Regards

Mike.
 
Hello Stoggers,

Semantically equivalent? Not in your lifetime! The exists predicate calls upon a subquery, if this subquery is optimalized, it can be very fast. If you use a where clause with 'IN' to do a check upon a non-indexed field this will surely give you a least one full range table scan. I'm quite curious what the SQL definitions were, if they did the same job for you . Are you sure they actually have the same effect?? T. Blom
Information analist
Shimano Europe
tbl@shimano-eu.com
 
Hi,

The two statements are:

SELECT ADR_NO
FROM ADDRESS A
WHERE NOT EXISTS (
SELECT ADR_NO
FROM ADDRESS_OCC B
WHERE A.ADR_NO = B.ADR_NO)

and

SELECT ADR_NO
FROM ADDRESS
WHERE ADR_NO NOT IN
(SELECT ADR_NO FROM ADDRESS_OCC)

They both return the same result set...
 
Hello Stoggers,

I suspect that in the "NOT IN" query the outer query simply does not use an index to evaluate the list of values that is offered by the where clause. So for each of the values from the where clause it performs a full table scan (possibly because it looks for the set of values that complement the list from the where clause)
The "NOT EXISTS" correlated query does not in fact return a value, but is either true or false for a given value. The following query should give the same result:

SELECT ADR_NO
FROM ADDRESS A
WHERE NOT EXISTS (
SELECT *
FROM ADDRESS_OCC B
WHERE A.ADR_NO = B.ADR_NO)

So for each value from the outer query it performs this true / false check only using indexes , but for each value only once.

This is a bit of guessing, but in the first case you have to go through a table x times, while in the second case you walk through it once , on the way performing a quick true / false lookup into the set of the subquery.


T. Blom
Information analist
Shimano Europe
tbl@shimano-eu.com
 
Hi Stoggers,
my belief is that for every row selected in the NOT IN query, an interim results table is produced (ie. the subquery is run to produce results) and then searched. In other words, for each row in the table, a duplicate table made up of the ADR_NO is created and searched. Sounds pretty awful to me, and could be rectified by an addition of the WHERE A.ADR_NO = B.ADR_NO as in the EXISTS query (I think).
Hope this helps to clarify things
Marc
 
I noticed that you qualified your tables in your first example (exists),
SELECT ADR_NO
FROM ADDRESS A
WHERE NOT EXISTS (
SELECT *
FROM ADDRESS_OCC B
WHERE A.ADR_NO = B.ADR_NO)

but not the second (in).
SELECT ADR_NO
FROM ADDRESS
WHERE ADR_NO NOT IN
(SELECT ADR_NO FROM ADDRESS_OCC)

I have found that fully qualifing both the select and sub select results in a better response time. I know it sounds odd, but I have seen it many times.

Also, providing you have an idex on adr_no and your statistics are current.....
you might try changing your "in" query to a correlated sub query like this:
select a.adr_no
from address
where a.adr_no not in
(select b.adr_no from address_occ b
where a.adr_no = b.adr_no)

I am not sure what your platform is , but, if at all possible in db2 (at least on DB2 for OS390) avoid using "not" . It tends to result in tablespace scans.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top