Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

QUERY SPEED: IN vs. NOT IN 2

Status
Not open for further replies.

Spork52

Programmer
Nov 20, 2007
134
US
I've got to update records in a table containing about 210,000 records and I'm curious if different versions of the query will make much difference in the time it takes to execute.

Specifically, is it better to use IN with a subquery that returns a large recordset or NOT IN with a subquery that returns a small recordset? Or will it not make much difference?

Version 1:
Code:
UPDATE table1
SET Field1 = NULL, Field2 = NULL, Field3 = NULL, Field4 = NULL, Field5 = null
WHERE table1_id IN
(
SELECT table1.table1_id FROM table
LEFT JOIN table2 ON table1.table1_id = table2.table1_id
LEFT JOIN table3 ON table2.table3_id = table3.table3_id
WHERE
ISNULL(table3.table4_id, 0) <> '1436' 
AND ISNULL(table3.table4_id,0) <> '2477'
AND ISNULL(table3.table4_id,0) <> '6428' 
AND ISNULL(table3.table4_id, 0) <> '7241'
)

The subquery in the above returns about 204,000 records, so I'm thinking the following might be faster:

Version 2
Code:
UPDATE table1
SET Field1 = NULL, Field2 = NULL, Field3 = NULL, Field4 = NULL, Field5 = null
WHERE table1_id NOT IN
(
SELECT table1.table1_id FROM table
LEFT JOIN table2 ON table1.table1_id = table2.table1_id
LEFT JOIN table3 ON table2.table3_id = table3.table3_id
WHERE
(ISNULL(table3.table4_id, 0) = '1436' 
OR ISNULL(table3.table4_id,0) = '2477'
OR ISNULL(table3.table4_id,0) = '6428' 
OR ISNULL(table3.table4_id, 0) = '7241')
)

Now the subquery returns about 6,000 records.

Will putting "ORDER BY table3.table4_id" in the subquery make any difference?
 
Also note, fewer records does not equate to fast speed, especially when the server can utilize indexes.

Search the net for SQL performance tuning etc. and there are several examples where they create queries and show examples where, depending on the number of records, different methods run faster.

"Sargable" queries as a rule, run faster then non-sargable queries.
 
I think this would run faster, having an ordinary inner join :

UPDATE TBL
SET Field1 = NULL, Field2 = NULL, Field3 = NULL, Field4 = NULL, Field5 = null
from table1 TBL
inner join
(
SELECT table1.table1_id FROM table
LEFT JOIN table2 ON table1.table1_id = table2.table1_id
LEFT JOIN table3 ON table2.table3_id = table3.table3_id
WHERE
ISNULL(table3.table4_id, 0) <> '1436'
AND ISNULL(table3.table4_id,0) <> '2477'
AND ISNULL(table3.table4_id,0) <> '6428'
AND ISNULL(table3.table4_id, 0) <> '7241'
) QRY
on TBL.table1_id = QRY.table1_id


However, are you sure you want to compare numbers with strings (I believe ISNULL(table3.table4_id, 0) returns an integer which you compare to the string '1436')?

Moreover...are you sure you need the 'Left join' on table3? If yes, your 'where' clause should be moved to the join definition. Otherwise, it actually turns your Left Join into an Inner Join.


[pipe]
Daniel Vlas
Systems Consultant

 
Thanks, I didn't know aliases could be used in an UPDATE like that. And thanks for the "sargable" article. Great word.

Yes, it has to be LEFT JOIN on table3. I don't understand what you mean by the WHERE clause turning the LEFT JOIN on table3 into an INNER JOIN.

Also, don't see why it should be moved to the JOIN condition (which JOIN do you mean?) If the LEFT JOIN on table3 doesn't occur for a particular record, then table4_id will be null and the record will be included in the update, which is what I want.

Anyway, here's a revised query that attempts to make everything more sargable. Gets rid of the ISNULLs, but not the <>. I don't think the ORs will slow things up much, because there are very few cases in which the first part of the clause is false or unknown. I assume SQL evaluates the first part of an OR clause and then stops if it's true.

Code:
UPDATE TBL 
SET Field1 = NULL, Field2 = NULL, Field3 = NULL, Field4 = NULL, Field5 = null
from table1 TBL
INNER JOIN
(
SELECT table1.table1_id FROM table
LEFT JOIN table2 ON table1.table1_id = table2.table1_id
LEFT JOIN table3 ON table2.table3_id = table3.table3_id
WHERE (table1.created <= DATEADD(day, -60, getdate()) OR table1.created IS NULL)
AND (table3.table4_id <> 1436 OR table3.table4_id IS NULL)
AND (table3.table4_id <> 2477 OR table3.table4_id IS NULL)
AND (table3.table4_id <> 6428 OR table3.table4_id IS NULL)
AND (table3.table4_id <> 7241 OR table3.table4_id IS NULL)
) QRY
on TBL.table1_id = QRY.table1_id
 
I guess this would be better:
Code:
AND
(
(table3.table4_id <> 1436 AND table3.table4_id <> 2477 AND table3.table4_id <> 6428 AND table3.table4_id <> 7241)
OR 
table3.table4_id IS NULL
)
 
Well, it took all of 70 seconds to update 212,000 records.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top