Join (select... vs. where ... in ( 1

BobRodes · Mar 9, 2011

I have two statements:

Code:

select * from m2010 where icd9code in (
select distinct icd9code from m2010 group by icd9code having count(icd9code) > 1)

and

Code:

select * from m2010 m
join (select distinct icd9code from m2010 group by icd9code having count(icd9code) > 1) d on
m.icd9code = d.icd9code

These both return the same data. Is either of these more to be recommended than the other, or are there circumstances to use one and other circumstances to use the other? Or are they pretty much the same in terms of performance?

TIA

gmmastros · Mar 9, 2011

Personally, I don't like to use an IN clause unless I am hard coding a very short list. Ex:

Select * From People Where ShoeSize In (8.5, 9 ,9.5)

The problem with IN, is that it can return unexpected results. For a good example on this, please read:

http://wiki.lessthandot.com/index.php/Subquery_typo_with_using_in

The wiki article demonstrates the problem with IN, and shows some alternatives that you should be using instead.

are they pretty much the same in terms of performance?

They probably are, but it may depend on your table and index structures. The best way to determine (from a performance perspective) which is better, is to run both of them and check.

Here's how:

Open SQL Server Management Studio and connect to the correct DB.
Open a new query window.
Copy/paste both queries in to the query window.
Press CTRL-M on your keyboard (you won't notice anything happening).
Run the query.

CTRL-M causes SQL Server to display the actual execution plan. There will be a new tab for it. Each query will have it's own section and will display a percentage for the execution time.

If the percentages are each 50, and the execution plans are the same, then there is no difference in performance.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

gmmastros · Mar 9, 2011

Looking at the 2 queries, I think you're right in that they return the same results. More specifically, I think they MUST return the same results. If you didn't have the distincts or the having clause, the 2 queries may not return the same results.

BTW. You can remove the DISTINCT keywords from both queries because they are not necessary. Since you are only returning one column, and that column is in the group by, you are guaranteed distinct results anyway.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

BobRodes · Mar 9, 2011

Thanks George. Great tips. After reading your link, I also tried this:

Code:

select * from m2010 m where exists (select icd9code from m2010 m2 where m2.icd9code = m.icd9code group by icd9code having count(icd9code) > 1)

Interestingly, the execution plans were not only all at 33%, they were identical.

An unintended consequence of the information revolution has been the exponential propagation of human error.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Join (select... vs. where ... in ( 1

BobRodes

Instructor

gmmastros

Programmer

gmmastros

Programmer

BobRodes

Instructor

Similar threads

Part and Inventory Search

Sponsor