Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Join (select... vs. where ... in ( 1

Status
Not open for further replies.

BobRodes

Instructor
May 28, 2003
4,215
US
I have two statements:
Code:
select * from m2010 where icd9code in (
select distinct icd9code from m2010 group by icd9code having count(icd9code) > 1)
and
Code:
select * from m2010 m
join (select distinct icd9code from m2010 group by icd9code having count(icd9code) > 1) d on
m.icd9code = d.icd9code
These both return the same data. Is either of these more to be recommended than the other, or are there circumstances to use one and other circumstances to use the other? Or are they pretty much the same in terms of performance?

TIA
 
Personally, I don't like to use an IN clause unless I am hard coding a very short list. Ex:

Select * From People Where ShoeSize In (8.5, 9 ,9.5)

The problem with IN, is that it can return unexpected results. For a good example on this, please read:
The wiki article demonstrates the problem with IN, and shows some alternatives that you should be using instead.

are they pretty much the same in terms of performance?

They probably are, but it may depend on your table and index structures. The best way to determine (from a performance perspective) which is better, is to run both of them and check.

Here's how:

Open SQL Server Management Studio and connect to the correct DB.
Open a new query window.
Copy/paste both queries in to the query window.
Press CTRL-M on your keyboard (you won't notice anything happening).
Run the query.

CTRL-M causes SQL Server to display the actual execution plan. There will be a new tab for it. Each query will have it's own section and will display a percentage for the execution time.

If the percentages are each 50, and the execution plans are the same, then there is no difference in performance.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Looking at the 2 queries, I think you're right in that they return the same results. More specifically, I think they MUST return the same results. If you didn't have the distincts or the having clause, the 2 queries may not return the same results.

BTW. You can remove the DISTINCT keywords from both queries because they are not necessary. Since you are only returning one column, and that column is in the group by, you are guaranteed distinct results anyway.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks George. Great tips. After reading your link, I also tried this:
Code:
select * from m2010 m where exists (select icd9code from m2010 m2 where m2.icd9code = m.icd9code group by icd9code having count(icd9code) > 1)
Interestingly, the execution plans were not only all at 33%, they were identical.

An unintended consequence of the information revolution has been the exponential propagation of human error.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top