join question 1

wbodger · Feb 28, 2012

I was working on a little query

Code:

SELECT C.STUDY_PARTICIPANT_ID,
C.STUDY_PROTOCOL_ID,
C.STUDY_SITE_ID,
PHS.PROC_LABS_PSA_DATE,
PHS.PROC_LABS_PSA_VALUE,
PHS.PROC_LABS_PSATESTNAME_CODE,
PHS.PROC_LABS_PSATESTNAMEOTHER_TEXT,
getdate() AS AccessDate
FROM dbo.TBLCONSENT C JOIN TBLPSAHISTORYSUB PHS 
	ON C.Study_Participant_ID = PHS.Study_Participant_ID
	JOIN tblFinalG FG ON C.Study_Participant_ID = FG.Study_Participant_ID
WHERE C.DeleteFlag=0
AND C.EntryFlag=1
AND PHS.DeleteFlag=0

which works just fine, but there were other issues, so as we were looking over the entire PR, he suggested changing the JOINS to look more like

Code:

SELECT C.STUDY_PARTICIPANT_ID,
C.STUDY_PROTOCOL_ID,
C.STUDY_SITE_ID,
PHS.PROC_LABS_PSA_DATE,
PHS.PROC_LABS_PSA_VALUE,
PHS.PROC_LABS_PSATESTNAME_CODE,
PHS.PROC_LABS_PSATESTNAMEOTHER_TEXT,
getdate() AS AccessDate
FROM (SELECT * FROM tblFinalG WHERE deleteflag=0) AS FG
	JOIN (SELECT * FROM  dbo.TBLCONSENT WHERE deleteflag=0 and entryflag=1) AS C ON FG.Study_Participant_ID = C.Study_Participant_ID
	JOIN (SELECT * FROM TBLPSAHISTORYSUB WHERE deleteflag=0) AS PHS ON FG.Study_Participant_ID = PHS.Study_Participant_ID
	JOIN (SELECT * FROM tblPSAHistory WHERE deleteflag=0 and entryflag=1) AS PH ON FG.Study_Participant_ID = PH.Study_Participant_ID

Now, my question is related to performance. In the past I would avoid things like this because of performance issues. However, I am at a new job where we are pulling data out of SQL Server to give to statisticians for analysis. Our database is underutilized and none of the tables are really large (maybe 150k records at the most), so performance isn't likely to be an issue, but are there potential performance issues with structuring the joins like that?

wb

djj55 · Feb 28, 2012

You might try something like:

Code:

SELECT C.STUDY_PARTICIPANT_ID,
    C.STUDY_PROTOCOL_ID,
    C.STUDY_SITE_ID,
    PHS.PROC_LABS_PSA_DATE,
    PHS.PROC_LABS_PSA_VALUE,
    PHS.PROC_LABS_PSATESTNAME_CODE,
    PHS.PROC_LABS_PSATESTNAMEOTHER_TEXT,
    getdate() AS AccessDate
FROM (SELECT * FROM tblFinalG WHERE deleteflag=0) AS FG
INNER JOIN dbo.TBLCONSENT AS C ON FG.Study_Participant_ID = C.Study_Participant_ID
                                AND c.deleteflag=0 and c.entryflag=1
INNER JOIN TBLPSAHISTORYSUB AS PHS ON FG.Study_Participant_ID = PHS.Study_Participant_ID
                                AND PHS.deleteflag=0
INNER JOIN tblPSAHistory AS PH ON FG.Study_Participant_ID = PH.Study_Participant_ID 
                                AND PH.deleteflag=0 and PH.entryflag=1

Just a starting point

djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!

wbodger · Feb 28, 2012

Yeah, I can play around with the join types, but I was wondering more about the performance differences between joining two tables on a specific column and then specifying in a where clause which records to choose

Code:

from a join b on a.id=b.id
where a.setting='this'

vs selecting everything from the tables in the from with the constraints in the join clause.

Code:

from (select * from a where setting='this') as a join (select * from b where b.setting='that') as b on a.id=b.id

wb

gmmastros · Feb 28, 2012

Most likely, it will not make any difference in performance because both queries are essentially the same.

When you run a query in SQL Server, there are several steps that occur behind the scenes. One of the steps is to determine the optimal method for returning the data. SQL Server is pretty good at returning data in the fastest way possible. Trying to "trick" the optimizer by moving things around does not usually work.

Here's how you can tell for sure.

1. Load both queries in to the same query window within SQL Server Management Studio.
2. Hold the CTRL key and tap the M key. You will not notice anything happening.
3. run the query by pressing F5
4. You will see a new tab next to results called execution plan. This table will show you a lot of information regarding how the queries are executed behind the scenes.

In the execution plan tab, you will see the execution plan for both queries. At the top of each query, you will see a percentage. The percentages should add up to 100. With both of your queries loaded, I would expect each one to show 50%. If one query is more efficient than another, the more efficient one will have a lower percentage.

You will also see the steps involved with producing your results. You see things like Index scans, Index Seeks, Bookmark lookups, sorts, etc.... for each step, you will see a percentage. The percentages for each step in a query will add up to 100. This allows you to see which step in the process is taking the most time (because it will have a higher percentage).

I encourage you to take a look at the execution plans and compare them. If my guess is right, both queries will take the same amount of time, have the same percentages, and show identical execution plans.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

wbodger · Feb 28, 2012

Very interesting, and yes, the execution plans were identical. Thank you for that tutorial, it was one of those things that I really didn't know how to do. I had thought that since it was doing a select * that it would take longer.

wb

gmmastros · Feb 28, 2012

Right, but the asterisks were in derived tables. SQL is smart enough to realize that the final output does not depend on all of the columns from the derived table.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

join question 1

wbodger

Programmer

djj55

Programmer

wbodger

Programmer

gmmastros

Programmer

wbodger

Programmer

gmmastros

Programmer

Similar threads

Part and Inventory Search

Sponsor