
LEFT JOIN vs MAX(...)


Qik3Coder (Programmer)
Jan 4, 2006
Back story:
I have a NASTY query I'm working with.
It's a "straight" select and it's over 800 lines long.

Problem:
There are approx 50 left joins to a key/value table by GUID

Code:
SELECT col47.[Value] AS MySpecialColumn
...
LEFT JOIN dbo.myLookupTable col47 WITH (NOLOCK)
  ON  col47.[ID] = myBaseTable.ID
  AND col47.myGUID = 'AAAAAAAAAAAAAAAAAAAAAA'
...

I had tried turning this on its side with:
Code:
SELECT
...
 MAX(CASE WHEN cols.myGUID = 'AAAAAAAAAAAAAAAAAAAAA'
   THEN cols.[Value] ELSE '' END) AS MySpecialColumn
...
LEFT JOIN dbo.myLookupTable cols WITH (NOLOCK)
  ON cols.[ID] = myBaseTable.ID
...
GROUP BY myBaseTable.ID

I'm not seeing much improvement, and as expected, my memory usage went through the roof.

I'm going to pull the code bits out into a temp table, and then join on them, but am looking for any suggestions.
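
Roughly what I have in mind (just a sketch; the GUIDs and output column names below are placeholders, not the real ones):

```sql
-- Flatten the key/value lookup once into a temp table...
SELECT b.ID,
       MAX(CASE WHEN l.myGUID = 'AAAA...' THEN l.[Value] ELSE '' END) AS MySpecialColumn,
       MAX(CASE WHEN l.myGUID = 'BBBB...' THEN l.[Value] ELSE '' END) AS MyOtherColumn
INTO   #LookupFlat
FROM   dbo.myBaseTable b
LEFT JOIN dbo.myLookupTable l ON l.[ID] = b.ID
GROUP BY b.ID;

-- ...so the big query needs one join instead of ~50:
-- LEFT JOIN #LookupFlat lf ON lf.ID = mbt.ID
```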

Lodlaiden

A lack of experience doesn't prevent you from doing a good job.
 
I assume the NASTY query is joining to other tables too, not just myBaseTable and myLookupTable. Is this true?

You may want to isolate the data from the lookup table into a CTE or derived table. Also make sure you have good indexes on the lookup table.
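
For example, something along these lines (just a sketch; the GUID list is abbreviated):

```sql
-- Pull only the lookup rows the query actually needs into a CTE,
-- then join against that instead of the full table each time.
;WITH LookupData AS
(
    SELECT [ID], myGUID, [Value]
    FROM   dbo.myLookupTable
    WHERE  myGUID IN ('AAAA...', 'BBBB...')  -- only the GUIDs the query uses
)
SELECT ...
FROM dbo.myBaseTable
LEFT JOIN LookupData col47
      ON  col47.[ID] = myBaseTable.ID
      AND col47.myGUID = 'AAAA...'
...
```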



-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great thing about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
There are about 30 tables, besides the Lookup table.

There were several correlated subqueries like this:

Code:
SELECT 'A' Col1, 'B' Col2, ...,
  (SELECT TOP 1 col
   FROM dbo.SomeTable
   WHERE dbo.SomeTable.ID = mbt.ID
   ORDER BY DateField DESC) AS Col23,
  'X' Col24, ...
FROM dbo.MyBaseTable mbt
...

I turned these into derived tables and join them, which helped a little.

My biggest problem seems to be the data itself. Some of the raw selects out of the tables just take way longer than expected.

A lack of experience doesn't prevent you from doing a good job.
 
Actually... your code sample shows a correlated subquery, not a derived table. Subqueries like that are usually a lot slower than derived tables because they are evaluated once for each row instead of in a single set-based operation.
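
For example, that TOP/ORDER BY subquery could become a derived table something like this (a sketch; ROW_NUMBER assumes SQL Server 2005 or later):

```sql
-- The derived table computes the "latest row per ID" once, set-based,
-- instead of re-running the subquery for every row of the base table.
SELECT mbt.ID, latest.col AS Col23
FROM dbo.MyBaseTable mbt
LEFT JOIN (
    SELECT ID, col,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DateField DESC) AS RowNum
    FROM dbo.SomeTable
) latest
    ON  latest.ID = mbt.ID
    AND latest.RowNum = 1
```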

It seems like you are trying not to give out too much information about your query, but seeing more of the code would certainly help.

Also, can you share with me what the indexes look like on the lookup table:

sp_helpindex 'myLookupTable'
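
If it isn't already there, the kind of index I'd expect to help is one on the join key plus the GUID, covering the value column (the index name below is made up, and INCLUDE requires SQL Server 2005 or later):

```sql
-- Lets the ID = ... AND myGUID = ... joins seek instead of scan,
-- and avoids a lookup for the Value column.
CREATE NONCLUSTERED INDEX IX_myLookupTable_ID_myGUID
ON dbo.myLookupTable ([ID], myGUID)
INCLUDE ([Value]);
```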

-George
Microsoft SQL Server MVP
 
They were subqueries, both in the projection list and in the joins. I turned them into derived tables and joined those, which helped.

It takes about 5 mins to "cross-tab" the entire lookup table.
I wind up with about 100k rows (70 columns)

"Should" I be better off putting the cross-tab in a temp table (either physical or in memory), or keeping it as a derived table?

I'm about 1/2 done rewriting the query to use a physical temp table (so I don't have to write out the projection list into the table definition).

Except for the LookUp table, the joins are on straight primary keys, which are clustered. I'm fairly certain the bulk of my issue is with the LookUp table and a similar Status table.


The worst part is that each of the components works well on its own, but put it all together and it just drags.

A lack of experience doesn't prevent you from doing a good job.
 
Temp table, table variable, derived table, OH MY.

I've been messing with SQL a long time now and I still get it wrong sometimes. Usually derived tables perform better than temp tables and table variables. Table variables oftentimes perform better than temp tables. There are exceptions to all the rules, though. The only way to know for sure is to try each way. Trying it all is time consuming, so I usually only do that for queries where I really need to improve the performance.

Just a thought... have you updated statistics lately?

-George
Microsoft SQL Server MVP
 
Yeah. (statistics and indexes are pretty clean)

The baseline query (I was told) runs for 1 hour in production.
It ran for 2 hrs on the dev machine.

I'm retooling the statuses part now, so I can get rid of the GROUP BY altogether.

I'm starting with temp tables, and will try table variables if I don't get some massive improvement.

Thanks gmmastros

A lack of experience doesn't prevent you from doing a good job.
 
I suggest putting the intermediate result (before cross-tabbing it) into a temp table. I recently optimized several complex reports, and I found that materializing intermediate results into a temp table helps a lot.

In addition, sometimes there is no way to make a UDF perform as well as a stored procedure, exactly because of these considerations: a stored procedure can materialize intermediate results into temp tables, while a UDF cannot.

There are several threads on the MSDN forums about this problem. Try searching for 'CTE temp tables MSDN forum'.



PluralSight Learning Library
 
Are you suggesting filtering, possibly in a derived table, prior to the cross-tab, to speed it up? It's about 5 mins for the cross-tab, which I find ridiculous.

I'm down to about 30mins, from 2hrs, so I'm feeling better, but I have more things I can clean up.

Any suggestions on turning a Clustered Index Scan into a Clustered Index Seek when the join is a straight x.ID = y.ID?

BTW, thanks for the tips today.


A lack of experience doesn't prevent you from doing a good job.
 
Yes, before doing a cross-tab (PIVOT), try putting the results into a temp table.

BTW, how exactly are you doing the PIVOT?

Also, if you can split the complex select into parts, it may help.
And, of course, don't use subqueries as projections.
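
For example (a sketch; the GUIDs and output column names are abbreviated placeholders):

```sql
-- Materialize the narrow key/value rows first...
SELECT [ID], myGUID, [Value]
INTO   #Stage
FROM   dbo.myLookupTable
WHERE  myGUID IN ('AAAA...', 'BBBB...');

-- ...then cross-tab the small temp table instead of the full lookup table.
SELECT [ID],
       [AAAA...] AS MySpecialColumn,
       [BBBB...] AS MyOtherColumn
FROM   #Stage
PIVOT (MAX([Value]) FOR myGUID IN ([AAAA...], [BBBB...])) p;
```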

 
Interestingly enough, putting one of the smaller cross tabs into a table variable vs a temp table degraded performance.

I have a couple more optimizations to make yet, but I think I'm just about there.

Thanks for the help along the way guys.

A lack of experience doesn't prevent you from doing a good job.
 
Smaller is usually better with table variables. Did you try putting an index on the table variable? You cannot index a table variable in the same way you index real tables or even temp tables. With a table variable, you can create a primary key, which is implemented with an index, like this:

Code:
Declare @Temp Table(RowId Int Identity(1,1), Data VarChar(20), Primary Key(Data, RowId))

Notice that I introduced an identity column. This is important because a primary key must be unique. By including an identity column and making it part of the primary key, you are guaranteed a unique primary key.

Like everything else, this does not guarantee improved performance, but I have used this trick quite a few times in my code to improve performance.

It's worth a shot.

-George
Microsoft SQL Server MVP
 