Left Outer Join tune up...

m0nkey · Sep 8, 2005

Can someone please look over this code and tell me if there is anything that they can recommend to speed it up...I have all the qualified fields indexed or clustered and the thing is, it runs fast (<1 sec) if i run it without the last line: "AND A.zipcode is null", otherwise it runs for 3.5 minutes...

I am not sure why this is slowing this query down so much. The source table [20054_dup] is large:90 million entries but only 6 columns. acct_no is clustered and zipcode is indexed. the avoid_areas table is only 10K records and has 2 columns. The zipcode is clustered. i have tried this with a hint and without. the execution plan shows the additional hash match/right ourter joins, parallelism and clustered index scans when i add this final paft of the left outer join.

Am i missing something with this query? Please let me know if you see something awry or can sugesst a tune somewhere...

Select distinct top 1000 D.acct_no, D.zipcode
from [20054_dup] D With (tablockx, holdlock)
left outer join avoid_areas A WITH(INDEX(IX_avoid_areas)) on A.zipcode = D.zipcode
left outer join repo_35 R on R.acct_no= D.acct_no
WHERE D.timezone = '6'
AND R.acct_no is null
AND D.problem = '0'
AND A.zipcode is null

gmmastros · Sep 8, 2005

Perhaps I don't understand what the select is trying to do, but... is it possible that you have the tables joining backwards?

Change on A.zipcode = D.zipcode
To on D.zipcode = A.zipcode

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom

m0nkey · Sep 8, 2005

Thanks for the reply...
I am trying to filter out all the records in the avoid_areas table by zipcode...
i was using a 'where zipcode not in(select zipcode from avoid_areas)' before but i was trying to speed it up by forcing the index and use a left outer join.
it will work and return the corect result set if i use either the left outer before or the not in subquery...

the query will not work if i reverse the join...

thanks

travisbrown · Sep 8, 2005

Change on A.zipcode = D.zipcode
To on D.zipcode = A.zipcode"

Does this matter George (or anyone)? I've often wondered if it mattered if the ON statement was ordered the same as the tables.

vongrunt · Sep 8, 2005

Nope. Base/auxiliary tables are determined with order of tables in FROM clause and JOIN type. In blah1 left join blah2, blah1 will always be base table no matter how join expression looks like.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]

travisbrown · Sep 8, 2005

Now I know, and knowing is half the battle. I always put them in order because, just like ducks, I like to have things all in a row.

amrita418 · Sep 8, 2005

The left outer join depends on which table is specified on the left side of the 'LEFT OUTER JOIN" clause.

The order of fields in the ON clause should not matter.

Regards,
AA

gmmastros · Sep 8, 2005

Here I am, thinking I'm all that. Like Travis, I always put them in a certain order. I thought it made a difference, but I now realize it doesn't. I played around with this a little to verify.

I suspect my confusion came about because I learned about left and right joins back when the syntax looked like...

From A, B
Where A.Id *= B.Id

or

From A, B
Where A.Id =* B.Id

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom

vongrunt · Sep 8, 2005

I also put them in order but "reversed" one (A join B on B.foo=A.bar). Why exactly - dunno, probably something mental.

Back to original question (to: m0nkey)... How many rows query returns without last line? Can you post exec plans for both cases? Use SET SHOWPLAN_TEXT ON/GO before query to get plan as text output.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]

travisbrown · Sep 8, 2005

After all that, m0nkey, I have no idea why the NULL filter would slow things down so much. If I was taking a wild guess, I'd say the DISTINCT clause has that many more comparisons to make when you invoke the NULL filter. If there is a way to group or otherwise find distinct records, it might be a bit faster. Even a subquery, maybe? I'd try a group first, but just for fun, try something like this:

Code:

SELECT ACCT_NO, D_ZIPCODE 
FROM 
(Select distinct top 1000 D.acct_no AS ACCT_NO, D.zipcode AS  D_ZIPCODE, A.zipcode AS A_ZIPCODE
    from [20054_dup] D With (tablockx, holdlock)
    left outer join avoid_areas A WITH(INDEX(IX_avoid_areas)) on A.zipcode = D.zipcode
    left outer join repo_35 R on R.acct_no= D.acct_no
    WHERE D.timezone = '6'
    AND R.acct_no is null
    AND D.problem = '0') e
WHERE A_ZIPCODE is null

m0nkey · Sep 8, 2005

here it is without the null:
--Top(1000)
|--Stream Aggregate(GROUP BY

[D].[ACCT_NO]) DEFINE

[D].[ZIPCODE]=ANY([D].[ZIPCODE])))
|--Filter(WHERE

[R].[ACCT_NO]=NULL))
|--Merge Join(Left Outer Join, MERGE

[D].[ACCT_NO])=([R].[ACCT_NO]), RESIDUAL

[R].[ACCT_NO]=[D].[ACCT_NO]))
|--Clustered Index Scan(OBJECT

[ACCT_NOstore].[dbo].[20054_dup].[IX_20054_dup_3] AS [D]), WHERE

[D].[timezone]='6' AND [D].[problem]='0') ORDERED FORWARD)
|--Clustered Index Scan(OBJECT

[ACCT_NOstore].[dbo].[REPO_35].[CIX_REPO_35] AS [R]), ORDERED FORWARD)

HERE IT IS WITH:
Select distinct top 1000 D.ACCT_NO, D.zipcode from [20054_dup] D With (tablockx, holdlock) left outer join avoid_areas A WITH(INDEX(IX_avoid_areas)) on
D.zipcode = A.zipcode left outer join repo_35 R on R.ACCT_NO = D.ACCT_NO WHERE D.timezone = '6

|--Top(1000)
|--Stream Aggregate(GROUP BY

[D].[ACCT_NO]) DEFINE

[D].[ZIPCODE]=ANY([D].[ZIPCODE])))
|--Filter(WHERE

[R].[ACCT_NO]=NULL))
|--Parallelism(Gather Streams, ORDER BY

[D].[ACCT_NO] ASC))
|--Nested Loops(Left Outer Join, OUTER REFERENCES

[D].[ACCT_NO]))
|--Sort(ORDER BY

[D].[ACCT_NO] ASC))
| |--Filter(WHERE

[A].[zipcode]=NULL))
| |--Hash Match(Right Outer Join, HASH

[A].[zipcode])=([D].[ZIPCODE]), RESIDUAL

[D].[ZIPCODE]=[A].[zipcode]))
| |--Parallelism(Repartition Streams, PARTITION COLUMNS

[A].[zipcode]))
| | |--Clustered Index Scan(OBJECT

[ACCT_NOstore].[dbo].[avoid_areas].[IX_avoid_areas] AS [A]))
| |--Parallelism(Repartition Streams, PARTITION COLUMNS

[D].[ZIPCODE]))
| |--Clustered Index Scan(OBJECT

[ACCT_NOstore].[dbo].[20054_dup].[IX_20054_dup_3] AS [D]), WHERE

[D].[timezone]='6' AND [D].[problem]='0'))
|--Clustered Index Seek(OBJECT

[ACCT_NOstore].[dbo].[REPO_35].[CIX_REPO_35] AS [R]), SEEK

[R].[ACCT_NO]=[D].[ACCT_NO]) ORDERED FORWARD)

WHy does it have 2 statements on the second one?

I will test Travis Suggestion now...Thanks!!!

m0nkey · Sep 8, 2005

well Travis, that did not shave any time off...still running at 3.5+ minutes...
thanks...

Beuller,Beuller,Beuller...I'm just panhandling here i guess...

thanks for the suggestions in advance..

travisbrown · Sep 8, 2005

How fast if you just take the distinct off?

maswien · Sep 8, 2005

m0nkey,

The plan tells clearly why the extra filter badly affect the performance. The first plan shows that the table 20054_dup left outer join to table REPO_35 and then apply the filter (R.acct_no is null). Here SQL Server is smart enough to know that table 20054_dup left outer join to avoid_areas is not useful because there is no filter for table avoid_areas therefore this outer join won't affect the result given the select list is "distinct top 1000 D.acct_no, D.zipcode ", so the first plan saves a left outer join and that's why it's faster than the second. The second plan involves 2 outer join because the extra filter apply on the third table and in turn slow down the query.

SajidAttar · Sep 9, 2005

Hi Guys!

Have anyone noticed following

Code:

Select distinct top 1000 D.acct_no, D.zipcode
    from [20054_dup] D With (tablockx, holdlock)
    left outer join avoid_areas A WITH(INDEX(IX_avoid_areas)) on [B]A.zipcode = D.zipcode [/B]
    left outer join repo_35 R on R.acct_no= D.acct_no
    WHERE D.timezone = '6'
    AND R.acct_no is null
    AND D.problem = '0'
    [B]AND A.zipcode is null[/B]

M0nkey, When you are joining A.ZipCode = D.ZipCode then why you need to check for Null Zipcode?

Will you please clearfy, It will clear your requirement and could help us to Help you.

Regards,

"There are no secrets to success. It is the result of preparation, hard work, and learning from failure." -- Colin Powell

maswien · Sep 10, 2005

Am I wrong? Why you guys have no comments anymore?

r937 · Sep 11, 2005

SajidAttar said:
When you are joining A.ZipCode = D.ZipCode then why you need to check for Null Zipcode?

because that's what you do when you use a LEFT OUTER JOIN to find unmatched rows in the left table

monkey, if you do indeed want [20054_dup] rows which have no matching avoid_areas rows, then presumably {acct_no,zipcode} will be unique, so i'm betting the DISTINCT is superfluous

however, DISTINCT is very expensive, since it involves a sort of the entire result set on all columns, so your query will run a lot faster if you remove it

r937.com | rudy.ca

vongrunt · Sep 12, 2005

maswien said:
Am I wrong? Why you guys have no comments anymore?

Yes, you are right.

Without last line left join becomes "dummy" - no columns from A were used and query semantics remains intact - and server simplifies exec plan.

With last line (AND A.zipcode is null) table A cannot be ignored and exec time increases to 3.5 minutes.

Depending on strength/configuration of hardware, this response time is reasonable for 90M outer join filtering.

------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]

zhavic · Sep 13, 2005

Hi m0nkey,

can you post there more info about table 'repo_35' ?
I was trying simulate your problem, having 87 milion rows in '[20054_dup]' table, 12K rows in 'avoid_areas' table and 0 rows in 'repo_35' table and query runs for a 1 second,
but then I insert 800K rows in 'repo_35' table and quwry was too slow.

So I want to know, what indexes you has in that table, how may rows ( aproximately ) it has, and if the values in colums 'acct_no' of that table is unique or not.

Zhavic

---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Left Outer Join tune up...

MIS

Programmer

MIS

Technical User

Programmer

Technical User

Technical User

Programmer

Programmer

Technical User

MIS

MIS

Technical User

Technical User

Programmer

Technical User

Technical User

Programmer

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor