Help needed with SLOOOWWWWW query..

Lysen · Apr 7, 2003

Hi,

Can someone help me with the following query?

select agentnummer as number
,seizoen_03
,seizoen_02
,gidscode_03
,gidscode_02
,aantal_res
,pax
,babies
,gidsreissom
from v_sales_basis02

where gidscode_03 not in (select gidscode from v_sales_basis03
where seizoen = seizoen_03
and agentnummer = number);

When I run this one, it takes about 30 seconds to give me the two records missing. (v_sales_basis02 contains 45 rows, v_sales_basis03 contains 48 rows).

When I leave the agentnummer = number out of the query it takes 2 seconds at most!

Thanks, Lysen

Patten · Apr 7, 2003

Just put explain in front of your query and see if you are not doing a product join. I guess the columnnames are identical in both tables v_sales_basis03 and v_sales_basis02 and since you are not prefixing your columnnames with your tablenames you are probably doing a product join? (Just a quick guess of me...)

You could try it this way:

select t02.agentnummer as number
,t02.seizoen_03
,t02.seizoen_02
,t02.gidscode_03
,t02.gidscode_02
,t02.aantal_res
,t02.pax
,t02.babies
,t02.gidsreissom
from v_sales_basis02 t02

where t02.gidscode_03 not in (select t03.gidscode from v_sales_basis03 t03
where t03.seizoen = t02.seizoen_03);

BillDHS · Apr 8, 2003

When you have "and agentnummer = number" you are performing a correlated subquery instead of a two table join.

With a correlated subquery, for EACH row in the main query a search of the subquery is made to find a matching matching row, then a T/F value is returned.

DaraLynx · Apr 30, 2003

What indices have you created on the respective tables in the query? Do you have a common index with statistics gathered on the appropriate columns? I have found that IF you create a secondary index involving the columns in question (at least in my own queries) and collect statistics on those collumns, I receive a better response time (The explain shows an index join rather than a product join. Maybe this will help you as well?

Dara

-----------------------------------
The TROUBLE with TROUBLESHOOTING is that TROUBLE usually SHOOTS BACK!!!

poiuytrewq · May 14, 2003

as for "When you have "and agentnummer = number" you are performing a correlated subquery instead of a two table join. "

I think that as long as there is a inner query in where clause it's a correlated subquery.

BillDHS · May 14, 2003

If you don't have join criteria between the inner and outer tables, you are not performing a correlated subquery. Without the join, Teradata will spool a table of values returned by the subquery and do a regular join between the tables.

dnoeth · May 14, 2003

"for EACH row in the main query a search of the subquery is made to find a matching matching row"

This is how a correlated suquery is processed in theory, if a DBMS is really doing it that way, it's almost unusable for large tables.

"Without the join, Teradata will spool a table of values returned by the subquery and do a regular join between the tables."

And the optimizer will do the same for a correlated subquery, just look at explain ;-)

There's nothing like a "Nested Loop" in Teradata, if the optimizer can't rewrite the correlated subquery with an inner join, the result will be a product join.

Dieter

DougCox · May 20, 2003

If agentnummer is a PI and your formating it - it could potentially cause issues...

Liam1 · May 22, 2003

Hi,

assuming there is no product join, and you are joining on primary indecies-

There is a function called EXISTS and NOT EXISTS. There is also EXCEPT. Basically it depends on NULLS. These are infact harmonised by Teradata and work extremely fast, esp. when using large table.
NOT IN does a full table scan and does not use the indecies.

select blah1
,Blah2

from v_sales_basis02 t02
where t02.gidscode_03 NOT EXISTS
(select t03.gidscode from v_sales_basis03 t03
where t03.seizoen = t02.seizoen_03);

Liam1

dnoeth · May 22, 2003

"NOT IN does a full table scan and does not use the indecies."

This is definitly wrong.
NOT IN is usually optimized as an "exclude" join (exactly as NOT EXISTS or EXCEPT/MINUS), i.e. if both sides match it's *not* in the output.
If you're work on Primary Index columns it will use the PI directly without redistributuion of a spool.

Btw, Teradata usually only uses the PI to join, *no* secondary index. There's only one exception, it's the Nested Join.

Dieter

Liam1 · May 22, 2003

All due respect...but it's not wrong.

This goes against what (more than one from variety of sources) other people have taught us.

IN does use the index but NOT IN does a full table scan. Looking at the explains, an IN and NOT IN seem to work differently.

Using EXPLAINs as an example (using large tables, UPI, full stats collected, V3R4.1 box)

viz:
select Unique_ID
from
Table t1
where not in
(select unique_id
from Table_old t2)

Will yeild 2 full table retrievals with 2 full table scans, 2 exclusion merge joins (one for matches, one for non matches?). 9 steps on our 27m row tables.

viz:
select Unique_ID
from
Table t1
where not exists
(select unique_id
from Table_old t2
where t1.Unique_ID = t2.Unique_ID)

Carries out exclusion merge join. It does not retireve the tables and does not scan them. 5 steps on the same tables.

We have found increadibly large gains in performance by adopting the NOT EXISTS in our code.

Though it must be said, I do fully appreciate this might not be relevant here, as it does not sound like the original query is based on a large SQL query.

Might require further investigation?

Liam

dnoeth · May 22, 2003

If you run a NOT IN on NULLable columns, the optimizer will create a rather complex plan including aggregates to deal with NULLs. If you look at explain there's something like "skip this join step if there are no NULL values"/"skip this retrieve step if NULLs exist".
If the columns from both tables are NOT NULL, the plan will be the same for NOT EXISTS/NOT IN, the only difference to EXISTS/IN is inclusion vs. exclusion join.
In fact for EXISTS/IN it's the same plan even if there are NULLs.
And i doesn't matter if it's a large or small table ;-)

I usually recommend NOT EXISTS also, if there are nullable columns, because the answer set will be empty if there's a NULL returned by the subquery when using NOT IN. Most people don't know (or forget) about three-valued logic in NOT IN.

Dieter

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Help needed with SLOOOWWWWW query..

Lysen

Programmer

Patten

Technical User

BillDHS

Programmer

DaraLynx

Programmer

poiuytrewq

Instructor

BillDHS

Programmer

dnoeth

Instructor

DougCox

Programmer

Liam1

Programmer

dnoeth

Instructor

Liam1

Programmer

dnoeth

Instructor

Similar threads

Part and Inventory Search

Sponsor