Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Help needed with SLOOOWWWWW query..

Status
Not open for further replies.

Lysen

Programmer
Apr 9, 2002
87
NL
Hi,

Can someone help me with the following query?

select agentnummer as number
,seizoen_03
,seizoen_02
,gidscode_03
,gidscode_02
,aantal_res
,pax
,babies
,gidsreissom
from v_sales_basis02

where gidscode_03 not in (select gidscode from v_sales_basis03
where seizoen = seizoen_03
and agentnummer = number);

When I run this one, it takes about 30 seconds to give me the two records missing. (v_sales_basis02 contains 45 rows, v_sales_basis03 contains 48 rows).

When I leave the agentnummer = number out of the query it takes 2 seconds at most!

Thanks, Lysen
 
Just put explain in front of your query and see if you are not doing a product join. I guess the columnnames are identical in both tables v_sales_basis03 and v_sales_basis02 and since you are not prefixing your columnnames with your tablenames you are probably doing a product join? (Just a quick guess of me...)

You could try it this way:

select t02.agentnummer as number
,t02.seizoen_03
,t02.seizoen_02
,t02.gidscode_03
,t02.gidscode_02
,t02.aantal_res
,t02.pax
,t02.babies
,t02.gidsreissom
from v_sales_basis02 t02

where t02.gidscode_03 not in (select t03.gidscode from v_sales_basis03 t03
where t03.seizoen = t02.seizoen_03);
 
When you have "and agentnummer = number" you are performing a correlated subquery instead of a two table join.

With a correlated subquery, for EACH row in the main query a search of the subquery is made to find a matching matching row, then a T/F value is returned.
 
What indices have you created on the respective tables in the query? Do you have a common index with statistics gathered on the appropriate columns? I have found that IF you create a secondary index involving the columns in question (at least in my own queries) and collect statistics on those collumns, I receive a better response time (The explain shows an index join rather than a product join. Maybe this will help you as well?

Dara

-----------------------------------
The TROUBLE with TROUBLESHOOTING is that TROUBLE usually SHOOTS BACK!!!
 
as for "When you have "and agentnummer = number" you are performing a correlated subquery instead of a two table join. "

I think that as long as there is a inner query in where clause it's a correlated subquery.

 
If you don't have join criteria between the inner and outer tables, you are not performing a correlated subquery. Without the join, Teradata will spool a table of values returned by the subquery and do a regular join between the tables.
 
"for EACH row in the main query a search of the subquery is made to find a matching matching row"

This is how a correlated suquery is processed in theory, if a DBMS is really doing it that way, it's almost unusable for large tables.

"Without the join, Teradata will spool a table of values returned by the subquery and do a regular join between the tables."

And the optimizer will do the same for a correlated subquery, just look at explain ;-)

There's nothing like a "Nested Loop" in Teradata, if the optimizer can't rewrite the correlated subquery with an inner join, the result will be a product join.

Dieter
 
If agentnummer is a PI and your formating it - it could potentially cause issues...
 
Hi,

assuming there is no product join, and you are joining on primary indecies-

There is a function called EXISTS and NOT EXISTS. There is also EXCEPT. Basically it depends on NULLS. These are infact harmonised by Teradata and work extremely fast, esp. when using large table.
NOT IN does a full table scan and does not use the indecies.

select blah1
,Blah2

from v_sales_basis02 t02
where t02.gidscode_03 NOT EXISTS
(select t03.gidscode from v_sales_basis03 t03
where t03.seizoen = t02.seizoen_03);

Liam1
 
"NOT IN does a full table scan and does not use the indecies."

This is definitly wrong.
NOT IN is usually optimized as an "exclude" join (exactly as NOT EXISTS or EXCEPT/MINUS), i.e. if both sides match it's *not* in the output.
If you're work on Primary Index columns it will use the PI directly without redistributuion of a spool.

Btw, Teradata usually only uses the PI to join, *no* secondary index. There's only one exception, it's the Nested Join.

Dieter
 
All due respect...but it's not wrong.

This goes against what (more than one from variety of sources) other people have taught us.

IN does use the index but NOT IN does a full table scan. Looking at the explains, an IN and NOT IN seem to work differently.

Using EXPLAINs as an example (using large tables, UPI, full stats collected, V3R4.1 box)

viz:
select Unique_ID
from
Table t1
where not in
(select unique_id
from Table_old t2)

Will yeild 2 full table retrievals with 2 full table scans, 2 exclusion merge joins (one for matches, one for non matches?). 9 steps on our 27m row tables.

viz:
select Unique_ID
from
Table t1
where not exists
(select unique_id
from Table_old t2
where t1.Unique_ID = t2.Unique_ID)

Carries out exclusion merge join. It does not retireve the tables and does not scan them. 5 steps on the same tables.

We have found increadibly large gains in performance by adopting the NOT EXISTS in our code.

Though it must be said, I do fully appreciate this might not be relevant here, as it does not sound like the original query is based on a large SQL query.

Might require further investigation?

Liam
 
If you run a NOT IN on NULLable columns, the optimizer will create a rather complex plan including aggregates to deal with NULLs. If you look at explain there's something like "skip this join step if there are no NULL values"/"skip this retrieve step if NULLs exist".
If the columns from both tables are NOT NULL, the plan will be the same for NOT EXISTS/NOT IN, the only difference to EXISTS/IN is inclusion vs. exclusion join.
In fact for EXISTS/IN it's the same plan even if there are NULLs.
And i doesn't matter if it's a large or small table ;-)

I usually recommend NOT EXISTS also, if there are nullable columns, because the answer set will be empty if there's a NULL returned by the subquery when using NOT IN. Most people don't know (or forget) about three-valued logic in NOT IN.


Dieter
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top