
Product Join


Lysen

Programmer
Apr 9, 2002
Hi group,

I have this snippet from an EXPLAIN here. How do I avoid this product join? It should NEVER be that many rows (more like a couple of thousand)... :(

Code:
 11) We do an all-AMPs JOIN step from Spool 9 (Last Use) by way of an
     all-rows scan, which is joined to SYS_CALENDAR.CALDATES.  Spool 9
     and SYS_CALENDAR.CALDATES are joined using a product join, with a
     join condition of ("(SYS_CALENDAR.CALDATES.cdate <= tot_met_datum)
     AND (SYS_CALENDAR.CALDATES.cdate >= van_datum)").  The result goes
     into Spool 11 (all_amps), which is redistributed by hash code to
     all AMPs.  Then we do a SORT to order Spool 11 by row hash.  The
     result spool file will not be cached in memory.  The size of Spool
     11 is estimated with index join confidence to be 25,266,093 rows. 
     The estimated time for this step is 2 hours and 41 minutes.
 
Hi,
Can you post the SQL? Maybe you could make a derived table out of the cal dates portion. Or maybe there is an easier way to get the information.
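Something like this, maybe (a rough guess without seeing your SQL; the column names are taken from your EXPLAIN, and the date bounds are made up):

Code:
join (sel calendar_date
        from sys_calendar.calendar
       where calendar_date between date '2003-01-01'
                                and date '2003-12-31') dt
  on dt.calendar_date between van_datum and tot_met_datum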
 
hi tdatgod,

yes, this is the SQL. I think it's kinda complicated... :)
Code:
sel  CASE
         WHEN substring(T1."seizoen" from 3 for 1) = 'Z'
         THEN ('Zomer') ELSE ('Winter')
     END,
     T2."Landcode" || ' - ' || T3."landnaam",
     T1."objectnummer" || ' - ' || T2."Omschrijving",
     T1."kamertype" || ' - ' || T4."omschrijving",
     T1."van_datum",
     T1."tot_met_datum",
     T1."datum",
     T1."allotment",
     T1."dag_prijs",
     T1."perc_gar_nachten" / 100,
     T1."gar_omzet",
     T1."percentage_werkelijk" / 100,
     T1."omzet_werkelijk",
     CASE
         WHEN (T1."geboekt_aantal" > T1."allotment")
         THEN (T1."allotment") ELSE (T1."geboekt_aantal")
     END,
     (T1."perc_gar_nachten" / 100) * T1."allotment",
     T1."allotment",
     (CASE
          WHEN (T1."geboekt_aantal" > T1."allotment")
          THEN (T1."allotment") ELSE (T1."geboekt_aantal")
      END) * T1."dag_prijs",
     ((T1."perc_gar_nachten" / 100) * T1."allotment") * T1."dag_prijs",
     ((CASE
           WHEN (T1."geboekt_aantal" > T1."allotment")
           THEN (T1."allotment") ELSE (T1."geboekt_aantal")
       END) * T1."dag_prijs") - ((T1."percentage_werkelijk" / 100) * T1."dag_prijs" * T1."allotment"),
     substring(T1."seizoen" from 1 for 2),
     CASE
         WHEN month(T1."datum") = 1 THEN ('Januari')
         WHEN month(T1."datum") = 2 THEN ('Februari')
         WHEN month(T1."datum") = 3 THEN ('Maart')
         WHEN month(T1."datum") = 4 THEN ('April')
         WHEN month(T1."datum") = 5 THEN ('Mei')
         WHEN month(T1."datum") = 6 THEN ('Juni')
         WHEN month(T1."datum") = 7 THEN ('Juli')
         WHEN month(T1."datum") = 8 THEN ('Augustus')
         WHEN month(T1."datum") = 9 THEN ('September')
         WHEN month(T1."datum") = 10 THEN ('Oktober')
         WHEN month(T1."datum") = 11 THEN ('November')
         WHEN month(T1."datum") = 12 THEN ('December')
         ELSE null
     END,
     week(T1."datum"),
     CASE
         WHEN (T1."allotment" = 0) THEN (0)
         ELSE ((CASE
                    WHEN (T1."geboekt_aantal" > T1."allotment") THEN (T1."allotment")
                    ELSE (T1."geboekt_aantal")
                END) / T1."allotment")
     END,
     (T1."perc_gar_nachten" / 100)
from "COGNOS"."v_garantie" T1

join "COGNOS"."v_object" T2
  on T1."bedrijfsnummer" = T2."Bedrijfsnummer"
 and T1."objectnummer" = T2."Objectnummer"

join "COGNOS"."v_land" T3
  on T2."Bedrijfsnummer" = T3."bedrijfsnummer"
 and T2."Landcode" = T3."landcode"

join "COGNOS"."v_huisvesting" T4
  on T1."bedrijfsnummer" = T4."bedrijfsnummer"
 and T1."kamertype" = T4."huisvestingscode"

where T1."seizoen" > '03Z'
order by 3,4,7 asc;

and the DDL of v_garantie

Code:
replace view cognos.v_garantie as
sel  gar.bedrijfsnummer  
    ,gar.seizoen
    ,gar.objectnummer
    ,gar.kamertype
    ,gar.van_datum
    ,gar.tot_met_datum
    ,calendar_date as datum
    ,gar.allotment
    ,cast(gar.dag_prijs AS DECIMAL(8,2)) as dag_prijs
    ,gar.perc_gar_nachten
    ,gar.gar_omzet/gar.aantal_nachten as gar_omzet
    ,gar.percentage_werkelijk
    ,gar.omzet_werkelijk / gar.aantal_nachten as omzet_werkelijk
    ,sum(case when calendar_date between objres.datum_van and objres.datum_tot then 1 else 0 end) as geboekt_aantal

from 
(sel  bedrijfsnummer
    , seizoen
    , objectnummer
    , kamertype
    , van_datum
    , tot_met_datum
    , allotment
    , dag_prijs
    , perc_gar_nachten
    , gar_omzet
    , aantal_nachten
    , percentage_werkelijk
    , omzet_werkelijk
 from plato.garantie
where (objectnummer
      ,seizoen
      ,volgnummer) in (select objectnummer
                            , seizoen
                            , max(volgnummer)
                         from plato.garantie
                        group by 1,2)) gar

join cognos.mycalendar cal
  on datum between gar.van_datum and gar.tot_met_datum

full outer join plato.object_reservering objres
  on gar.bedrijfsnummer = objres.bedrijfsnummer
 and gar.objectnummer = objres.objectnummer
 and gar.kamertype = objres.kamertype
 and gar.seizoen = objres.seizoen
 and objres.reserveringsvolgnummer is null
 and datum between objres.datum_van and objres.datum_tot
 and objres.dubbel_indicator is null

where gar.percentage_werkelijk is not null
group by 1,2,3,4,5,6,7,8,9,10,11,12,13;

I hope you've got the time to go through this all...
or maybe you'll see the problem in a sec! Thanks in advance! :)
 
Oh, and this is the complete EXPLAIN...

Code:
	Explanation
	  1) First, we lock a distinct plato."pseudo table" for read on a
	     RowHash to prevent global deadlock for plato.objres. 
	  2) Next, we lock a distinct plato."pseudo table" for read on a
	     RowHash to prevent global deadlock for plato.object. 
	  3) We lock a distinct plato."pseudo table" for read on a RowHash to
	     prevent global deadlock for plato.land. 
	  4) We lock a distinct plato."pseudo table" for read on a RowHash to
	     prevent global deadlock for plato.huisvesting. 
	  5) We lock a distinct plato."pseudo table" for read on a RowHash to
	     prevent global deadlock for plato.garantie. 
	  6) We lock a distinct SYS_CALENDAR."pseudo table" for read on a
	     RowHash to prevent global deadlock for SYS_CALENDAR.CALDATES. 
	  7) We lock plato.objres for read, we lock plato.object for read, we
	     lock plato.land for read, we lock plato.huisvesting for read, we
	     lock plato.garantie for read, and we lock SYS_CALENDAR.CALDATES
	     for read. 
	  8) We do an all-AMPs SUM step to aggregate from plato.garantie by way
	     of an all-rows scan with no residual conditions, and the grouping
	     identifier in field 1.  Aggregate Intermediate Results are
	     computed locally, then placed in Spool 6.  The size of Spool 6 is
	     estimated with low confidence to be 78 rows.  The estimated time
	     for this step is 0.61 seconds. 
	  9) We do an all-AMPs RETRIEVE step from Spool 6 (Last Use) by way of
	     an all-rows scan into Spool 8 (all_amps), which is redistributed
	     by hash code to all AMPs.  Then we do a SORT to order Spool 8 by
	     row hash.  The size of Spool 8 is estimated with low confidence to
	     be 78 rows.  The estimated time for this step is 0.17 seconds. 
	 10) We execute the following steps in parallel. 
	      1) We do an all-AMPs JOIN step from plato.garantie by way of an
	         all-rows scan with a condition of ("(NOT
	         (plato.garantie.percentage_werkelijk IS NULL )) AND
	         (plato.garantie.seizoen > '03Z')"), which is joined to Spool 8
	         (Last Use).  plato.garantie and Spool 8 are joined using an
	         inclusion merge join, with a join condition of (
	         "(plato.garantie.objectnummer = objectnummer) AND
	         ((plato.garantie.seizoen = seizoen) AND
	         (plato.garantie.volgnummer = Field_4 ))").  The result goes
	         into Spool 9 (all_amps), which is duplicated on all AMPs.  The
	         size of Spool 9 is estimated with index join confidence to be
	         5,516 rows.  The estimated time for this step is 0.69 seconds. 
	      2) We do an all-AMPs RETRIEVE step from plato.objres by way of an
	         all-rows scan with a condition of (
	         "(plato.objres.reserveringsvolgnummer IS NULL) AND
	         (plato.objres.dubbel_indicator IS NULL)") into Spool 10
	         (all_amps), which is redistributed by hash code to all AMPs. 
	         Then we do a SORT to order Spool 10 by row hash.  The size of
	         Spool 10 is estimated with high confidence to be 838,801 rows. 
	         The estimated time for this step is 2 minutes and 17 seconds. 
	 11) We do an all-AMPs JOIN step from Spool 9 (Last Use) by way of an
	     all-rows scan, which is joined to SYS_CALENDAR.CALDATES.  Spool 9
	     and SYS_CALENDAR.CALDATES are joined using a product join, with a
	     join condition of ("(SYS_CALENDAR.CALDATES.cdate <= tot_met_datum)
	     AND (SYS_CALENDAR.CALDATES.cdate >= van_datum)").  The result goes
	     into Spool 11 (all_amps), which is redistributed by hash code to
	     all AMPs.  Then we do a SORT to order Spool 11 by row hash.  The
	     result spool file will not be cached in memory.  The size of Spool
	     11 is estimated with index join confidence to be 25,245,230 rows. 
	     The estimated time for this step is 2 hours and 41 minutes. 
	 12) We do an all-AMPs JOIN step from Spool 10 (Last Use) by way of a
	     RowHash match scan, which is joined to Spool 11 (Last Use).  Spool
	     10 and Spool 11 are right outer joined using a merge join, with a
	     join condition of ("(cdate <= datum_tot) AND ((cdate >= datum_van)
	     AND ((seizoen = seizoen) AND ((kamertype = kamertype) AND
	     ((objectnummer = objectnummer) AND (Bedrijfsnummer =
	     Bedrijfsnummer )))))").  The result goes into Spool 3 (all_amps),
	     which is built locally on the AMPs.  The result spool file will
	     not be cached in memory.  The size of Spool 3 is estimated with
	     low confidence to be 418,865,081 rows.  The estimated time for
	     this step is 9 hours and 34 minutes. 
	 13) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
	     way of an all-rows scan, and the grouping identifier in field 1. 
	     Aggregate Intermediate Results are computed globally, then placed
	     in Spool 14.  The aggregate spool file will not be cached in
	     memory.  The size of Spool 14 is estimated with low confidence to
	     be 418,865,081 rows.  The estimated time for this step is 299
	     hours and 35 minutes. 
	 14) We execute the following steps in parallel. 
	      1) We do an all-AMPs RETRIEVE step from Spool 14 (Last Use) by
	         way of an all-rows scan into Spool 1 (all_amps), which is
	         built locally on the AMPs.  The result spool file will not be
	         cached in memory.  The size of Spool 1 is estimated with low
	         confidence to be 418,865,081 rows.  The estimated time for
	         this step is 11 hours and 5 minutes. 
	      2) We do an all-AMPs RETRIEVE step from plato.land by way of an
	         all-rows scan with no residual conditions into Spool 17
	         (all_amps), which is duplicated on all AMPs.  The size of
	         Spool 17 is estimated with high confidence to be 1,656 rows. 
	         The estimated time for this step is 0.56 seconds. 
	      3) We do an all-AMPs RETRIEVE step from plato.huisvesting by way
	         of an all-rows scan with a condition of (
	         "plato.huisvesting.DatumTot IS NULL") into Spool 18 (all_amps),
	         which is duplicated on all AMPs.  Then we do a SORT to order
	         Spool 18 by row hash.  The size of Spool 18 is estimated with
	         high confidence to be 31,484 rows.  The estimated time for
	         this step is 1.80 seconds. 
	 15) We execute the following steps in parallel. 
	      1) We do an all-AMPs JOIN step from Spool 17 (Last Use) by way of
	         an all-rows scan, which is joined to Spool 1 (Last Use) with a
	         condition of ("SEIZOEN > '03Z'").  Spool 17 and Spool 1 are
	         joined using a product join, with a join condition of (
	         "BEDRIJFSNUMMER = Bedrijfsnummer").  The result goes into
	         Spool 19 (all_amps), which is built locally on the AMPs.  Then
	         we do a SORT to order Spool 19 by row hash.  The result spool
	         file will not be cached in memory.  The size of Spool 19 is
	         estimated with no confidence to be 8,473,004 rows.  The
	         estimated time for this step is 30 hours and 32 minutes. 
	      2) We do an all-AMPs RETRIEVE step from plato.object by way of an
	         all-rows scan with a condition of ("plato.object.DatumTot IS
	         NULL") into Spool 20 (all_amps), which is duplicated on all
	         AMPs.  Then we do a SORT to order Spool 20 by row hash.  The
	         size of Spool 20 is estimated with high confidence to be
	         111,460 rows.  The estimated time for this step is 7.02
	         seconds. 
	 16) We do an all-AMPs JOIN step from Spool 18 (Last Use) by way of a
	     RowHash match scan, which is joined to Spool 19 (Last Use).  Spool
	     18 and Spool 19 are joined using a merge join, with a join
	     condition of ("(Bedrijfsnummer = Bedrijfsnummer) AND ((KAMERTYPE =
	     huisvestingscode) AND (BEDRIJFSNUMMER = Bedrijfsnummer ))").  The
	     result goes into Spool 21 (all_amps), which is built locally on
	     the AMPs.  Then we do a SORT to order Spool 21 by row hash.  The
	     result spool file will not be cached in memory.  The size of Spool
	     21 is estimated with no confidence to be 8,059,338 rows.  The
	     estimated time for this step is 1 hour and 31 minutes. 
	 17) We do an all-AMPs JOIN step from Spool 20 (Last Use) by way of a
	     RowHash match scan, which is joined to Spool 21 (Last Use).  Spool
	     20 and Spool 21 are joined using a merge join, with a join
	     condition of ("(Bedrijfsnummer = Bedrijfsnummer) AND
	     ((Bedrijfsnummer = Bedrijfsnummer) AND ((Landcode = landcode) AND
	     ((OBJECTNUMMER = Objectnummer) AND (BEDRIJFSNUMMER =
	     Bedrijfsnummer ))))").  The result goes into Spool 16 (group_amps),
	     which is built locally on the AMPs.  Then we do a SORT to order
	     Spool 16 by the sort key in spool field1.  The result spool file
	     will not be cached in memory.  The size of Spool 16 is estimated
	     with no confidence to be 7,756,751 rows.  The estimated time for
	     this step is 26 minutes and 40 seconds. 
	 18) Finally, we send out an END TRANSACTION step to all AMPs involved
	     in processing the request.
	  -> The contents of Spool 16 are sent back to the user as the result
	     of statement 1.
 
Hi,
I don't see any reference to SYS_CALENDAR in your query. Is it hidden inside this view?

join cognos.mycalendar cal
on datum between gar.van_datum and gar.tot_met_datum

What does Cognos.mycalendar look like?
 
Hi,

Yes, mycalendar is a view on sys_calendar created by some NCR consultants.
It's almost the same, but the first day of the week is Monday instead of Sunday...
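I don't have the exact DDL at hand, but I think it's roughly something like this (the column list is a guess, not the consultants' actual definition):

Code:
replace view cognos.mycalendar as
sel  calendar_date
    ,day_of_month
    ,month_of_year
    ,year_of_calendar
    ,week_of_year
    -- sys_calendar numbers days 1 = Sunday .. 7 = Saturday;
    -- shift so that 1 = Monday .. 7 = Sunday
    ,case when day_of_week = 1 then 7 else day_of_week - 1 end as day_of_week
from sys_calendar.calendar;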

 
Hi Lysen,

you can't get rid of the Product Join, because of the inequality in the join condition:

"datum between gar.van_datum and gar.tot_met_datum"

I would suggest adding a where condition to restrict the range of dates in cognos.v_garantie:

Code:
...) gar

join (select calendar_date
        from cognos.mycalendar
       where calendar_date between Lowest_Possible_Date and Highest_Possible_Date
     ) cal
  on calendar_date between gar.van_datum and gar.tot_met_datum

Instead of cross joining to 73,414 rows (201 years in sys_calendar), it will probably be only one year or even less.

Maybe you just have to add that condition to the outermost query, but I don't know if the optimizer will push it into the view...
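If you don't want to hard-code the dates, you could also derive the bounds from the guarantee table itself (untested sketch; the one-row derived table avoids the need for a scalar subquery):

Code:
...) gar

cross join (select min(van_datum)     as lo
                 , max(tot_met_datum) as hi
              from plato.garantie) bnds

join cognos.mycalendar cal
  on calendar_date between bnds.lo and bnds.hi
 and calendar_date between gar.van_datum and gar.tot_met_datum

Check the EXPLAIN afterwards; the product join will still be there, but against a much smaller spool.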




The "full outer join plato.object_reservering" probably may be replaced by e right join, because you filter on
"where gar.percentage_werkelijk is not null"



You shouldn't use "week(T1."datum")", because this is an ODBC function and only works in QueryMan when
Tools -> Options -> "Allow use of ODBC SQL Extensions in queries" is checked.
And it's probably not returning the expected result, because week uses Sunday as the first day of the week, so the week number may differ from ISO/European numbering.
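Since you join to a calendar anyway, you could pick the week number up there instead. A quick probe shows what sys_calendar returns (week_of_year is Sunday-based and starts at 0 for the partial first week; check whether mycalendar redefines it):

Code:
sel calendar_date
   ,day_of_week     -- 1 = Sunday in sys_calendar
   ,week_of_year    -- 0 = partial week before the first Sunday
from sys_calendar.calendar
where calendar_date between date '2003-01-01' and date '2003-01-14'
order by 1;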



And instead of the nested aggregate in v_garantie, a RANK might be faster:

Code:
from
(sel  bedrijfsnummer
    , seizoen
    , objectnummer
    , kamertype
    , van_datum
    , tot_met_datum
    , allotment
    , dag_prijs
    , perc_gar_nachten
    , gar_omzet
    , aantal_nachten
    , percentage_werkelijk
    , omzet_werkelijk
 from plato.garantie
 qualify rank() over (
         partition by objectnummer, seizoen
         order by volgnummer desc) = 1
) gar



Dieter
 
Hi Dieter,

Thanks (once again!) for all of your tips. I'll go try these to see how much this is gonna speed it up!

I noticed your posting on the Teradata forum a while ago about a replacement sys_calendar... Should I use that one to get the week of the current date instead of week(date)?


Lysen
 
You mentioned cognos.mycalendar as a modified version with Monday as the first day of the week. Didn't they change the definition of week, too?

I would recommend a calendar table with all the necessary information in it.
This way you can include special holidays like Easter, which is quite hard to calculate in SQL.
The same goes for ISO week numbers.

If you calculate everything on the fly within a view, you can't collect stats on columns like day_of_week or (year, month) to help the optimizer.
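A minimal sketch of such a table (untested; the column set is just an example, and you'd fill the ISO weeks and holidays once from sys_calendar plus your own holiday list):

Code:
create table cognos.calendar_dim
( calendar_date      date not null
, day_of_week        byteint not null   -- 1 = Monday .. 7 = Sunday
, week_iso           smallint not null  -- ISO-8601 week number
, year_iso           smallint not null  -- ISO year the week belongs to
, month_of_year      byteint not null
, year_of_calendar   smallint not null
, holiday_ind        char(1) not null default 'N'
, holiday_name       varchar(30)
) unique primary index (calendar_date);

collect statistics on cognos.calendar_dim column day_of_week;
-- multi-column stats, if your release supports them:
collect statistics on cognos.calendar_dim column (year_of_calendar, month_of_year);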

Dieter
 
