I have been driven to distraction as I am building a table of daily data which requires insertions in the following order from a journal file.
step 1 insert changes
step 2 insert records from previous day with no change.
I have started using a partitioned index to speed things up but have found that the first two bits of sql below do not bring back all the rows - 7660 are missing thus it is not an all or nothing event but it works for almost all.
when i use a minus or except then the remaining rows are returned. I suspect the query that works has something to do with full-row scan in the explain, and this suggests that I need to better understand PPI's. My Teradata SQL Manual - "unleash the power" does not give me much information. Could anyone point me in the right direction?
SQL 1
EXPLAIN SELECT CARD_ACCOUNT_NO
FROM CC_COBRA.CC_CMF_DAILY_TXN_0505
WHERE BUS_DT = 1050503
AND CARD_ACCOUNT_NO NOT IN
(
SELECT CARD_ACCOUNT_NO
FROM CC_COBRA.CC_CMF_DAILY_TXN_0505
WHERE BUS_DT = 1050504
);
EXPLAIN
1) First, we lock a distinct CC_COBRA."pseudo table" for read on a
RowHash to prevent global deadlock for
CC_COBRA.CC_CMF_DAILY_TXN_0505.
2) Next, we lock CC_COBRA.CC_CMF_DAILY_TXN_0505 for read.
3) We do an all-AMPs JOIN step from a single partition of
CC_COBRA.CC_CMF_DAILY_TXN_0505 with a condition of (
"CC_COBRA.CC_CMF_DAILY_TXN_0505.BUS_DT = DATE '2005-05-03'") with
a residual condition of ("CC_COBRA.CC_CMF_DAILY_TXN_0505.BUS_DT =
DATE '2005-05-03'"), which is joined to a single partition of
CC_COBRA.CC_CMF_DAILY_TXN_0505 with a condition of (
"CC_COBRA.CC_CMF_DAILY_TXN_0505.BUS_DT = DATE '2005-05-04'") with
a residual condition of ("CC_COBRA.CC_CMF_DAILY_TXN_0505.BUS_DT =
DATE '2005-05-04'"). CC_COBRA.CC_CMF_DAILY_TXN_0505 and
CC_COBRA.CC_CMF_DAILY_TXN_0505 are joined using an exclusion merge
join, with a join condition of (
"CC_COBRA.CC_CMF_DAILY_TXN_0505.CARD_ACCOUNT_NO =
CC_COBRA.CC_CMF_DAILY_TXN_0505.CARD_ACCOUNT_NO"). The result goes
into Spool 1 (group_amps), which is built locally on the AMPs.
The size of Spool 1 is estimated with no confidence to be
2,644,970 rows. The estimated time for this step is 11.23 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 11.23 seconds.
RESUT NO ROWS
/****************************************************************************************************/
SQL2
EXPLAIN SELECT CARD_ACCOUNT_NO
FROM CC_COBRA.CC_CMF_DAILY_TXN_0505 A
WHERE BUS_DT = 1050503
AND NOT EXISTS
(SELECT CARD_ACCOUNT_NO
FROM CC_COBRA.CC_CMF_DAILY_TXN_0505 B
WHERE B.BUS_DT = 1050504
AND A.CARD_ACCOUNT_NO = B.CARD_ACCOUNT_NO
);
EXPLAIN
1) First, we lock a distinct CC_COBRA."pseudo table" for read on a
RowHash to prevent global deadlock for CC_COBRA.A.
2) Next, we lock CC_COBRA.A for read.
3) We do an all-AMPs JOIN step from a single partition of CC_COBRA.A
with a condition of ("CC_COBRA.A.BUS_DT = DATE '2005-05-03'") with
a residual condition of ("CC_COBRA.A.BUS_DT = DATE '2005-05-03'"),
which is joined to a single partition of CC_COBRA.B with a
condition of ("CC_COBRA.B.BUS_DT = DATE '2005-05-04'") with a
residual condition of ("CC_COBRA.B.BUS_DT = DATE '2005-05-04'").
CC_COBRA.A and CC_COBRA.B are joined using an exclusion merge join,
with a join condition of ("CC_COBRA.A.CARD_ACCOUNT_NO =
CC_COBRA.B.CARD_ACCOUNT_NO") where unknown comparision will be
ignored. The result goes into Spool 1 (group_amps), which is
built locally on the AMPs. The size of Spool 1 is estimated with
no confidence to be 2,644,970 rows. The estimated time for this
step is 11.23 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 11.23 seconds.
RESULT NO ROWS
/******************************************************************************************************/
SQL 3
EXPLAIN SELECT CARD_ACCOUNT_NO
FROM CC_COBRA.CC_CMF_DAILY_TXN_0505 A
WHERE BUS_DT = 1050503
MINUS
SELECT CARD_ACCOUNT_NO
FROM CC_COBRA.CC_CMF_DAILY_TXN_0505 B
WHERE B.BUS_DT = 1050504;
EXPLAIN
1) First, we lock a distinct CC_COBRA."pseudo table" for read on a
RowHash to prevent global deadlock for CC_COBRA.A.
2) Next, we lock CC_COBRA.A for read.
3) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from a single partition of
CC_COBRA.A with a condition of ("CC_COBRA.A.BUS_DT = DATE
'2005-05-03'") with a residual condition of (
"CC_COBRA.A.BUS_DT = DATE '2005-05-03'") into Spool 1
(all_amps), which is built locally on the AMPs. Then we do a
SORT to order Spool 1 by row hash and the sort key in spool
field1 eliminating duplicate rows. The result spool file
will not be cached in memory. The size of Spool 1 is
estimated with no confidence to be 2,644,970 rows. The
estimated time for this step is 36.72 seconds.
2) We do an all-AMPs RETRIEVE step from a single partition of
CC_COBRA.B with a condition of ("CC_COBRA.B.BUS_DT = DATE
'2005-05-04'") with a residual condition of (
"CC_COBRA.B.BUS_DT = DATE '2005-05-04'") into Spool 2
(all_amps), which is built locally on the AMPs. Then we do a
SORT to order Spool 2 by row hash and the sort key in spool
field1 eliminating duplicate rows. The result spool file
will not be cached in memory. The size of Spool 2 is
estimated with no confidence to be 2,644,970 rows. The
estimated time for this step is 36.72 seconds.
4) We do an all-AMPs JOIN step from Spool 1 (Last Use) by way of an
all-rows scan, which is joined to Spool 2 (Last Use) by way of an
all-rows scan. Spool 1 and Spool 2 are joined using an exclusion
merge join, with a join condition of ("Field_1 = Field_1"). The
result goes into Spool 3 (group_amps), which is built locally on
the AMPs. The size of Spool 3 is estimated to be 1,322,485 rows.
The estimated time for this step is 22.13 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 3 are sent back to the user as the result of
statement 1. The total estimated time is 58.84 seconds.
RESULT 7660 ROWS
/******************************************************************************************************/
Here is the DDL for the table:
CREATE SET TABLE CC_COBRA.CC_CMF_DAILY_TXN_0505 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
FREESPACE = 0 PERCENT
(
CARD_ACCOUNT_NO CHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
BUS_DT DATE FORMAT 'YY/MM/DD' NOT NULL,
CURR_BAL DECIMAL(9,2) COMPRESS 0.00 )
PRIMARY INDEX ( CARD_ACCOUNT_NO )
PARTITION BY CASE_N(
BUS_DT > DATE '2003-11-30');
thanks again in advance