
Update Processing

Status
Not open for further replies.

BillDHS

Programmer
Apr 26, 2001
207
US
We had a 300M-row table with all rows being updated. It took an unexpectedly long time, especially when we aborted the update and a rollback took place (live and learn).

I believe I understand how TD is performing updates, but I would like to review my understanding.

There is a Target table being changed and a Source table containing the update info.

1. The set of rows on the Target table being changed are sent to a spool file and deleted from the Target table.

2. The rows in the spool are modified using the Source table's update info.

3. The changed rows in the spool are inserted back into the Target table.

Added questions: If all rows are being changed, are all rows moved to spool, with the final insert then going into an empty table?

Would it have been better to do a 2 table merge into a 3rd table and then rename it back to the original Target name?
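For reference, the statement type under discussion is a Teradata join update. A minimal sketch, with hypothetical table and column names (target_tbl, source_tbl, account_nbr, balance are illustrative only):

```sql
-- Hypothetical tables: target_tbl (the 300M-row table) and
-- source_tbl (the update info), both keyed on account_nbr.
UPDATE tgt
FROM target_tbl AS tgt,
     source_tbl AS src
SET balance = src.balance
WHERE tgt.account_nbr = src.account_nbr;
```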
 
Just EXPLAIN your query and you'll probably see a plan similar to your description (pre-V2R5):

Two steps in parallel
- Join source and target table into a spool x consisting of the new target rows
- Join source and target table into a spool y consisting of the ROWIDs of the rows to be modified

- MERGE DELETE step from target table using spool y
- MERGE step into target table from spool x
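To see which plan your own statement gets, just prefix it with EXPLAIN; a sketch, again with hypothetical table and column names:

```sql
-- EXPLAIN shows the optimizer's plan without executing the statement.
-- target_tbl / source_tbl / account_nbr / balance are illustrative names.
EXPLAIN
UPDATE tgt
FROM target_tbl AS tgt,
     source_tbl AS src
SET balance = src.balance
WHERE tgt.account_nbr = src.account_nbr;
```

In the output you can then look for the MERGE DELETE / MERGE step pair (pre-V2R5) versus a single MERGE Update step (V2R5 and later).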

If you run

LOCKING TABLE foobar FOR ACCESS
SELECT COUNT(*) FROM foobar;

you can watch the table shrink and grow again. And you can calculate how long it will take until the rollback to 300M rows finishes ;-)


If you update a large percentage of rows, it's usually more efficient to INSERT/SELECT into an empty table, doing the modifications on the fly, and then DROP/RENAME:
No write lock on the target table, no transient journal needed (there's only one entry indicating the table was empty) and therefore a really fast rollback (similar to a fast DELETE FROM foobar;).
Even if the UPDATE is not much slower than the INSERT/SELECT, if there's a rollback there'll be a huge difference (as you already learned). If an UPDATE runs for 1 hour, a rollback usually needs about 1 1/2 to 2 1/2 hours (and sometimes even more), whereas an aborted INSERT/SELECT rolls back within seconds.
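A sketch of that INSERT/SELECT plus DROP/RENAME approach, using hypothetical table and column names (target_tbl, source_tbl, account_nbr, balance):

```sql
-- Empty structural copy of the target table.
CREATE TABLE target_new AS target_tbl WITH NO DATA;

-- Build the new version in one pass, applying the update on the fly:
-- rows with a matching source row get the new value, all others are copied.
INSERT INTO target_new
SELECT t.account_nbr,
       COALESCE(s.balance, t.balance)
FROM target_tbl AS t
LEFT JOIN source_tbl AS s
  ON t.account_nbr = s.account_nbr;

DROP TABLE target_tbl;
RENAME TABLE target_new TO target_tbl;
```

Bear in mind that after the DROP, anything tied to the old table (collected statistics, secondary indexes, access rights) has to be recreated on the new one.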


That delete/insert plan applies before V2R5; since V2R5 the optimizer tries to do the update in place, without the delete/insert:
- MERGE Update step

So those UPDATEs are much faster now, sometimes as fast as INSERT/SELECT, but there's still the possible rollback.


I just EXPLAINed a Join Update, run on my laptop, so don't look at estimated time ;-)

New plan, V2R5:
4) We do a MERGE Update to TERADATA_EDUCATION.accounts from
TERADATA_EDUCATION.trans by way of a RowHash match scan.


Old plan, Pre-V2R5:
4) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step from TERADATA_EDUCATION.accounts
by way of a RowHash match scan with no residual conditions,
which is joined to TERADATA_EDUCATION.trans.
TERADATA_EDUCATION.accounts and TERADATA_EDUCATION.trans are
joined using a merge join, with a join condition of (
"TERADATA_EDUCATION.accounts.ACCOUNT_NUMBER =
TERADATA_EDUCATION.trans.ACCOUNT_NUMBER"). The input tables
TERADATA_EDUCATION.accounts and TERADATA_EDUCATION.trans will
not be cached in memory, but TERADATA_EDUCATION.accounts is
eligible for synchronized scanning. The result goes into
Spool 1 (all_amps), which is built locally on the AMPs. Then
we do a SORT to order Spool 1 by row hash. The result spool
file will not be cached in memory. The size of Spool 1 is
estimated with low confidence to be 1,996,340 rows. The
estimated time for this step is 25 minutes and 39 seconds.
2) We do an all-AMPs JOIN step from TERADATA_EDUCATION.accounts
by way of a RowHash match scan with no residual conditions,
which is joined to TERADATA_EDUCATION.trans.
TERADATA_EDUCATION.accounts and TERADATA_EDUCATION.trans are
joined using a merge join, with a join condition of (
"TERADATA_EDUCATION.accounts.ACCOUNT_NUMBER =
TERADATA_EDUCATION.trans.ACCOUNT_NUMBER"). The input tables
TERADATA_EDUCATION.accounts and TERADATA_EDUCATION.trans will
not be cached in memory, but TERADATA_EDUCATION.accounts is
eligible for synchronized scanning. The result goes into
Spool 2 (all_amps), which is built locally on the AMPs. Then
we do a SORT to order Spool 2 by row hash. The result spool
file will not be cached in memory. The size of Spool 2 is
estimated with low confidence to be 1,996,340 rows. The
estimated time for this step is 25 minutes and 39 seconds.
5) We do an all-AMPs MERGE DELETE to TERADATA_EDUCATION.accounts from
Spool 1 (Last Use).
6) We do an all-AMPs MERGE into TERADATA_EDUCATION.accounts from
Spool 2 (Last Use).



Dieter
 
