
Multiload into a production table

Status
Not open for further replies.
Jun 5, 2002
Hi,
We are designing a MultiLoad process to insert 8 million records a day into a production table and keep up to 13 months of data. (A purge process will be run each weekend.)

I have some concerns about this, as previously, I always designed any load process to load into an empty table, validate the data and then finally INSERT SELECT into the production table.

My concerns with the "Multiload into production table" are:
1. If the load fails, what state is the table left in? Is it still readable?
2. Will the users see a "moving target"? That is, the table should still be readable while the load is running, so will a user notice that the total number of rows keeps changing?
3. If there is a failure, can the inserted rows be removed easily?

The only weakness of the "MultiLoad into an empty table then INSERT SELECT" approach seems to be the potential time taken to roll back in the event of a failure. Are there any other issues I should consider?
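For reference, the staging pattern I'm describing looks roughly like this (all table, column, and validation names below are placeholders, not our actual objects):

```sql
-- Sketch of the load-validate-insert pattern.
-- 1. Empty the staging table.
DELETE FROM stage_daily ALL;

-- 2. MultiLoad the day's file into stage_daily (run outside SQL, via the MLoad utility).

-- 3. Validate, then move the good rows into production in a single statement.
INSERT INTO prod_table
SELECT *
FROM stage_daily
WHERE load_date IS NOT NULL;   -- example validation rule only
```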

Roger...
 
You should consider TPump for the load. We have experienced multiple failures of MultiLoad and FastLoad processes, and with a large table a rollback can take days. This is particularly true if you have multiple secondary indexes. TPump uses row-level locking, so conflict with users and rollback after a failure are minimal. Obviously, any load method will leave a "moving target" on an active table during the load process.

If you have the space, use two tables: MultiLoad into one, then rename it to the production name or change the views. Then MultiLoad the same data into the backup table. This limits the users' loss of access to stable production data to the seconds it takes to rename the table or repoint the view.
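The swap can be done either way; a rough sketch (table and view names are illustrative only):

```sql
-- Option A: swap by renaming. Users see the old table until the renames complete.
RENAME TABLE prod_table     TO prod_table_old;
RENAME TABLE prod_table_new TO prod_table;

-- Option B: keep both tables under fixed names and repoint a view instead.
REPLACE VIEW prod_v AS
SELECT * FROM prod_table_new;
```

With Option B, users always query `prod_v`, so the switchover is just the REPLACE VIEW.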
 
8,000,000 * 31 * 13 -> about 3,200,000,000 rows?
You're absolutely right to think about that load job in detail ;-)

An MLoad job must run to completion once the apply phase starts; that is the point of no return.
If a job fails after that, it must be restarted.
If it still can't be finished, the table is left with inconsistent data and can only be selected using an access lock, i.e. you can copy the table before you drop it.
But if you only inserted data, you can probably delete those rows from the copy.

That's why MLoad jobs should be tested before they are used in production.

If the target table has a Permanent Journal, it can be used to roll back *after* the job has finished.

During the load the data can still be selected using an access lock, but the data changes very fast.

Are you on V2R5? Did you consider using a Partitioned Primary Index? If you partition by day, loading daily data will be very fast, regardless of whether you use MLoad or INSERT SELECT.
And even with monthly partitions you only have to access 1/13 of the data blocks...
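A daily-partition DDL sketch (column names and the date range are placeholders for your own design):

```sql
-- V2R5 sketch: one partition per day on the load date.
CREATE TABLE prod_table
( load_date  DATE    NOT NULL
, txn_id     INTEGER NOT NULL
/* ... other columns ... */
)
PRIMARY INDEX (txn_id)
PARTITION BY RANGE_N(
  load_date BETWEEN DATE '2002-01-01' AND DATE '2003-12-31'
  EACH INTERVAL '1' DAY
);
```

A side benefit: the weekend purge becomes a cheap partition-restricted delete, e.g. `DELETE FROM prod_table WHERE load_date < DATE '2002-05-01';` touches only the old partitions.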

Other issues include:
NUSI maintenance: drop/recreate some or all of them?
Statistics maintenance: frequently collect stats on the date column
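The maintenance steps above might look like this (the index column `customer_id` is just an example):

```sql
-- Drop the NUSI before the load so it isn't maintained row by row...
DROP INDEX (customer_id) ON prod_table;

-- ... run the MLoad or INSERT SELECT here ...

-- ... then rebuild it in one pass and refresh stats on the date column.
CREATE INDEX (customer_id) ON prod_table;

COLLECT STATISTICS ON prod_table COLUMN load_date;
```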

Dieter
 
Dieter & BillDHS,
Thanks for the quick response.

To summarise,
1. MultiLoad would require additional support considerations in the event of a failure.
2. Any secondary indexes would need to be managed.
3. Don't rush the design; you'll save time later.

As I'm more concerned with the supportability than performance, I'll stick with the insert / select solution.

I have considered combining the purge processing with the new-data processing by copying into a new table using a multi-statement request (MSR), but this adds too much processing to the batch critical path.

Thanks again.

Roger...


 
