
Loading a Huge Table


Dagon

DB2 Version: 7
OS: Sun UNIX

I am loading a huge table with several million rows as part of a single transaction of the type:

insert into..
select ...
from ..
where ..

The insert runs very slowly (about 100 rows a second). I've tried various options to improve the performance, including:

a) dropping indexes before the insert and then re-creating them afterwards
b) using the alter table .. activate not logged initially to turn off logging

but none of it seems to help much. I was wondering whether anyone could suggest anything else I could try to speed it up. For example, would it be more efficient to dump the data to a flat file and then use LOAD to put it back into the target?
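For reference, the flat-file route I have in mind would be something along these lines; the file name, the DEL format and the NONRECOVERABLE option are just my guesses at what would be appropriate:

db2 "EXPORT TO /tmp/target.del OF DEL SELECT ... FROM ... WHERE ..."
db2 "LOAD FROM /tmp/target.del OF DEL REPLACE INTO <target> NONRECOVERABLE"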
 
Dagon,

I suspect it may well be quicker to load from a flat file. We have a similar problem where I work, so we partitioned the database into 32 partitions. We then have 32 streams loading the partitions in parallel, which cuts the elapsed time of the job by a factor of almost 30.

Cheers
Greg
 
Hi,

INSERT INTO xxx SELECT * FROM yyy WHERE <<conditions>>
always logs every inserted row, and that is what makes it slow.
As Greg mentioned above, a batch LOAD with LOG NO from a flat file will be much faster.
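Note that LOG NO is strictly the mainframe LOAD option. On UDB for UNIX, LOAD never logs the rows at all; the choice there is between COPY YES, COPY NO (which leaves the tablespace in backup pending) and NONRECOVERABLE, e.g. (table name as in the statement above):

db2 "LOAD FROM /tmp/target.del OF DEL REPLACE INTO xxx NONRECOVERABLE"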

-PK.Ganapathy
DB2 DBA
 
I thought of that and tried copying the data into another table first. I then inserted the rows directly from that table into the real table using a simple "select * from table" statement. I got very similar performance, so I don't think the select is the problem.
 
Dagon,

it's also very important for the speed of the load that you are loading in clustering index order. Is that the case here?

Cheers
Greg
 
I definitely think you should go for loading from a flat file using the LOAD utility, and Greg's point about having the file in clustering index order is important. When sorting your flat file, remember that some columns may be nullable. Such columns should be sorted by 1) the null indicator, 2) the column value. This ensures that null values are sorted correctly to the end.
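If you let the database do the sort at EXPORT time rather than sorting the flat file afterwards, an ascending ORDER BY gives you exactly that ordering, because DB2 sorts nulls high. Something like this, with made-up column names:

db2 "EXPORT TO /tmp/target.del OF DEL SELECT * FROM yyy ORDER BY clust_col1, clust_col2"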
Pat McEvoy
 
Actually, I think the NOT LOGGED INITIALLY is not in effect in this case. I have found that the ALTER and the INSERT have to be issued in the same unit of work or it does not work (i.e., it still logs).

I have found this to work. Run it as a shell script; the +c option on the first command turns autocommit off, which keeps the ALTER and the INSERT in the same unit of work. Hope this helps.

. /udbhome/<instname>/sqllib/db2profile

DB2CMD1="alter table xyz.tab1 activate not logged initially"
DB2CMD2="insert into xyz.tab1 select * from xyz.tab2 where col3='AB'"
DB2CMD3="commit"

db2 connect to <dbname>;

db2 +c -tv "${DB2CMD1}"; db2 "${DB2CMD2}"; db2 "${DB2CMD3}";

db2 terminate;
 
Along the lines of what udbeeedba said,

The "Not Logged Initially" option gets reset after a commit. So if you'd like to continue doing your loading with individual insert statements, disable autocommit:

db2 update command options using c off
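With autocommit off, everything stays in one unit of work until you commit explicitly, so the table keeps its not-logged state across several inserts. A sketch, reusing the table names from the script above (the second insert is only there to show multiple statements in one unit of work):

db2 update command options using c off
db2 connect to <dbname>
db2 "alter table xyz.tab1 activate not logged initially"
db2 "insert into xyz.tab1 select * from xyz.tab2 where col3='AB'"
db2 "insert into xyz.tab1 select * from xyz.tab2 where col3='CD'"
db2 commit
db2 terminate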
 
Which of these is likely to be quicker:

a) drop all indexes before the start of the load and then rebuild at the end

b) drop all indexes apart from the cluster index before the load, load in cluster index order and then rebuild the other indexes at the end

c) don't drop any indexes but load in cluster index order

 
My experience has been that (a) (drop, load, index) is fastest.

Since LOAD is NOT a series of INSERTs, it's just writing pages of data at a time without touching the indexes. The indexes are then rebuilt at the end, and then you'll have to reorg to get the data clustered properly.

Thus, simply having no indexes, and building them once all the data is in place, is usually faster.
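As a concrete sketch of (a) in CLP terms, with invented index and file names:

db2 "DROP INDEX xyz.tab1_ix1"
db2 "LOAD FROM /tmp/tab1.del OF DEL REPLACE INTO xyz.tab1 NONRECOVERABLE"
db2 "CREATE INDEX xyz.tab1_ix1 ON xyz.tab1 (col3) CLUSTER"
db2 "REORG TABLE xyz.tab1 INDEX xyz.tab1_ix1"
db2 "RUNSTATS ON TABLE xyz.tab1 AND INDEXES ALL"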
 
The reason you cluster data is for read performance. Are you always going to replace the data? How do you intend to load it - with autoloads or inserts? If you choose autoloads, remember that autoloads do not look for free space - they start an extent with each run and load the data in. And if more than one splitter is specified (EEE), then even if the input data is clustered there is no guarantee that the data is clustered in the database. So if data is deleted and autoloaded, even with a clustering index the data will not really be 'clustered' and you will have to reorg regularly.

It is faster to load without indexes. However, if data is appended and the table is big, rebuilding indexes from scratch may be quite time consuming - though the indexes are more efficient and better organised.

If data is deleted and inserted on a regular basis, then it is advantageous to use a clustering index (aside from the benefits for selects) along with pctfree. If pctfree is sized right, the free space left behind will absorb roughly as much new data as gets deleted (with a safety margin), and reorgs at a certain frequency will keep both data and indexes clustered.
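On UDB the table's free space is set with ALTER TABLE (it is honoured by LOAD and REORG when they lay the pages down) and the index's free space on CREATE INDEX; the figures here are only illustrative:

db2 "ALTER TABLE xyz.tab1 PCTFREE 20"
db2 "CREATE INDEX xyz.tab1_ix1 ON xyz.tab1 (col3) CLUSTER PCTFREE 20"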

To sum it up, it depends.
 
My data is presently being reloaded from scratch each time.
 

Then I think you can autoload REPLACE the data keeping the indexes and PK on. I think it amounts to the same thing, since either you build the indexes yourself or the autoload takes care of it in its build phase.
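In plain (non-EEE) CLP terms, that replace-style load with the indexes left in place to be rebuilt in the build phase would look something like this; the file name is invented and STATISTICS YES is optional:

db2 "LOAD FROM /tmp/tab1.del OF DEL REPLACE INTO xyz.tab1 STATISTICS YES AND INDEXES ALL NONRECOVERABLE"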
 