
Loading a Huge Table


Dagon

DB2 Version: 7
OS: Sun UNIX

I am loading a huge table with several million rows as part of a single transaction of the type:

insert into..
select ...
from ..
where ..

The insert runs very slowly (about 100 rows a second). I've tried various options to improve the performance, including:

a) dropping indexes before the insert and then re-creating them afterwards
b) using the alter table .. activate not logged initially to turn off logging

but none of it seems to help much. I was wondering whether anyone could suggest anything else I could try to speed it up. For example, would it be more efficient to dump the data to a flat file and then use LOAD to put it back into the target?
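For reference, the flat-file route I have in mind would be something along these lines; the file name, the DEL format and the NONRECOVERABLE option are just my guesses at what would be appropriate:

db2 "EXPORT TO /tmp/target.del OF DEL SELECT ... FROM ... WHERE ..."
db2 "LOAD FROM /tmp/target.del OF DEL REPLACE INTO <target> NONRECOVERABLE"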
 
Dagon,

I suspect it may well be quicker to load from a flat file. We have a similar problem where I work, so we partitioned the database into 32 partitions. We then have 32 streams loading the partitions in parallel, which cuts the elapsed time of the job by a factor of almost 30.

Cheers
Greg
 
Hi,

INSERT INTO xxx SELECT * FROM yyy WHERE <<conditions>>
always logs every inserted row, and that is what makes it slow.
As Greg mentioned above, a batch LOAD with LOG NO from a flat file will be much faster.
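Note that LOG NO is strictly the mainframe LOAD option. On UDB for UNIX, LOAD never logs the rows at all; the choice there is between COPY YES, COPY NO (which leaves the tablespace in backup pending) and NONRECOVERABLE, e.g. (table name as in the statement above):

db2 "LOAD FROM /tmp/target.del OF DEL REPLACE INTO xxx NONRECOVERABLE"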

-PK.Ganapathy
DB2 DBA
 
I thought of that and tried copying the data into another table first. I then inserted the rows directly from that table into the real table using a simple "select * from table" statement. I got very similar performance, so I don't think the select is the problem.
 
Dagon,

it's also very important for the speed of the load that you are loading in clustering index order. Is that the case here?

Cheers
Greg
 
I definitely think you should go for loading from a flat file using the LOAD utility, and Greg's point about having the file in clustering index order is important. When sorting your flat file, remember that some columns may be nullable. Such columns should be sorted by 1) the null indicator, 2) the column value. This ensures that null values are sorted correctly to the end.
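If you let the database do the sort at EXPORT time rather than sorting the flat file afterwards, an ascending ORDER BY gives you exactly that ordering, because DB2 sorts nulls high. Something like this, with made-up column names:

db2 "EXPORT TO /tmp/target.del OF DEL SELECT * FROM yyy ORDER BY clust_col1, clust_col2"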
Pat McEvoy
 
Actually, I think the NOT LOGGED INITIALLY is not in effect in this case. I have found that the ALTER and the INSERT have to be issued in the same unit of work or it does not work (i.e., it still logs).

I have found this to work. Run it as a shell script; the +c option on the first command turns autocommit off, which keeps the ALTER and the INSERT in the same unit of work. Hope this helps.

. /udbhome/<instname>/sqllib/db2profile

DB2CMD1="alter table xyz.tab1 activate not logged initially"
DB2CMD2="insert into xyz.tab1 select * from xyz.tab2 where col3='AB'"
DB2CMD3="commit"

db2 connect to <dbname>;

db2 +c -tv "${DB2CMD1}"; db2 "${DB2CMD2}"; db2 "${DB2CMD3}";

db2 terminate;
 
Along the lines of what udbeeedba said,

The "Not Logged Initially" option gets reset after a commit. So if you'd like to continue doing your loading with individual insert statements, disable autocommit:

db2 update command options using c off
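With autocommit off, everything stays in one unit of work until you commit explicitly, so the table keeps its not-logged state across several inserts. A sketch, reusing the table names from the script above (the second insert is only there to show multiple statements in one unit of work):

db2 update command options using c off
db2 connect to <dbname>
db2 "alter table xyz.tab1 activate not logged initially"
db2 "insert into xyz.tab1 select * from xyz.tab2 where col3='AB'"
db2 "insert into xyz.tab1 select * from xyz.tab2 where col3='CD'"
db2 commit
db2 terminate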
 
Which of these is likely to be quicker:

a) drop all indexes before the start of the load and then rebuild at the end

b) drop all indexes apart from the cluster index before the load, load in cluster index order and then rebuild the other indexes at the end

c) don't drop any indexes but load in cluster index order

 
My experience has been that (a) (drop, load, index) is fastest.

Since LOAD is NOT a series of INSERTs, it's just writing pages of data at a time without touching the indexes. The indexes are then rebuilt at the end, and then you'll have to reorg to get the data clustered properly.

Thus, simply having no indexes, and building them once all the data is in place, is usually faster.
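As a concrete sketch of (a) in CLP terms, with invented index and file names:

db2 "DROP INDEX xyz.tab1_ix1"
db2 "LOAD FROM /tmp/tab1.del OF DEL REPLACE INTO xyz.tab1 NONRECOVERABLE"
db2 "CREATE INDEX xyz.tab1_ix1 ON xyz.tab1 (col3) CLUSTER"
db2 "REORG TABLE xyz.tab1 INDEX xyz.tab1_ix1"
db2 "RUNSTATS ON TABLE xyz.tab1 AND INDEXES ALL"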
 
The reason you cluster data is for read performance. Are you always going to replace the data? How do you intend to load it - with autoloads or inserts? If you choose autoloads, remember that autoloads do not look for free space - they start an extent with each run and load the data in. And if more than one splitter is specified (EEE), then even if the input data is clustered there is no guarantee that the data is clustered in the database. So if data is deleted and autoloaded, even with a clustering index the data will not really be 'clustered' and you will have to reorg regularly.

It is faster to load without indexes. However, if data is appended and the table is big, rebuilding indexes from scratch may be quite time consuming - though the indexes are more efficient and better organised.

If data is deleted and inserted on a regular basis, then it is advantageous to use a clustering index (aside from the benefits for selects) along with pctfree. If pctfree is sized right, the free space left behind will absorb roughly as much new data as gets deleted (with a safety margin), and reorgs at a certain frequency will keep both data and indexes clustered.
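On UDB the table's free space is set with ALTER TABLE (it is honoured by LOAD and REORG when they lay the pages down) and the index's free space on CREATE INDEX; the figures here are only illustrative:

db2 "ALTER TABLE xyz.tab1 PCTFREE 20"
db2 "CREATE INDEX xyz.tab1_ix1 ON xyz.tab1 (col3) CLUSTER PCTFREE 20"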

To sum it up, it depends.
 
My data is presently being reloaded from scratch each time.
 

Then I think you can autoload REPLACE the data keeping the indexes and PK on. I think it amounts to the same thing, since either you build the indexes yourself or the autoload takes care of it in its build phase.
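In plain (non-EEE) CLP terms, that replace-style load with the indexes left in place to be rebuilt in the build phase would look something like this; the file name is invented and STATISTICS YES is optional:

db2 "LOAD FROM /tmp/tab1.del OF DEL REPLACE INTO xyz.tab1 STATISTICS YES AND INDEXES ALL NONRECOVERABLE"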
 