
Incremental Commit on Large Insert


phobia42 (MIS)
May 1, 2008
Hey guys, hope you can help me with this. My team and I are trying to insert about 150 million records into one table from a query that joins 4 tables. Due to hardware limitations, we need to commit every n records. The best solution we have found so far is a loop with an iteration counter that commits when the counter is divisible by a certain number (see the sketch below). However, that involves inserting rows one at a time, which slows down the insert dramatically. Anyone have any other suggestions to accomplish this? Thanks for your help.
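For reference, the loop we are using looks roughly like this (the table and column names here are simplified placeholders, not our real schema):
Code:
DECLARE
  v_count PLS_INTEGER := 0;
BEGIN
  FOR rec IN (SELECT a.id, b.val
                FROM src_a a
                JOIN src_b b ON b.id = a.id)   -- really a 4-table join
  LOOP
    INSERT INTO big_target (id, val) VALUES (rec.id, rec.val);
    v_count := v_count + 1;
    IF MOD(v_count, 10000) = 0 THEN
      COMMIT;                                  -- commit every 10,000 rows
    END IF;
  END LOOP;
  COMMIT;                                      -- pick up the final partial batch
END;
/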
 
I would try using the SQL*Plus copy command. Then you can set the parameters ARRAYSIZE and COPYCOMMIT to adjust the frequency of commits. ARRAYSIZE determines the size of a batch, and the value of COPYCOMMIT determines how many batches are processed between commits.
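For example (the connect string and table names below are placeholders only), committing every 10 batches of 5,000 rows would look something like:
Code:
SET ARRAYSIZE 5000
SET COPYCOMMIT 10

COPY FROM scott/tiger@source_db -
INSERT big_target -
USING SELECT a.id, b.val FROM src_a a, src_b b WHERE a.id = b.id

With those settings, SQL*Plus commits after every 10 batches of 5,000 rows, i.e. every 50,000 rows.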
 
Phobia,

Unfortunately, one of Oracle's weaknesses is stepping through a result set in pieces.

The model you can use to avoid processing an INSERT for each row is to do a single INSERT by SELECT-ing the next "n" rows (where "n" is your limit for rows to process during a single COMMIT cycle).

The problem here, of course, is that to INSERT the NEXT "n" rows, the source query must read through (without INSERTing) all of the rows that you have already INSERTed.

To illustrate, let's presume that your COMMIT limit is 10,000 rows per INSERT. This is how the INSERT logic would work:
Code:
INSERT INTO <new table>
SELECT <column-list>
  FROM (SELECT rownum rn, x.*
          FROM (SELECT <your 4-table join here>
                 ORDER BY <columns>) x)
 WHERE rn BETWEEN <y> AND <y+9999>;
COMMIT;
...where "y" is the number of the earliest row that has not yet been inserted.
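A hypothetical driver block for this pattern (again, with placeholder table and column names) could advance "y" automatically:
Code:
DECLARE
  v_batch CONSTANT PLS_INTEGER := 10000;
  v_y     PLS_INTEGER := 1;
  v_rows  PLS_INTEGER;
BEGIN
  LOOP
    INSERT INTO big_target (id, val)
    SELECT id, val
      FROM (SELECT rownum rn, x.*
              FROM (SELECT a.id, b.val
                      FROM src_a a, src_b b
                     WHERE a.id = b.id
                     ORDER BY a.id) x)
     WHERE rn BETWEEN v_y AND v_y + v_batch - 1;
    v_rows := SQL%ROWCOUNT;          -- capture the row count before COMMIT
    COMMIT;
    EXIT WHEN v_rows < v_batch;      -- last (possibly partial) chunk done
    v_y := v_y + v_batch;
  END LOOP;
END;
/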

Although you will do a single INSERT and COMMIT per 10,000 rows, you will be SELECT-ing this many total rows over the entire INSERT process:
[tt]
INSERT 1: SELECT rows 1-10,000, INSERT rows 1-10,000
INSERT 2: SELECT rows 1-20,000, INSERT rows 10,001-20,000
INSERT 3: SELECT rows 1-30,000, INSERT rows 20,001-30,000
...
INSERT last: SELECT rows 1-150M, INSERT rows 149,990,001-150,000,000[/tt]

So, if you INSERT 150M rows in 10K-row chunks, you will actually be reading/SELECTing 1,125,075,000,000 rows (1 trillion, 125 billion, 75 million rows).
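To show where that number comes from: the k-th INSERT reads 10,000 x k rows, so across 15,000 chunks the total is
[tt]
10,000 x (1 + 2 + ... + 15,000)
  = 10,000 x (15,000 x 15,001 / 2)
  = 10,000 x 112,507,500
  = 1,125,075,000,000 rows[/tt]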

I don't know for sure how the CPU cost of SELECTing 1.1 trillion rows plus 15 thousand INSERTs compares to 150 million single-row INSERTs, but I'll bet there is not much savings (if any) in doing the INSERTs in chunks versus 1 INSERT per row.

Let us know your thoughts.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
I was composing my response while Karl was formulating his suggestion. I did not consider using the methodology he suggests. It certainly looks/sounds/feels good, so please try it on a test bed, Phobia, and let us know the results.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
phobia,

how many rows exist in the table before you do the insert?
Depending on your response, I may have a faster method of achieving what you want.

Regards

T
 
I appreciate the suggestions. We considered a solution where we insert using rownum as a limiter in the selection criteria, but our big concern is the time involved in re-querying each time. A SELECT * FROM <table> against that data takes a little while, like 20-30 seconds, due to its sheer size, so performing that select thousands of times would likely slow the job immensely. Currently, using a cursor with a row-by-row insert, it works out to about 15 seconds per 100,000 rows. When we don't use the cursor, the entire insert of all 150M rows takes about 4-5 minutes. We can afford to increase that to as much as 30-60 minutes, but anything more than that becomes an unworkable solution.

As for karluk's suggestion, we are very limited in the environment we have to work in. Our development platform is TOAD, and we do not know what resources the DBA has to work with; we run a large procedure at the end of each month and are unaware of the server's capabilities. However, I will definitely do some research on the COPY command and see if it would work for what we need.

Finally, to address tharg's question, the table we are inserting into is actually empty. We want to execute a query that joins 4 tables, restricts the data set based on predetermined criteria, and stores the results to a table. Thanks again for all the help.
 
As for karluk's earlier advice: SQL*Plus is the most generic Oracle utility, so your DBA should be familiar with it.

And after all, why do you need those intermediate commits? Rollback-segment / "snapshot too old" issues? If so, that is another story entirely :)

Regards, Dima
 
Sem said:
why do you need those intermediate commits? Rollback segments...?
Sem is absolutely correct... On INSERTs, the rollback entries are negligible, since rolling back an INSERT simply discards the new rows; there is no old data to restore. (The high-cost entries in rollback segments come from DELETEs, since rolling back a DELETE requires an entire image of the old data.)

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 
phobia,

I suggest that you attempt the following.

Code:
DROP TABLE PHOBIA CASCADE CONSTRAINTS;
CREATE TABLE PHOBIA AS SELECT blah blah from four tables;

A CTAS (Create Table As Select) is usually the fastest way to do this.

Regards

T
 
Tharg said:
A CTAS (Create Table As Select) is usually the fastest way to do this.
I recommend, however, that before executing the CTAS against the actual data, you first execute it with a WHERE clause that reads:
Code:
CREATE TABLE PHOBIA AS SELECT blah blah from four tables
[b]WHERE 1=2[/b];

(followed by)
DESCRIBE PHOBIA
This lets you confirm the structure of the table (ensuring that it meets your expectations) prior to the possibly very long execution of the actual table create.

Once you have confirmed the quality of the test, DROP TABLE PHOBIA; followed by a re-execution of the CTAS minus the WHERE 1=2 clause.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA)
[I provide low-cost, remote Database Administration services: www.dasages.com]
 