Any suggestions for a large update?

commatom (Programmer), Aug 30, 2007

I have to update a partitioned table with about 80 million records.

We have added a new column which has about 1250 different possible values.

The value is derived by matching a substring of another field to a lookup table. The substring can be 1, 2, 3, or 4 characters long, so I created a function to find the value. The function works great when inserting into the table but is too slow for the update.

I decided to create four temporary global function-based indexes (one per substring length) and run one update against the table for each of the 1,250 possible values. I can see that the indexes are being used, but this is also taking too long.
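
For reference, a sketch of what one of those function-based indexes and its per-code update might look like (the index name and sample code value are made up; the table and column names come from the scripts later in this thread):

Code:
-- Indexes on a partitioned table are GLOBAL unless LOCAL is specified.
-- One index per substring length; this is the 2-character one.
CREATE INDEX dw.ix_called_sub2
   ON dw.stg_oke_cdr_detail (SUBSTR(callednumber, 1, 2));

-- Then one update per possible e164 code, e.g. for code 44:
UPDATE dw.stg_oke_cdr_detail
SET    e164 = 44
WHERE  SUBSTR(callednumber, 1, 2) = '44';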

Should I have created the indexes locally? I would think since I am updating across partitions it should be global.

Perhaps I should have done one partition at a time but that seems repetitive.

Maybe create a loop that updates by rowid? But then I would think that 50 updates would be slower than 1250 updates on an indexed column.

It almost seems it would be faster to rename the table and re-insert the data using the function.

Any suggestions?

Additional Info:
The table has 38 columns
The column I am updating has no indexes on it
The table has no constraints on it
The table has 6 partitions by year.
The smallest partition has 500,000 rows
The largest partition has 15,000,000 rows


What would you do next?

Thanks in advance! I have also pasted the function code below for reference.

Code:
CREATE OR REPLACE FUNCTION DW.f_e164 (v_call varchar)
 RETURN  NUMBER IS
v_COUNT   NUMBER;
v_1 varchar2(1);
v_2 varchar2(2);
v_3 varchar2(3);
v_4 varchar2(4);
v_e164code varchar2(10);
v_short_code varchar2(30);

CURSOR c_short IS
      SELECT longnumber
      FROM  dw.all_short_codes
      WHERE  shortcode = v_call;
     
CURSOR c_164 IS
      SELECT e164cd  
      FROM dw.te164mapping
      WHERE e164cd IN (v_1, v_2, v_3, v_4);

--===========================================================
-- This function finds the E164 codes for called numbers   ==
-- There are 4 different ways to return a value            ==
--  1) Match the first X digits to the E164 LU table       ==
--  2) 2STG calls will default to '111111'                 ==
--  3) Match short code calls to the short code table and  ==
--     and replace the called number from the short code   ==
--     table and lookup the E164 code based on the long    ==
--     number                                              ==
--  4) Everything else is defaulted to 0 which is a misdial==
--===========================================================
   BEGIN
--===========================================================
-- Determine if it is a 2stg call                          ==
--===========================================================

IF v_call = '2STGMTC'
     
   THEN RETURN ('111111');

ELSE 
--===========================================================
-- Determine if a short code was used                      ==
--===========================================================
       
     OPEN  c_short; 
     FETCH c_short INTO v_short_code;

--===========================================================
-- If no short code was used, look up the E164 on the long ==
-- number. Otherwise, use the called number to find code   ==
--===========================================================
           
              IF c_short%NOTFOUND
                  THEN 
                       v_1 := substr(v_call,1, 1);
                       v_2 := substr(v_call,1, 2);
                       v_3 := substr(v_call,1, 3);
                       v_4 := substr(v_call,1, 4);
                  ELSE
                       v_1 := substr(v_short_code,1, 1);
                       v_2 := substr(v_short_code,1, 2);
                       v_3 := substr(v_short_code,1, 3);
                       v_4 := substr(v_short_code,1, 4);
                  END IF;  
           
     CLOSE c_short;
      
--===========================================================
-- Look up the E164 Code, if no code is found return '0'   ==
--===========================================================       
    
     OPEN  c_164; 
     FETCH c_164 INTO v_e164code ;

          IF c_164%NOTFOUND
          THEN RETURN ('0');
          ELSE
          RETURN (v_e164code);
          END IF;
END IF;


   END;
 
The problem is likely to be the selects you are doing against dw.all_short_codes and dw.te164mapping. Bear in mind that you are doing these for millions of rows, so even at a millisecond per lookup, 80 million rows works out to over 22 hours of processing.

Add indexes on the lookup columns if they are not already indexed. If there isn't much data in the tables, consider caching them in memory. You could do this by means of a PL/SQL table indexed on the lookup column, e.g. shortcode in the case of all_short_codes.

You would need to convert the function into a package and populate the PL/SQL table once only, in the package initialisation section. For example:

TYPE t_codes IS TABLE OF all_short_codes.longnumber%TYPE
   INDEX BY all_short_codes.shortcode%TYPE;
v_codes t_codes;

create or replace package body ... as

-- your function goes here

begin
   -- initialisation section: runs once per session, caching all short codes
   for r_short in (select shortcode, longnumber from dw.all_short_codes) loop
      v_codes(r_short.shortcode) := r_short.longnumber;
   end loop;
end;

To look up the value, you can then just use:

if v_codes.exists(v_call) then
...
end if;
 
This is exactly why I am NOT using the function to backfill the data. Like I said, the function works for hundreds of thousands of rows in seconds via insert.

Right now I'm running one update per code, so that's 1,243 updates. When there are only a few hundred rows to update it takes about a second, but when I get into the ten-thousand-plus updates, it seems to take way too long. The whole script has been running for 22 hours.

At this point I know it would be faster to update row by row using a cursor and updating by rowid. I could get that done in about 8 hours.
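
For what it's worth, a minimal sketch of that rowid approach using bulk binds (the batch size is arbitrary; table and column names are taken from the scripts later in the thread):

Code:
DECLARE
   CURSOR c IS
      SELECT rowid, callednumber
      FROM   dw.stg_oke_cdr_detail;
   TYPE t_rids IS TABLE OF ROWID;
   TYPE t_nums IS TABLE OF dw.stg_oke_cdr_detail.callednumber%TYPE;
   l_rids t_rids;
   l_nums t_nums;
BEGIN
   OPEN c;
   LOOP
      -- fetch and update in batches of 10,000 rows
      FETCH c BULK COLLECT INTO l_rids, l_nums LIMIT 10000;
      EXIT WHEN l_rids.COUNT = 0;
      FORALL i IN 1 .. l_rids.COUNT
         UPDATE dw.stg_oke_cdr_detail
         SET    e164 = dw.f_e164(l_nums(i))
         WHERE  rowid = l_rids(i);
      COMMIT;
   END LOOP;
   CLOSE c;
END;
/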

Neither are acceptable for my production move. <sigh>

As for your suggestion, I'm a little confused as to what you are doing. Wouldn't I only be updating one row at a time using your suggestion?
 
I wasn't very sure from your post what the strategy was for the bulk update. I assumed you were trying to use your function, so that was my suggestion for speeding it up. Obviously, you could write a specialized bulk update procedure.
 
commatom,

commatom said:
We have added a new column which has about 1250 different possible values.

Since the function always returns the same output for a given input, and has a repertoire of 1250 different responses, it is, by your stipulation, deterministic.

Why not declare the function as such and pin it in memory? The first time it evaluates an input it will perform the calculation; thereafter, Oracle can return the answer with zero calculation. Surely that ought to make a significant difference to the speed.

If it stays pinned, eventually it will have evaluated each of the possible responses, and will do zero calculation thenceforth.
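
Something along these lines, perhaps (the keyword goes in the function's declaration; the pinning call assumes the DBMS_SHARED_POOL package has been installed on the instance):

Code:
-- Add DETERMINISTIC to the existing declaration (body unchanged):
--   CREATE OR REPLACE FUNCTION DW.f_e164 (v_call VARCHAR)
--      RETURN NUMBER DETERMINISTIC IS ...

-- Pin the compiled function in the shared pool so it is not aged out:
BEGIN
   DBMS_SHARED_POOL.KEEP('DW.F_E164', 'P');
END;
/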

Can you try this approach out and let me know if it does any good?

Regards

T



Grinding away at things Oracular
 
commatom,

try this for size


This thread discusses similar issues to yours, and has a wealth of useful info. I am merely riding on Tom's back here, but I hope it helps. Look into parallel operations to save time - make the CPU sweat, not you!
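
For example, a parallel DML version of the straight update might look like this (the degree of 8 is arbitrary, and the function may need PARALLEL_ENABLE added to its declaration before the slaves will run it):

Code:
ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(t, 8) */ dw.stg_oke_cdr_detail t
SET    t.e164 = dw.f_e164(t.callednumber);

COMMIT;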

Regards

T

Grinding away at things Oracular
 
Thank you for your suggestions, I have tried several different methods and the following are the results.


Method 1: Update the table using the function. i.e. update stg_oke_cdr_detail set e164 = dw.f_e164(callednumber);

This ran for 10 hours and was nowhere near completion, so this method was immediately scratched.


Method 2: Update the table using function-based indexes on the substring of the callednumber, and update the table once for each possible e164 code.

This ran great for codes that didn't have many rows to update. When it came to updating more than 100,000 records, the process slowed to a crawl. I killed this run after 26 hours; it was about 80% complete at that point. I estimate it would have taken about 32 hours to complete.


Method 3: PIN the function in memory and update each partition one at a time.

This method was a significant improvement over the prior two. I killed it after a few hours but could tell it would take about 18 hours to complete.


Method 4: PIN the function in memory and update all partitions at once and force parallelism.

This method again was a huge improvement, and the parallelism worked properly, though it was hard to monitor. I let it run to completion last night; it took 7 hours and 35 minutes.

Method 5: Rebuild the table completely using an insert statement with parallel and append hints.

This completed in 4 hours and 12 minutes and is clearly our best solution. Once the table was rebuilt, I had to rename it and rebuild the indexes. This will be the method used for production.
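
That pattern, in outline (the hold table is the one defined in the CTAS script further down; the d.* shorthand assumes the new e164 column comes last in the hold table):

Code:
ALTER SESSION ENABLE PARALLEL DML;

-- Direct-path, parallel insert into the replacement table:
INSERT /*+ APPEND PARALLEL(h, 12) */ INTO dw.stg_oke_cdr_hold h
SELECT /*+ PARALLEL(d, 12) */
       d.*, dw.f_e164(d.callednumber)
FROM   dw.stg_oke_cdr_detail d;

COMMIT;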

I appreciate the link, tharg.... star for you!
 
tom,

when you say rebuild the table using inserts, have you considered the old faithful of CTAS (create table as select)?

I believe that you can use the function as part of the select statement and blast out a table really quickly. Do you have the resources available to give it a whirl? I have seen enormous performance improvements with CTAS in the past.

T

Grinding away at things Oracular
 
I could do CTAS but would that create my partitions?
 
Forget that last question, I figured out how to write it and specify the storage and partitions.
 
tom,

since I've never had the chance to do a CTAS on a partitioned table, would you mind posting your script?

Also, was it successful and how long did the CTAS take, compared with the other methods?

Regards

T

Grinding away at things Oracular
 
Here is my code. It bombed out because it used up all the temp tablespace. Our temp is only 20 GB.... doh!

Code:
create table dw.stg_oke_cdr_hold
TABLESPACE USERS
PCTUSED 0
PCTFREE 10
INITRANS 1
MAXTRANS 255
LOGGING
PARTITION BY RANGE (MONTHID)
(
PARTITION STG_MONTHID_2002 VALUES LESS THAN (200301)
LOGGING
NOCOMPRESS
TABLESPACE STGPPUSAGE02
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
BUFFER_POOL DEFAULT
),
PARTITION STG_MONTHID_2003 VALUES LESS THAN (200401)
LOGGING
NOCOMPRESS
TABLESPACE STGPPUSAGE03
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
BUFFER_POOL DEFAULT
),
PARTITION STG_MONTHID_2004 VALUES LESS THAN (200501)
LOGGING
NOCOMPRESS
TABLESPACE STGPPUSAGE04
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
BUFFER_POOL DEFAULT
),
PARTITION STG_MONTHID_2005 VALUES LESS THAN (200601)
LOGGING
NOCOMPRESS
TABLESPACE STGPPUSAGE05
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
BUFFER_POOL DEFAULT
),
PARTITION STG_MONTHID_2006 VALUES LESS THAN (200701)
NOLOGGING
NOCOMPRESS
TABLESPACE STGPPUSAGE06
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
BUFFER_POOL DEFAULT
),
PARTITION STG_MONTHID_2007 VALUES LESS THAN (200801)
LOGGING
NOCOMPRESS
TABLESPACE STGPPUSAGE07
PCTFREE 10
INITRANS 1
MAXTRANS 255
STORAGE (
INITIAL 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
BUFFER_POOL DEFAULT
)
)
NOCOMPRESS
NOCACHE
MONITORING
parallel (degree 12)
as
SELECT /*+ parallel (stg_oke_cdr_detail,12) */
STGOKECDRDETAILID ,
STGERGCDRDETAILID ,
FILENAME ,
CALLPROCID ,
RECORDNUMBER ,
CUSTOMERID ,
CONTRACTID ,
MSISDN ,
PIN ,
MONTHID ,
CALLSTARTDATE ,
CALLSTARTDATETIME ,
CALLENDDATETIME ,
SERVICECAT ,
SERVICEPLAN ,
CHARGETYPE ,
CALLSTATE ,
CALLTYPE ,
BILLABLEFLAG ,
LAC ,
CALLEDNUMBER ,
TERMSTATE ,
DIAGNOSTICCODE ,
OLDBALANCE ,
NEWBALANCE ,
UNITS ,
ACTUALSECS ,
ROUNDEDSECS ,
INSERTTS ,
UPDATETS ,
UPDATEUSERID ,
LEAKAGEUNITS ,
DUPFLAG ,
FREE_FLAG ,
BSCSCUSTOMERACCTID ,
PPUSAGECATEGORYCD ,
VOUCHERCLASSID ,
DW.f_e164(callednumber) e164
FROM dw.stg_oke_cdr_detail;
 
Thanks tom.

It's what I thought it would look like, but there's nothing quite like a "known good" script. When you get around to increasing the tablespace to lots 'n lots of bytes, can you let me know how long it takes to complete?

T

Grinding away at things Oracular
 
Well, what I was thinking about trying today is to create one table for each partition and then swap the partitions in.

I would have to re-create the indexes twice, but at least I could do that in parallel.

I'll let you know how that goes.
 
Success!

I rewrote the script and did CTAS for each partition and then swapped the partitions.
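
In outline, each partition swap might have looked like this (the swap-table name is hypothetical; the partition and table names reuse the earlier CTAS script):

Code:
-- Build a plain table holding one partition's rows plus the new column:
CREATE TABLE dw.stg_swap_2007 NOLOGGING PARALLEL (DEGREE 12) AS
SELECT /*+ PARALLEL(d, 12) */
       d.*, dw.f_e164(d.callednumber) e164
FROM   dw.stg_oke_cdr_detail PARTITION (STG_MONTHID_2007) d;

-- Swap it in; an exchange is a dictionary operation, not a data copy:
ALTER TABLE dw.stg_oke_cdr_hold
   EXCHANGE PARTITION STG_MONTHID_2007
   WITH TABLE dw.stg_swap_2007
   WITHOUT VALIDATION;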

It took 2 hours and 15 minutes!!!

Of course now I have to regather all my stats but that won't take me too long.

This turned out to be quite a learning experience.
 