
Large SQL updates

Status
Not open for further replies.

aj2taylo (Programmer, CA)
Nov 4, 2002
Hello,

I am trying to build a relationship between two tables: match up records between them, grab a value from table B, and then insert it into table A for the corresponding record.

The problem is the tables are very large (table A, which I'm updating, has 400,000+ records), and I have to be careful not to run a process that will hog the system.

The update command for individual records is taking several seconds. Is there a good method I can use to look up values and update the tables without clogging the system?

Thanks
 
Are you binding your SQL statements? You should get a nice boost out of that.

Other than that, you are moving into database optimization. Are your tables properly indexed? I'd have to think so, since they are so large.

Another way to handle it is to run your relation script on another server entirely and put 1 second between each command. Turn it on and let it run all night. This will keep the db from flipping out.
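The throttled overnight run described above can be sketched roughly like this (Python for illustration; the `execute` callback and statement strings are hypothetical stand-ins for whatever driver call and SQL you actually use):

```python
import time

def run_throttled(statements, execute, delay=1.0):
    # Run each statement, then pause, so the database server
    # never sees back-to-back updates from this script.
    for stmt in statements:
        execute(stmt)
        time.sleep(delay)

# Hypothetical usage: collect the statements, turn it on, let it run all night.
ran = []
run_throttled(["UPDATE t SET a = 1", "UPDATE t SET a = 2"], ran.append, delay=0.01)
print(len(ran))  # 2
```

The one-second pause is the whole trick: total wall-clock time goes up, but the load on the database at any instant stays tiny.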

 
aj,

You have two tables, you work through the first row by row and update (or insert) a row each time in the second table. Is that right?

What is the relationship between the tables: one-to-one? One-to-many? One-to-none/one/many?

Which database is this?

Mike

Want to get great answers to your Tek-Tips questions? Have a look at faq219-2884

It's like this; even samurai have teddy bears, and even teddy bears get drunk.
 
Hi there,

Thanks for the help so far guys, it's appreciated.

siberian: What do you mean by "binding" and "properly indexed" (unique keys?)? The suggestion about running it from a separate server is a good one, I think I'll do so.

MikeLacey: Yeah, I'm going through row-by-row... essentially, checking for a row in table A where columnZ has a null value, looking up the value in table B, and updating the record in table A. I believe the relationship is one-to-one/none. I've inherited the problem and am still trying to learn what all the data means :) It's a MySQL database.

Thanks again guys
 
In DBI you can bind variables. This means you only have to prepare the statement once, which can result in a huge execution-time boost.

my $sth = $dbh->prepare('INSERT INTO users (name, phone, email) VALUES (?, ?, ?)');

foreach my $THING (@THINGS) {
    $sth->execute($THING->{name}, $THING->{phone}, $THING->{email});
}

This only does the prepare overhead once and essentially re-executes the same query over and over again with your new values. Very efficient. Trust me, do it. Obviously, replace $THING with whatever data you are trying to get into the SQL.
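The prepare-once, bind-many pattern isn't Perl-specific. As a rough sketch of the same idea in Python's sqlite3 module (table and data invented for illustration; the thread's actual stack is Perl DBI against MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, phone TEXT, email TEXT)")

things = [
    ("alice", "555-0100", "alice@example.com"),
    ("bob", "555-0101", "bob@example.com"),
]

# One parameterized statement, executed once per row with bound values,
# instead of building and parsing a fresh SQL string for every insert.
conn.executemany("INSERT INTO users (name, phone, email) VALUES (?, ?, ?)", things)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2
```

The `?` placeholders also mean the driver handles quoting, so odd characters in the data can't break the statement.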

As far as indexing, if the source database tables are properly indexed for the query you are doing it should go more quickly.

For example, if you are searching on the column 'ID' you could index it. This means that when you match against the 'ID' column the database has an optimized lookup table and does not have to go record by record. This is standard database stuff. You should make sure your query parameters are properly indexed in the db.
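To make the lookup-table effect concrete, here is a small sketch using Python's sqlite3 (table and index names invented): with an index on the 'ID' column, the query planner reports an index search instead of a record-by-record scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (ID INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_records_id ON records (ID)")

# EXPLAIN QUERY PLAN shows whether the lookup walks the whole table
# or uses the index's optimized lookup structure.
(row,) = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM records WHERE ID = 42"
).fetchall()
print(row[-1])
```

Without the `CREATE INDEX`, the same plan would report a full scan of `records`, which is exactly the record-by-record behavior you want to avoid on a 400,000-row table.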
 
I'm going through row-by-row... essentially, checking for a row in table A where columnZ has a null value, looking up the value in table B, and updating the record in table A.

You might try something like this to keep from hogging the CPU:

1. Create a temporary table from table A where columnZ is null.
2. Then update the temp table from values in table B
3. Update table A from the temp table.

This frees table A by creating a smaller subset of the table. Each record of the temp table only needs enough fields to uniquely identify it to table A and table B, so it could potentially be an extremely small table. Updating it will be a lot faster than going through 400,000 records. Consequently, the table A update will also be very fast.
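A minimal sketch of those three steps, using Python's sqlite3 with invented table contents (the thread's real database is MySQL, where the SQL would be similar):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tableA (id INTEGER PRIMARY KEY, columnZ TEXT)")
cur.execute("CREATE TABLE tableB (id INTEGER PRIMARY KEY, val TEXT)")
cur.executemany("INSERT INTO tableA VALUES (?, ?)",
                [(1, None), (2, "kept"), (3, None)])
cur.executemany("INSERT INTO tableB VALUES (?, ?)",
                [(1, "from_b1"), (3, "from_b3")])

# 1. Temp table holding only the keys of rows in tableA that need a value.
cur.execute("CREATE TEMP TABLE todo AS "
            "SELECT id, CAST(NULL AS TEXT) AS val "
            "FROM tableA WHERE columnZ IS NULL")
# 2. Pull the matching value from tableB into the temp table.
cur.execute("UPDATE todo SET val = "
            "(SELECT val FROM tableB WHERE tableB.id = todo.id)")
# 3. Copy the looked-up values back into tableA.
cur.execute("UPDATE tableA SET columnZ = "
            "(SELECT val FROM todo WHERE todo.id = tableA.id) "
            "WHERE id IN (SELECT id FROM todo)")
conn.commit()
print(cur.execute("SELECT id, columnZ FROM tableA ORDER BY id").fetchall())
```

Only the rows with a NULL columnZ ever touch the temp table; rows that already have a value (id 2 here) are left alone.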



There's always a better way. The fun is trying to find it!
 
aj,

Siberian's comments about binding are well made.

prepare() the SQL statements beforehand, using placeholders for values in WHERE clauses.

And the index -- if you're going to run this update more than once, even just running it twice, then it's worth making sure that there's a good index on the first table. "Good" in this context means an index on just columnz so that the statement

SELECT * FROM table1 WHERE columnz IS NULL;

can run nice and quickly.

The other index you need is on the unique key in table2, so if your update statement looks like this:

UPDATE table2 SET flda = 1234 WHERE fld0 = 999;

Then there needs to be an index on fld0, but if it looks like this:

UPDATE table2 SET flda = 1234
WHERE fld0 = 999 AND fld2 = 2;

Then the index needs to be on fld0 and fld2.
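The two indexes described above could be created like this (sketched via Python's sqlite3 so the statements can be checked; the index names are invented, and in MySQL the CREATE INDEX syntax is essentially the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (columnz TEXT, other TEXT)")
conn.execute("CREATE TABLE table2 (flda INTEGER, fld0 INTEGER, fld2 INTEGER)")

# Index for the "WHERE columnz IS NULL" scan on table1...
conn.execute("CREATE INDEX idx_table1_columnz ON table1 (columnz)")
# ...and a composite index covering both columns of the
# two-column WHERE clause on table2.
conn.execute("CREATE INDEX idx_table2_fld0_fld2 ON table2 (fld0, fld2)")

names = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
print(names)
```

A composite index on (fld0, fld2) serves the two-column WHERE clause in one lookup; an index on fld0 alone would still leave the database checking fld2 row by row among the matches.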

With respect to tviman's comments (sorry tviman, seems I'm always disagreeing with you), the temporary table approach will only be quicker when the update will be run just once. An exception to this might be if the database tended to take table locks when performing scans of large tables.

Mike

 