
Large Database Updates


vegasron (IS-IT--Management), Aug 18, 2008
I am processing massive updates to tables - usually between 150 million and 200 million rows per update. The queries can grind for 30 to 48 hours before finishing.

I want to reduce the processing time - would looping over subsets of the data help performance? The first table has 168 million rows, the second over 550 million. Here is a sample of the code:

USE AROI;
GO
DECLARE @TMin int;
DECLARE @TMax int;
DECLARE @Rmin int;
DECLARE @Rmax int;

SELECT @TMin = MIN(TEMP_PK)
FROM db_owner.Database_Build_Unique_Email;

-- SELECT @TMax = MAX(TEMP_PK)
-- FROM db_owner.Database_Build_Unique_Email;

SET @TMax = 100000;  -- test cap; for a full run use the MAX(TEMP_PK) query above

SET @Rmin = @TMin;
SET @Rmax = @Rmin + 1000;

WHILE (@Rmax < @TMax + 1000)
BEGIN
    UPDATE db_owner.Database_Build_Unique_Email
    SET url_id = db_owner.aroi_campaign.url_id,
        url = db_owner.aroi_sources.source_url,
        ip = db_owner.aroi_campaign.source_ip,
        lead_date = db_owner.aroi_campaign.source_date,
        campaign_disabled = '0'
    FROM db_owner.Database_Build_Unique_Email
         INNER JOIN db_owner.aroi_campaign
             ON db_owner.Database_Build_Unique_Email.cust_id = db_owner.aroi_campaign.cust_id
         INNER JOIN db_owner.aroi_sources
             ON db_owner.aroi_campaign.url_id = db_owner.aroi_sources.url_id
    WHERE (db_owner.aroi_campaign.Disabled = 0)
      AND (db_owner.Database_Build_Unique_Email.lead_date IS NULL)
      AND (db_owner.Database_Build_Unique_Email.TEMP_PK BETWEEN @Rmin AND @Rmax);

    -- '+' between varchar and int raises a conversion error, so the counters need explicit casts
    PRINT 'Records ' + CAST(@Rmin AS varchar(12)) + ' of ' + CAST(@Rmax AS varchar(12))
        + ' have been updated. ' + CAST(@TMax - @Rmax AS varchar(12)) + ' records to go.';

    SET @Rmin = @Rmin + 1000;
    SET @Rmax = @Rmin + 1000;
END;
GO

Any suggestions?
 
My understanding of SQL databases says to keep it as a full set; loops tend to slow everything down greatly. One thing you can do is stay away from string compares - try to keep your WHERE clauses matching numeric values whenever possible. Also, check your indexes: make sure they line up with the query and that you don't have too many of them. The only other suggestion I can make is to add another processor to the server. Updating that many records is going to take some time.
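For example, a couple of nonclustered indexes to support the joins in the posted query might look like the following. This is only a sketch - the column choices are assumptions based on the UPDATE above, and INCLUDE needs SQL Server 2005 or later. Keep in mind that every extra index also slows the update itself down.

CREATE NONCLUSTERED INDEX IX_Unique_Email_cust_id
ON db_owner.Database_Build_Unique_Email (cust_id);  -- supports the join to aroi_campaign

CREATE NONCLUSTERED INDEX IX_aroi_campaign_cust_id
ON db_owner.aroi_campaign (cust_id, Disabled)
INCLUDE (url_id, source_ip, source_date);  -- covers the columns the UPDATE reads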

"If I were to wake up with my head sewn to the carpet, I wouldn't be more surprised than I am right now.
 
There are several concerns when running these large updates. First, the maximum log file size I can create is 220GB, and on the last pass of this query the log grew to 210GB before I started the rollback. It took almost 45 hours to get to that point.

By using a loop, the number of records selected per pass drops, keeping both tempdb and the log files smaller.

Since the loop will also be selecting a much smaller set of data (somewhere between 1,000 and 1,000,000 records vs. 168 million), shouldn't it run faster?

When I have manually set up the queries to run with typed-in start/stop values (a limit of 1 million records), each run finishes in 10-15 minutes, but I would have to repeat the steps 168 times to get the final result.

I have also thought of using views to get this to execute quicker. Any feedback would be appreciated.
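For reference, log use can be watched between batches with DBCC SQLPERF, and under SIMPLE recovery the log space from each committed batch is reused rather than accumulated (assuming the AROI database named later in the thread, and that point-in-time recovery isn't required while the load runs):

DBCC SQLPERF(LOGSPACE);  -- reports log size and percent used for every database

-- Assumption: point-in-time recovery is not required during the load.
ALTER DATABASE AROI SET RECOVERY SIMPLE;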
 
I think your set statements need a close inspection, but I would say that doing what you are trying to do is likely to speed up the overall processing.

It is true that carrying out updates in a single statement is generally the best option, but only if the machine is up to the job.

If the update is too big then performance will degrade.

You will have to experiment with the size of your update.

It would probably also help if each update was within its own transaction, if that's possible.
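Something along these lines, reusing the @Rmin/@Rmax variables from your script (strictly speaking, each UPDATE already autocommits on its own, but explicit boundaries make the size of each transaction deliberate):

WHILE (@Rmax < @TMax + 1000000)
BEGIN
    BEGIN TRANSACTION;

    -- ... the batched UPDATE from your script goes here ...

    COMMIT TRANSACTION;  -- committing per batch keeps each log reservation small

    SET @Rmin = @Rmin + 1000000;
    SET @Rmax = @Rmin + 1000000;
END;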

I'll be interested to know how you get on.
 
Batching it into chunks has some advantages: less impact on tempdb, less to roll back if something fails, and much less blocking.

The last massive table update we had to perform (hundreds of millions of rows on SQL Server 2000), our senior DBA wanted us to do it in batches of only 200 with WHILE @@ROWCOUNT = 200 logic. I pushed hard to do it in one straight update. We did it my way, and let me tell you, it sucked big time when we had to kill it due to poor performance and then wait around for hours while it rolled back. Ouch.
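For what it's worth, the @@ROWCOUNT pattern looks roughly like this. It is a sketch only: UPDATE TOP needs SQL Server 2005 or later (on 2000 you would use SET ROWCOUNT 200 instead), and the SET/WHERE below are placeholders for the real query. The one rule that matters is that updated rows must drop out of the WHERE clause, or the loop never ends:

DECLARE @BatchSize int;
SET @BatchSize = 200;

WHILE 1 = 1
BEGIN
    UPDATE TOP (@BatchSize) db_owner.Database_Build_Unique_Email
    SET lead_date = GETDATE()   -- placeholder; once set, the row no longer matches the WHERE below
    WHERE lead_date IS NULL;

    IF @@ROWCOUNT < @BatchSize
        BREAK;                  -- the last, partial batch has been processed
END;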

 
I updated the query to the following:

USE AROI

DECLARE @TMin int; --sets the first record of the database
DECLARE @TMax int; --sets the last record of the database
DECLARE @Rmin int; --sets the first record of the loop
DECLARE @Rmax int; --sets the last record of the loop
-- Get the first record from the database

SELECT @TMin = MIN(TEMP_PK)
FROM db_owner.Database_Build_Unique_Email;

-- Get the last record from the database
SELECT @TMax = MAX(TEMP_PK)
FROM db_owner.Database_Build_Unique_Email;

SET @Rmin = @TMin; --sets the minimum value for the loop, based on the first record of the database
SET @Rmax = @Rmin + 1000000; --sets the maximum value for the loop: the first record in the loop plus the step total

WHILE (@Rmax < @TMax + 1000000) --establishes the loop through the entire database.

BEGIN
    BEGIN
        -- Edit this for each update proc.
        UPDATE db_owner.Database_Build_Unique_Email
        SET cust_id = db_owner.aroi_identity.cust_id  -- the SET column cannot be schema-qualified
        FROM db_owner.aroi_address
             INNER JOIN db_owner.aroi_identity
                 ON db_owner.aroi_address.cust_id = db_owner.aroi_identity.cust_id
             INNER JOIN db_owner.Database_Build_Unique_Email
                 ON db_owner.aroi_identity.email_address = db_owner.Database_Build_Unique_Email.email
        WHERE (db_owner.Database_Build_Unique_Email.cust_id IS NULL)
          AND (db_owner.Database_Build_Unique_Email.temp_pk BETWEEN @Rmin AND @Rmax);
        -- End Editable Section
    END;
    --shows status of the update procedure; varchar avoids the trailing blanks that char(10) pads in
    PRINT 'The Group ' + CAST(@Rmin AS varchar(10)) + ' of ' + CAST(@Rmax AS varchar(10)) + ' have been updated. Total Records are ' + CAST(@TMax AS varchar(10));
SET @Rmin = @Rmin + 1000000; --increments the Rmin counter for the next loop
SET @Rmax = @Rmin + 1000000; --increments the Rmax counter for the next loop
END;

The only problem I am having is that the message screen only updates about once an hour - does anyone know how to speed up the refresh rate of the message window?
 
I appreciate all the help provided above - the end result is that we are processing 5,000,000 records per loop and getting about 90 million updates per hour. The query now takes about 2 hours; before, it took more than 40.

The PRINT command still puzzles me - at 1,000,000 rows per batch, updates to the message screen appeared roughly once per hour; at 5,000,000 per batch, it is about one update every 40 minutes. (A workaround is sketched at the end of this post.)

The good news is the log files are staying below 40GB (before they grew to over 200GB), execution is about 2 hours vs. 40+ hours, and I am getting the updates done quickly.
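On the PRINT delay: the server buffers PRINT output and only flushes it to the client periodically, which is why the batch size appears to change the refresh rate. RAISERROR at severity 0 WITH NOWAIT pushes each message out immediately; a drop-in replacement for the PRINT above, using the same variables:

RAISERROR('The Group %d of %d have been updated. Total Records are %d.', 0, 1, @Rmin, @Rmax, @TMax) WITH NOWAIT;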
 