Best "update" strategy?

snitin78 · Aug 11, 2009

Hey everyone,

I have a table (TABLE A) that has close to 9 million records. I am trying to add a new column to this table and then have this new column updated with values from another table (TABLE B).

I have tried two approaches so far:

1. Cursor Approach: Table A already has a reference to a column in Table B. So, I get a distinct list of values from the existing column in Table A, create a cursor and then update the new column where value in Table A matches the cursor value. Something like:

Code:

declare @table_A_column_value varchar(55),
        @new_column_value int

declare cursor @cursor

set @cursor = cursor for select distinct tableA_oldcolumn from tableA

open @cursor
fetch next from @cursor into @table_A_column_value
while(@@fetch_status = 0)
begin
     select @new_column_value = tableB_column from tableB where tableB_someColumn = @table_A_column_value

     update tableA set new_column = @new_column_value where
     tableA_oldcolumn = @table_A_column_value

     fetch next from @cursor into @table_A_column_value
end

With this approach I can get some debugging output (print statement), for each set of data, something like how many rows are required to be updated, how many were updated, break/rollback if the numbers don't match etc. But the problem here is that Table A has 300,000 distinct values for the old column. As a result, the last time I ran this update, it ran for over 11 hours!!

2. Update-join approach: The second thing I tried was to do an update-from-join, like this:

Code:

   update tableA set new_column = tableB.somecolumn
    from tableA 
    inner join
    tableB on tableA.oldcolumn = tableB.tableB_someColumn

With this approach, all the records are updated at once and I do not have control over the updates being done. This also takes 2 hours!

My questions are:

1. Is there any other better/optimum way to update tables with such huge amount of data?
2. Both these queries are hogging the entire DB (the second one to such an extent that I had to kill it and restart the server!). Is there a way to handle that?

Any help is highly appreciated.

gmmastros · Aug 11, 2009

9 million rows sounds like a lot, but it really isn't.

I suspect you have an index problem. When you join the 2 tables, indexes will help a lot. I mean.... "oh my god it's fast" type of difference.

In your example, there should be an index on tablea.oldcolumn and an index on tableb.tableb_somecolumn.

Since you are updating the new_column in tableA, you are better off NOT having an index on that column. If you want that one indexed (for other queries later), then you should create the index after populating the data.

to see what the existing indexes are:

sp_helpindex 'TableA'
sp_helpindex 'TableB'

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

snitin78 · Aug 11, 2009

Thanks for your reply, George!

Apparently indexes (on the columns in question) are present on both the tables. TableB has just 1 index on tableb_somecolumn. TableA on the other hand has a clustered index and another non-clusterd both containing oldcolumn. i.e. index1 = CLUSTERED (tableA.someColumn1, tableA.oldcolumn), index2 = NON-CLUSTERED (tableA.oldcolumn, tableA.someColumn2).

Execution plan for the update-from-join approach shows that index1 is being chosen for 'update' and index2 for the join with tableB. Do you think that the order of the column in the indexes would make a difference?

Also, I tried to provide query hints to use the second index but the update seems to always take the clustered index!

gmmastros · Aug 11, 2009

It's hard to say. Have you tried the "Database Engine Tuning Advisor"? Load the query in a query window (without executing it). Click Query -> "Analyze Query in database engine tuning advisor".

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

snitin78 · Aug 11, 2009

I'll try it. Thanks again!

hmckillop · Aug 12, 2009

One other point, if you have a look at the "batch" faq, it will give you another way to process the second version of your update but with more visibility on how well its running.

http://www.tek-tips.com/faqs.cfm?fid=3141

"I'm living so far beyond my income that we may almost be said to be living apart

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Best "update" strategy?

snitin78

Programmer

gmmastros

Programmer

snitin78

Programmer

gmmastros

Programmer

snitin78

Programmer

hmckillop

Programmer

Similar threads

Part and Inventory Search

Sponsor

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Best &quot;update&quot; strategy?

snitin78

Programmer

gmmastros

Programmer

snitin78

Programmer

gmmastros

Programmer

snitin78

Programmer

hmckillop

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor

Best "update" strategy?