
75 million row update???


m0nkey

MIS
Mar 20, 2003
31
Hi all,
I have a table with approx 75 million rows of names and addresses in it that I am trying to update...so far the update has been running 5 hours with no end in sight...a little background: this is running on a quad Xeon 500 with 3 GB RAM and one 145 GB drive (boooo). Without improving the hardware, can I improve the performance? I have indexed all the where fields that I read on, and I only update the table once or twice a month, but I do daily selects by zip or county (all indexed). I even have a composite key on phone and zip...

I have heard of horizontal partitioning, but I always thought that was reserved for archiving old transactional data that rarely gets read...
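
For reference, horizontal partitioning in SQL Server 2000 is usually done with a partitioned view over member tables split on a key range, rather than being only an archiving trick. A minimal sketch, with hypothetical table and column names:

--Hypothetical member tables, each holding one zip range,
--with CHECK constraints that define the ranges
CREATE TABLE AddressesEast (
    Id int NOT NULL,
    Zip char(5) NOT NULL CHECK (Zip BETWEEN '00000' AND '49999'),
    PRIMARY KEY (Id, Zip)
)

CREATE TABLE AddressesWest (
    Id int NOT NULL,
    Zip char(5) NOT NULL CHECK (Zip BETWEEN '50000' AND '99999'),
    PRIMARY KEY (Id, Zip)
)
GO

--Selects against the view that filter on Zip touch only
--the member table whose CHECK range can match
CREATE VIEW Addresses AS
SELECT Id, Zip FROM AddressesEast
UNION ALL
SELECT Id, Zip FROM AddressesWest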

I have been running this proc (which I got from this forum) for the past 5 HOURS!!! Any help is appreciated, since all I have is time at this point....
--Set rowcount to 100000 to limit number of updates
--performed in each batch to 100K rows.
Set rowcount 100000

--Declare variable for row count
Declare @rc int
Set @rc=100000

While @rc=100000
Begin

Begin Transaction

--Use tablockx and holdlock to obtain and hold
--an immediate exclusive table lock. This usually
--speeds the update because only one lock is needed.
Update [2000] With (tablockx, holdlock)
set [source] = '2000'

--Get number of rows updated
--Process will continue until fewer than 100000
--rows are updated in a batch
Select @rc=@@rowcount

--Commit the transaction
Commit
End
 
Just a note also....when I ran a trace there were plenty of reads but no writes...is this normal during an update?...

thanks as always...
 
Your proc is updating the same rows every time it loops. You have no where clause that tells SQL which rows to update. The query will run forever if you let it.

Modify the Update as follows.

Update [2000] With (tablockx, holdlock)
Set [source] = '2000'
Where [source] <> '2000'
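
Putting that WHERE clause back into the batching loop, the whole corrected proc would look like this:

--Set rowcount to limit each batch to 100K rows
Set rowcount 100000

--Declare variable for row count
Declare @rc int
Set @rc=100000

While @rc=100000
Begin

Begin Transaction

--The where clause restricts each batch to rows not yet
--changed, so @@rowcount eventually drops below 100000
--and the loop ends
Update [2000] With (tablockx, holdlock)
set [source] = '2000'
Where [source] <> '2000'

--Get number of rows updated
Select @rc=@@rowcount

--Commit the transaction
Commit
End

--Reset rowcount so later statements are unaffected
Set rowcount 0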


If you want to get the best answer for your question read faq183-874 and faq183-3179.
Terry L. Broadbent - DBA
 
thanks for pointing out my newbie mistake!!!!

Question though...I have a dialog going between two people and was wondering if you could weigh in on it....here are the suggestions for my situation above and my friend's responses...they truly conflict, and I was wondering mainly about the index dropping or not....thanks as always...


------START QUOTATION HERE------
Some suggestions to improve performance.

1. Drop clustered indexes on the table and create only nonclustered ones.

[André] While it’s true that a table doesn’t have to have a clustered index, it’s hugely beneficial to have one on every table. It’s the basis of every search: all the other (nonclustered) indexes use the clustered key to locate rows.
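
For illustration, a minimal sketch against the poster's table (the Zip and Phone column names are assumed from the thread):

--Cluster on the column most often range-searched; the
--table's rows are physically ordered by it
CREATE CLUSTERED INDEX IX_2000_Zip ON [2000] (Zip)

--Nonclustered indexes then store the clustered key (Zip)
--as their row locator and use it to reach the data row
CREATE NONCLUSTERED INDEX IX_2000_Phone ON [2000] (Phone)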


2. Before the update, drop all the indexes on that table and then update the table.

[André] I could only recommend this if you were bulk inserting, rather than updating. The update will be considerably faster if it’s able to take advantage of an index.


3. During selects, wherever you have a condition, force the index, like:
select * from table (index=col1IDX) where col1 > 100 and col1 < 200
4. Run these updates during off hours.
5. Create those indexes on different filegroups.

[André] Makes absolutely no difference unless you’re on SQL 7. SQL Server 2000 is so much more efficient that it makes no difference whether you have one filegroup or several. I did extensive research on this for our database. Our 120 GB database consists of one data file and one tran log file, located on the primary filegroup.

6. Create a few filegroups pointing to different disk drives so the data can be striped. SQL creates one thread for each filegroup.

[André] No. Use one filegroup and hardware RAID – much more efficient.

7. Split into many transactions and do a checkpoint or take a tranlog backup frequently.

[André] This might help.
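
A sketch of what that looks like per batch (the database name and backup path are hypothetical):

--Full recovery model: after each Commit, back up the log
--so the space used by committed batches can be reused
BACKUP LOG MyDatabase TO DISK = 'E:\backup\mydb_log.bak'

--Simple recovery model: a checkpoint serves the same
--purpose of freeing committed log space
CHECKPOINT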

8. Increase the tranlog to a considerable size before doing the update statement.

[André] At least 25% of the database size.
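
For example, growing the log once up front (database name, file name, and size are examples only):

--Pre-grow the log so the update never has to pause and
--wait for autogrow during processing
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = MyDatabase_log, SIZE = 10000MB)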

9. When you do select statements on those tables, use (nolock) hints.

[André] helpful if you’re updating at the same time you’re trying to select. Just keep in mind that (nolock) can return ‘dirty’ results.

 
1. Drop clustered indexes on the table and create only nonclustered ones.

[tlb] I recommend clustered indexes for most tables. There may be rare instances where tables subject to many inserts and updates will have better performance without a clustered index.

2. Before the update, drop all the indexes on that table and then update the table.

[tlb] I would concur with André most of the time. I don't know if indexing will help much on your update as you have no criteria except the changed column. Dropping indexes on other columns may speed the update. However, indexes must be rebuilt after the update and you need to measure the cost.


3. During selects, wherever you have a condition, force the index, like:
select * from table (index=col1IDX) where col1 > 100 and col1 < 200

[tlb] Avoid index hints except in rare instances. I used hints more frequently in SQL Server 7 than SQL Server 2000. However, even then I used hints sparingly.
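
For reference, here is the old-style hint from item 3 next to the equivalent WITH (INDEX(...)) form preferred in SQL Server 2000 (table, column, and index names taken from the quoted advice):

--Old-style hint syntax
select * from [2000] (index=col1IDX) where col1 > 100 and col1 < 200

--Equivalent preferred form
select * from [2000] with (index(col1IDX)) where col1 > 100 and col1 < 200
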
4. Run these updates during off hours.

[tlb] Agreed!

5. Create those indexes on different filegroups.

[tlb] RAID spreads the file over multiple disks. Creating filegroups on the same array will probably not improve performance. Placing indexes on a different filegroup on a different array should help performance. I don't have any statistical evidence to support that position but that seems to be the general opinion in most literature.
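
A sketch of placing an index on its own filegroup on a separate array (database, filegroup, and path names are hypothetical; the County column is assumed from the thread):

--Add a filegroup backed by a file on a separate array
ALTER DATABASE MyDatabase ADD FILEGROUP IndexFG
ALTER DATABASE MyDatabase
ADD FILE (NAME = MyDatabase_ix, FILENAME = 'F:\data\mydb_ix.ndf')
TO FILEGROUP IndexFG
GO

--Build the index on that filegroup
CREATE NONCLUSTERED INDEX IX_2000_County ON [2000] (County) ON IndexFG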

6. Create a few filegroups pointing to different disk drives so the data can be striped. SQL creates one thread for each filegroup.

[tlb] Definitely use hardware RAID.

7. Split into many transactions and do a checkpoint or take a tranlog backup frequently.

[tlb] I've not found that checkpoint or tranlog backups are helpful. Updating in batches, as you are doing, with each batch enclosed in a transaction definitely helps.

8. Increase the tranlog to a considerable size before doing the update statement.

[tlb] Increasing the transaction log size is not as important if updating in batches. It is helpful to increase the size so SQL Server doesn't need to grow the file during processing.

9. When you do select statements on those tables, use (nolock) hints.

[tlb] If you use the TABLOCKX hint when updating the table, you won't be able to select while the update runs. If concurrency is an issue, then don't use TABLOCKX on the update. Using NOLOCK on selects can help performance.
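
For example, a select that reads through the running update without blocking (column names are hypothetical):

--NOLOCK takes no shared locks, so this select is not
--blocked by the update, but it can return uncommitted
--('dirty') rows
SELECT [name], zip
FROM [2000] WITH (NOLOCK)
WHERE zip = '90210'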


If you want to get the best answer for your question read faq183-874 and faq183-3179.
Terry L. Broadbent - DBA
 
Thanks for all the advice, and I commend you on your help in my matter...I will post my results and improvements in case that helps anyone in the same boat....

m0nkey
 