Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need to restore 200 millions rows from 400-million-row archive table, fast!

Status
Not open for further replies.

TheBugSlayer

Programmer
Sep 22, 2002
887
0
0
US
The audit tableon a SQL Server 2008 contains more than 400 million records. The table is heavily used and has become a drag on the database. Over the week-end it was renamed and a brand new audit table was created.

Now we need to copy a year’s worth of data back to the live audit table, the one that was recreated. However, there is not one single column that uniquely identifies rows in the audit table. I need to find a way to add a year’s worth of records back to the live audit table while keeping track of the records that have already been added back to avoid duplication. There is about 200 million records to add back, we cannot add them all at once but rather a certain number at a time so as not to tax the database too much.

I am thinking of duplicating the whole archive audit table, the one with the 400 million rows and adding a new Boolean column to it. That column will indicate whether the row has already been added over to the live audit table. The rows to be added back will come from the result set of a query that filters by the transaction date, returning only a year’s worth of data.

If anyone has a different idea, a more efficient one I would like to hear about it. I will be tinkering with my idea in the meantime…

Thank you.


MCP SQL Server 2000, MCTS SQL Server 2005, MCTS SQL Server 2008 (DBD, DBA)
 
You could do it by date, say one week at a time.

djj
The Lord is my shepherd (Psalm 23) - I need someone to lead me!
 
True djj55. That's a great idea. The challenge is to keep track of which records have already been copied back. If the process fails in the middle of processing a week, when I resume I only need to copy over the rows withing that week that have not been copied yet...As I type that I am thinking, process the week in a transaction so that either all rows are copied or not, and insert the successful week value in a table to keep track...What do you think?

MCP SQL Server 2000, MCTS SQL Server 2005, MCTS SQL Server 2008 (DBD, DBA)
 
Ah, but then the same week number may occur in more than one year...I guess I could save first and last day of the week. Start from 01-01-2012 and do +7...

MCP SQL Server 2000, MCTS SQL Server 2005, MCTS SQL Server 2008 (DBD, DBA)
 
Absolutely. You should use transactions.

Doing the copy a week at a time will almost certainly make the overall time faster. I would also suggest that you make sure instant file initialization is enabled on your server.

When SQL Server needs to grow a database, the process can be a little slow. Since you need to copy so many rows, your database will certainly get larger.

Without instant file initialization, SQL Server will write a bunch of zero's to the hard drive. With instant file initialization enabled, SQL Server will not do this, it will simply allocate the space and make it available to you.

Instant file initialization only applied to mdf files (not log files). By copying a week at a time within a transaction, your transaction log file should not grow too much if you are using simple logging.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks George! I did not know about the instant file initialization feature. The database uses the full recovery model...I doubt I'll be able to change the recovery model in the middle of production as it requires restarting the service. I am still copying the archive audit table to a dev database to test my solution...52/400 millions so far...it's been about almost two hours.

This is what my script looks like so far:
Code:
DECLARE @FirstDayOfWeek AS CHAR(10) --First date of audit rows to copy over to live from archive.
DECLARE @LastDayOfWeek AS CHAR(10)  --First day plus 6, inclusive.
DECLARE @StopDate AS CHAR(10)       --Last date to copy from the archive table to live audit.

SELECT @FirstDayOfWeek = '2012-10-01',
       @LastDayOfWeek = CONVERT(char(10),dateadd(day, 6, @FirstDayOfWeek), 121),
       @StopDate = '2013-10-03'

WHILE (@FirstDayOfWeek <= @StopDate)
BEGIN	    
	/*Last day of the week is first day + 6, inclusive if the range is withing a week,
	  otherwise it's the stop date.
	*/
    IF DATEDIFF(dd, @FirstDayOfWeek, @StopDate) <= 6 
		SELECT @LastDayOfWeek = @StopDate
    ELSE 
		SELECT @LastDayOfWeek = CONVERT(char(10),dateadd(day, 6, @FirstDayOfWeek), 121)
    
    BEGIN TRY
		BEGIN TRAN T1
		
		--Insert into production audit table
		INSERT INTO AUDITTABLE
		SELECT * FROM [AUDITTABLE-Archive]
		WHERE TRANSACTION_TIME BETWEEN @FirstDayOfWeek AND @LastDayOfWeek --AND IsCopied = 0
		--Update archive audit records transferred over
		--UPDATE [AUDITTABLE-Archive] SET IsCopied = 1 WHERE TRANSACTION_TIME BETWEEN @FirstDayOfWeek AND @LastDayOfWeek --AND IsCopied = 0
		
		--Wait 3 minutes. *Seems too long.
		WAITFOR DELAY '00:03:00'
		--SELECT @FirstDayOfWeek, @LastDayOfWeek, DATEDIFF(dd, @FirstDayOfWeek, @StopDate)
		--The next week starts at last day of current week + 1.   
		SELECT @FirstDayOfWeek = CONVERT(char(10),dateadd(day, 1, @LastDayOfWeek), 121)
		
		COMMIT TRAN
	END TRY
	BEGIN CATCH
		IF @@TRANCOUNT > 0 ROLLBACK TRAN T1
	END CATCH	
END

MCP SQL Server 2000, MCTS SQL Server 2005, MCTS SQL Server 2008 (DBD, DBA)
 
for those volumes I would consider setting up the main destination table partitioned, load each individual week onto its own partition in parallel and then, if required, merge all the partitions into a single table.

the following article has info not only how to setup partitioning, but also on how to change the final table back to un-partitioned if that is required at the end.


Regardless... having the new field is copied and doing an update of it is really not required - you can easily keep an audit of which weeks have been copied on another small purpose build table to control that.
That update is definitely going to kill your performance if done as you are doing.

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Thanks Frederico. If only the people I am dealing with would actually listen the advice of those more verse in the technology...The update on the same table was not a problem at all. I am the only one using that table, unlike the live one.

MCP SQL Server 2000, MCTS SQL Server 2005, MCTS SQL Server 2008 (DBD, DBA)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top