Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Transaction log issues 1

Status
Not open for further replies.

ehenry

Programmer
Sep 18, 2009
65
US
I have a table that I update in my database once a month. The data comes from a 7+ GB flat file I usually end up with well over 100 million rows in a table with 5 columns. My problem is that during this insert the transaction log blows up to over 100GB. After the insert I truncate the log. I have no real need to use any transaction log on this database...it is 2 tables one with historical one with meta data, no updates or inserts are done with the data and it is completely refreshed once a month. The file also grows every month, so I am running out of disk capacity for the log file during the insert. Is there any way to deal with this outside of setting recovery mode to simple? Is there a hint to not use transaction logging during this bulk insert statement? Thanks for any help.
 
This topic has come up before, it seems the common advice is that can do your inserts in smaller batches. Clearing the log between batches.

Also possibly, hardware wise you if your not using a seperate hard drive for your logs you could add a larger drive for logs. A seperate drive is usually recommended anyway.

Simi

 
Thanks for the response...I figured as much since SQL Server must use the log to operate. The log file is on its own drive...I am going to split the file then insert smaller batches and truncate the log after each batch like you said. Thank you.
 
ehenry,

Sorry to ask the question but as you said you have no real need for the transaction log - yet you dont seem to want to put the database in simple mode. Any reason?

Its just i would say logical that if you dont need the transaction log that you would then pop it on simple.

Dan

----------------------------------------

Be who you are and say what you feel, because those who mind don't matter and those who matter don't mind - Dr. Seuss

Computer Science is no more about computers than astronomy is about telescopes - EW Dijkstra
----------------------------------------
 
Dan,

Thanks for asking that question...I'm wondering the same thing.

ehenry,

The purpose of setting the database to Full Recovery is to be able to make backups of the transaction logs which then allows you to restore your database to a specific moment. If you aren't backing up your tlogs, then they will grow huge anyways. If you don't need to save the tlogs and aren't backing them up then there isn't any gain by having the database in full recovery mode.

-SQLBill

The following is part of my signature block and is only intended to be informational.
Posting advice: FAQ481-4875
 
The database is in Simple recovery mode...sorry for the confusion I wasn't saying I did not want to put it in simple mode, just wondering if there was a way to prevent writing to the log at all, or minimizing what SQL Server writes to the log.

When I run the bulk insert command the transaction log will still grow during the execution. My understanding of simple mode is that sql server will automatically truncate the log file after operations, it still requires writing to the log file to perform the operations, and since this is only one statement the log cannot truncate during the execution. I have tried truncating the log from a separate query but it does not work...log still grows.

Any thoughts?
I'm thinking the only way is to break the flat file up into smaller files and truncate the log after each insert.
 
You never really state the method you use to load the data in to the database. If you are using BULK INSERT, you should be using TABLOCK and ROWS_PER_BATCH to minimize the size of the transaction log. This will likely improve performance also.

There are some restrictions that you must be aware of in order for SQL Server to use minimal logging during bulk insert, but it shouldn't be too difficult to research the issues and accomplish it.

For example, I assume there are indexes on the table. If you drop the indexes before loading the data and then create the indexes afterwards, you will likely get less logging and better performance.

TABLOCK means "Table Lock". So, if you use TABLOCK, nothing else will be able to use the table while the BULK INSERT is occurring, but the bulk insert will also take less time.

I encourage you to take a look here:


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks for the response...Yes, I am using a BULK INSERT...So if I specify ROWS_PER_BATCH while in simple mode, will the log be truncated after each batch?
 
Here is the procedure...assuming no indexing on the tables.

IF EXISTS ( select id from sysobjects where name like 'raw_pden_prod' and xtype = 'u' )
DROP TABLE raw_pden_prod
GO

CREATE TABLE raw_pden_prod
(
ENTITY_ID varchar(max), PROD_DATE varchar(max), LIQ varchar(max), GAS varchar(max), WTR varchar(max), WCNT varchar(max)

) ON [PRIMARY]

bulk insert raw_pden_prod
from 'F:\HPDI\Data\PDEN_PROD.txt'
WITH
(
FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK, ROWS_PER_BATCH = 15000000
)

TRUNCATE TABLE PDEN_PROD

INSERT INTO PDEN_PROD
SELECT ENTITY_ID, CAST(PROD_DATE as Date), LIQ, GAS, WTR, WCNT FROM raw_pden_prod
WHERE (LIQ <> 0 OR GAS <> 0 OR WTR <> 0 OR WCNT <> 0)


IF EXISTS ( select id from sysobjects where name like 'raw_pden_prod' and xtype = 'u' )
DROP TABLE raw_pden_prod
GO
 
The data comes from a 7+ GB flat file I usually end up with well over 100 million rows in a table with 5 columns.

Actually, rows_per_batch should be the approximate number of rows in the file, so you should use rows_per_batch=100000000

There is another argument called BATCHSIZE that you can experiment with. According to the documentation: "Specifies the number of rows in a batch. Each batch is copied to the server as one transaction." I would suggest that you try something like BATCHSIZE=100000.

It looks like you actually have a couple integers and a date. You probably have approximately 50 bytes per row. So, if you use a batch size of 100,000 that should be approximately 5 megs of data per transaction.

I would suggest that you remove the "insert into real table" part to test this. If you make the changes that I suggest, I would expect the transaction log to stay relatively small.

The part where you are inserting in to the real table is likely to be a big problem where (from a transaction log perspective). The entire table will ultimately get logged in the transaction log because the entire thing is treated as a single transaction. This is where Simian336's advice would come in to play.

If you can import the data from the file in to your "raw" table without a log to logging, then you should work on moving the data from the raw table to the real one in batches.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I will experiment with this. Thanks for all the input.
 
Worked great, thanks for the help. The bulk insert with BATCHSIZE, ROWS_PER_BATCH and TABLOCK did not add anything to the transaction log...Inserting from the raw table to the actual table with no index and using TABLOCK kept the log growth minimal. Much appreciated.
 
That's great.

If you don't mind... I'm curious about the performance. How long (approximately) did the process take before and how long does it take now?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
~223 million rows took 25 min to the raw table, and another 25 to the actual table. Used to take well over an hour for each. Huge performance increase :)
 
That's approximately 10 million rows per minute. for each operation. [bigsmile]. I can't help but wonder if we can squeeze a little more performance out of this.

Since you are importing a 7 GB file and (at least temporarily) you have a duplicate of it, so 14 Gigs, I can only assume that the data file (the .mdf) must be growing during this operation.

Can you tell me what the auto-grow setting is for this database?
In SQL Server Management Studio, right click your database, click properties, click files. What is the autogrowth setting?

Can you tell me if you have instant file initialization turned on?
To do this, download SQLCop (there is a link in my signature), log in to your database, expand configuration, and click on "Instant File Initialization".

Given the size of your import, I would recommend your auto-growth setting be "by 5000 megabytes". Of course, tweaking this setting may give even better performance. I would also strongly encourage you to make sure that instant file initialization is turned on. If it's not, it will take SQL Server considerably more time to expand the size of your database. With instant file initialization turned on, your database will auto-grow (almost) instantly).

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Changed autogrowth from 1 to 5000 unrestricted. The .mdb is 35.5 GB. There are only 2 tables in the database one contains the metadata and they are cleared before the monthly insert, but the size of this file will slowly increase over time. I will work on the Instant File Initialization and experiment with autogrowth. Will let you know more next week. Have a good weekend and thanks again for the help.
 
Here is something people forget to check. What size is your Model database set to for autogrow? When the SQL services are restarted, Tempdb is built based on the sizes in Model. Some transactions make use of the Tempdb and if it is set for a small autogrow, that can cause performance issues.

If the Tempdb does have a small autogrow set, you can do a couple of things...1 - increase the sizes for Model or 2 - create a script that runs on startup to alter the sizes of Tempdb.

-SQLBill

The following is part of my signature block and is only intended to be informational.
Posting advice: FAQ481-4875
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top