
How to optimize my MERGE statement.


TheBugSlayer

Programmer
Sep 22, 2002
887
US
Hello everyone.

I have a MERGE statement that pulls ~50,000 rows from a table in one database into a table containing more than 28,500,000 rows in another database on the same server. The SSIS package, scheduled through a job, has been running since 3:00 AM and it is almost 1:00 PM; it never takes less than 7 hours. The other tasks in the package take a little over an hour to run, but this, the last task, goes on forever. Both the source and target tables are indexed. Yesterday I changed the recovery model to SIMPLE, and today I set SQL Server to use up to 28 GB of the available 32 GB of memory.

What other optimizations can I implement to make the MERGE statement run faster? The query loads an ETL database that will be used to populate a data warehouse.

Below is a section of the MERGE statement:
Code:
	MERGE INTO STAGE1.StageAccessionDetail AS target
	USING (SELECT * FROM SourceDatabase.dbo.AccessionDetail WITH (NOLOCK)) AS source ON target.AccessionID = source.AccessionID 
		AND target.SpecimenID = source.SpecimenID
		AND CASE WHEN source.BlockTableID IS NULL THEN 1 ELSE source.BlockTableID END = CASE WHEN target.BlockTableID IS NULL THEN 1 ELSE target.BlockTableID END
		AND CASE WHEN source.BlockProcedureID IS NULL THEN 1 ELSE source.BlockProcedureID END = CASE WHEN target.BlockProcedureID IS NULL THEN 1 ELSE target.BlockProcedureID END
		AND target.ChargeID = source.ChargeID
		AND CASE WHEN source.CPTCode IS NULL THEN '1' ELSE source.CPTCode END = CASE WHEN target.CPTCode IS NULL THEN '1' ELSE target.CPTCode END
		AND CASE WHEN source.ICDCode IS NULL THEN '1' ELSE source.ICDCode END = CASE WHEN target.ICDCode IS NULL THEN '1' ELSE target.ICDCode END
	WHEN MATCHED 
	AND  
			  (target.ClientID <> source.ClientID OR
			   target.ClientGroupID <> source.ClientGroupID OR
			   ...
			   		   )
	THEN UPDATE SET 
	AccessionID = source.AccessionID
	,ClientID = source.ClientID
	,ClientGroupID = source.ClientGroupID
	,PathologistID = source.Pathologist
	WHEN NOT MATCHED THEN
	INSERT (AccessionID
			   ,ClientID
			   ,ClientGroupID
	...
		   IsLatestCase, CaseNumberVersion)
		 VALUES
			   (source.AccessionID
			   ,source.ClientID
			   ,source.ClientGroupID
			   ,source.PathologistID
			   ...
			   )

Your help is appreciated!

MCP SQL Server 2000, MCTS SQL Server 2005, MCTS SQL Server 2008 (DBD, DBA), Training HOTT cert SQL Server 2012 Business Intelligence (SSIS, SSAS).
 
First, a few questions:
- What operating system and SQL Server version?
- What are the primary keys and indexes on both tables? We need ALL of them.
- Are the indexes on the destination table being disabled prior to the load?
- Can you supply an execution plan for this MERGE?
- Is the volume of rows to insert always around the same number, or can it be significantly higher, e.g. 500K or 1000K rows?
- Is the destination table partitioned, and if so, what is the partition key?
- When this job is running, how is the CPU load on the machine (and, for that matter, how many CPUs are allocated to the SQL Server instance)?
- What is the MAXDOP setting on the instance?
- Are other jobs running at the same time against this instance?
- Are index statistics updated frequently?
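Several of those answers can be pulled straight from the catalog views. A minimal sketch (the table name is the one from your MERGE; everything else is standard SQL Server metadata):

```sql
-- SQL Server version and edition
SELECT @@VERSION;

-- Instance-level MAXDOP and memory settings
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('max degree of parallelism', 'max server memory (MB)');

-- All indexes on the destination table, with their key columns in order
SELECT i.name AS index_name,
       i.type_desc,
       c.name AS column_name,
       ic.key_ordinal
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('STAGE1.StageAccessionDetail')
ORDER BY i.name, ic.key_ordinal;
```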


As an initial note, the CASE expressions in the join predicate will certainly not help the query: wrapping columns in expressions makes those predicates non-sargable, so the optimizer cannot use an index seek on them.
And in advance of the other possible suggestions: have you tried changing this process to a separate set of UPDATE and INSERT statements?
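As a rough sketch of that split (a sketch only: the column lists are abbreviated the same way as in your MERGE, and for brevity it assumes the non-nullable columns AccessionID, SpecimenID, and ChargeID are enough to identify a row, whereas your MERGE also compares the nullable columns):

```sql
-- Pass 1: update rows that already exist in the target
UPDATE target
SET    ClientID      = source.ClientID,
       ClientGroupID = source.ClientGroupID
       -- ... remaining columns ...
FROM   STAGE1.StageAccessionDetail AS target
JOIN   SourceDatabase.dbo.AccessionDetail AS source
  ON   target.AccessionID = source.AccessionID
 AND   target.SpecimenID  = source.SpecimenID
 AND   target.ChargeID    = source.ChargeID;

-- Pass 2: insert rows that do not exist in the target yet
INSERT INTO STAGE1.StageAccessionDetail (AccessionID, ClientID, ClientGroupID /* ... */)
SELECT source.AccessionID, source.ClientID, source.ClientGroupID /* ... */
FROM   SourceDatabase.dbo.AccessionDetail AS source
WHERE  NOT EXISTS (SELECT 1
                   FROM   STAGE1.StageAccessionDetail AS target
                   WHERE  target.AccessionID = source.AccessionID
                     AND  target.SpecimenID  = source.SpecimenID
                     AND  target.ChargeID    = source.ChargeID);
```

Each statement then gets its own, much simpler plan, which is often easier for the optimizer to handle than one large MERGE.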

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
It seems to me that 50,000 inserts or updates should not take 7 hours. Are there any other indications of performance problems on the database server? How often is the table with 28M rows optimized for performance? Do you rebuild it at least occasionally in order to recover lost space and optimize the indexing?
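For reference, a periodic maintenance pass of that kind might look like this (a sketch, using the table name from the MERGE above; note that a plain REBUILD takes the indexes offline on Standard Edition):

```sql
-- Rebuild all indexes on the large target table to defragment them
-- and reclaim space
ALTER INDEX ALL ON STAGE1.StageAccessionDetail REBUILD;

-- Refresh the optimizer's statistics on the table
UPDATE STATISTICS STAGE1.StageAccessionDetail WITH FULLSCAN;
```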

One concern that I have is regarding 50,000 rollbacks. Are you committing often enough? You're not trying to do all 50K records and then issue a single commit, are you? Frederico's question about primary keys and indexes is also important.

Then there's Frederico's suggestion regarding individual inserts and updates. Are there usually more updates or inserts?

For instance, you could try to UPDATE the record and, if the error code is "record doesn't exist" (only this error code!), INSERT the row.
Or, if inserts are more likely, try to INSERT the record and, if the error code is "record already exists" (again, only this error code), UPDATE the row.
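In T-SQL the usual spelling of that update-first pattern checks @@ROWCOUNT rather than an error code. A sketch with the key and column list cut down to two illustrative columns:

```sql
-- Try the UPDATE first; INSERT only when no row matched
UPDATE STAGE1.StageAccessionDetail
SET    ClientID = @ClientID
WHERE  AccessionID = @AccessionID;

IF @@ROWCOUNT = 0
    INSERT INTO STAGE1.StageAccessionDetail (AccessionID, ClientID)
    VALUES (@AccessionID, @ClientID);
```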

In either case, I would COMMIT every 1000 records or even sooner.
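The commit interval can be handled by processing the source in batches, each in its own transaction. A minimal sketch of the idea, with the join key simplified to AccessionID alone and the column list abbreviated for brevity:

```sql
WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;

    -- Insert the next 1000 source rows that are not in the target yet
    INSERT INTO STAGE1.StageAccessionDetail (AccessionID, ClientID)
    SELECT TOP (1000) s.AccessionID, s.ClientID
    FROM   SourceDatabase.dbo.AccessionDetail AS s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   STAGE1.StageAccessionDetail AS t
                       WHERE  t.AccessionID = s.AccessionID);

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION;
        BREAK;   -- nothing left to copy
    END;

    COMMIT TRANSACTION;  -- keep each transaction (and the log) small
END;
```

Small transactions keep the log from ballooning and mean that a failure part-way through only rolls back the current batch, not the whole load.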

==================================
adaptive uber info galaxies (bigger, better, faster, and more adept than cognitive innovative agile big data clouds)

