
How to optimize my MERGE statement.


TheBugSlayer

Programmer
Sep 22, 2002
887
US
Hello everyone.

I have a MERGE statement that pulls ~50,000 rows from a table in one database into a table containing more than 28,500,000 rows in another database on the same server. The SSIS package, scheduled through a job, has been running since 3:00 AM and it is almost 1:00 PM; it never takes less than 7 hours. The other tasks in the package take a little over an hour to run, but this, the last task, goes on forever. Both the source and target tables are indexed. Yesterday I changed the recovery model to SIMPLE, and today I set SQL Server to use up to 28 GB of the available 32 GB of memory.

What other optimizations can I implement to make the MERGE statement run faster? The query loads an ETL database that will be used to populate a data warehouse.

Below is a section of the MERGE statement:
Code:
	MERGE INTO STAGE1.StageAccessionDetail AS target
	USING (SELECT * FROM SourceDatabase.dbo.AccessionDetail WITH (NOLOCK)) AS source ON target.AccessionID = source.AccessionID 
		AND target.SpecimenID = source.SpecimenID
		AND CASE WHEN source.BlockTableID IS NULL THEN 1 ELSE source.BlockTableID END = CASE WHEN target.BlockTableID IS NULL THEN 1 ELSE target.BlockTableID END
		AND CASE WHEN source.BlockProcedureID IS NULL THEN 1 ELSE source.BlockProcedureID END = CASE WHEN target.BlockProcedureID IS NULL THEN 1 ELSE target.BlockProcedureID END
		AND target.ChargeID = source.ChargeID
		AND CASE WHEN source.CPTCode IS NULL THEN '1' ELSE source.CPTCode END = CASE WHEN target.CPTCode IS NULL THEN '1' ELSE target.CPTCode END
		AND CASE WHEN source.ICDCode IS NULL THEN '1' ELSE source.ICDCode END = CASE WHEN target.ICDCode IS NULL THEN '1' ELSE target.ICDCode END
	WHEN MATCHED 
	AND  
			  (target.ClientID <> source.ClientID OR
			   target.ClientGroupID <> source.ClientGroupID OR
			   ...
			   		   )
	THEN UPDATE SET 
	AccessionID = source.AccessionID
	,ClientID = source.ClientID
	,ClientGroupID = source.ClientGroupID
	,PathologistID = source.Pathologist
	WHEN NOT MATCHED THEN
	INSERT (AccessionID
			   ,ClientID
			   ,ClientGroupID
	...
		   IsLatestCase, CaseNumberVersion)
		 VALUES
			   (source.AccessionID
			   ,source.ClientID
			   ,source.ClientGroupID
			   ,source.PathologistID
			   ...
			   )

Your help is appreciated!

MCP SQL Server 2000, MCTS SQL Server 2005, MCTS SQL Server 2008 (DBD, DBA), Training HOTT cert SQL Server 2012 Business Intelligence (SSIS, SSAS).
 
First, a few questions:
- What operating system and SQL Server version?
- What are the primary keys and indexes on both tables? We need ALL of them.
- Are the indexes on the destination table being disabled prior to the load?
- Can you supply an execution plan for this MERGE?
- Is the volume of rows to insert always around the same number, or can it be significantly higher, e.g. 500K or 1000K rows?
- Is the destination table partitioned, and if so, what is the partition key?
- When this job is running, how is the CPU load on the machine (and, for that matter, how many CPUs are allocated to the SQL Server instance)?
- What is the MAXDOP setting on the instance?
- Are other jobs running at the same time against this instance?
- Are index statistics updated frequently?
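Several of those answers can be pulled straight from the catalog views. A minimal sketch (the table name is the one from your MERGE; everything else is standard SQL Server metadata):

```sql
-- SQL Server version and edition
SELECT @@VERSION;

-- Instance-level MAXDOP and memory settings
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('max degree of parallelism', 'max server memory (MB)');

-- All indexes on the destination table, with their key columns in order
SELECT i.name AS index_name,
       i.type_desc,
       c.name AS column_name,
       ic.key_ordinal
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('STAGE1.StageAccessionDetail')
ORDER BY i.name, ic.key_ordinal;
```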


As an initial note, the CASE expressions in the join predicate will certainly not help the query: wrapping columns in expressions makes those predicates non-sargable, so the optimizer cannot use an index seek on them.
And in advance of the other possible suggestions: have you tried changing this process to a separate set of UPDATE and INSERT statements?
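As a rough sketch of that split (a sketch only: the column lists are abbreviated the same way as in your MERGE, and for brevity it assumes the non-nullable columns AccessionID, SpecimenID, and ChargeID are enough to identify a row, whereas your MERGE also compares the nullable columns):

```sql
-- Pass 1: update rows that already exist in the target
UPDATE target
SET    ClientID      = source.ClientID,
       ClientGroupID = source.ClientGroupID
       -- ... remaining columns ...
FROM   STAGE1.StageAccessionDetail AS target
JOIN   SourceDatabase.dbo.AccessionDetail AS source
  ON   target.AccessionID = source.AccessionID
 AND   target.SpecimenID  = source.SpecimenID
 AND   target.ChargeID    = source.ChargeID;

-- Pass 2: insert rows that do not exist in the target yet
INSERT INTO STAGE1.StageAccessionDetail (AccessionID, ClientID, ClientGroupID /* ... */)
SELECT source.AccessionID, source.ClientID, source.ClientGroupID /* ... */
FROM   SourceDatabase.dbo.AccessionDetail AS source
WHERE  NOT EXISTS (SELECT 1
                   FROM   STAGE1.StageAccessionDetail AS target
                   WHERE  target.AccessionID = source.AccessionID
                     AND  target.SpecimenID  = source.SpecimenID
                     AND  target.ChargeID    = source.ChargeID);
```

Each statement then gets its own, much simpler plan, which is often easier for the optimizer to handle than one large MERGE.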

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
It seems to me that 50,000 inserts or updates should not take 7 hours. Are there any other indications of performance problems on the database server? How often is the table with 28M rows optimized for performance? Do you rebuild it at least occasionally in order to recover lost space and optimize the indexing?
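For reference, a periodic maintenance pass of that kind might look like this (a sketch, using the table name from the MERGE above; note that a plain REBUILD takes the indexes offline on Standard Edition):

```sql
-- Rebuild all indexes on the large target table to defragment them
-- and reclaim space
ALTER INDEX ALL ON STAGE1.StageAccessionDetail REBUILD;

-- Refresh the optimizer's statistics on the table
UPDATE STATISTICS STAGE1.StageAccessionDetail WITH FULLSCAN;
```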

One concern that I have is regarding 50,000 rollbacks. Are you committing often enough? You're not trying to do all 50K records and then issue a single commit, are you? Frederico's question about primary keys and indexes is also important.

Then there's Frederico's suggestion regarding individual inserts and updates. Are there usually more updates or inserts?

For instance, you could try to UPDATE the record and, if the error code is "record doesn't exist" (only this error code!), INSERT the row.
Or, if inserts are more likely, try to INSERT the record and, if the error code is "record already exists" (again, only this error code), UPDATE the row.
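In T-SQL the usual spelling of that update-first pattern checks @@ROWCOUNT rather than an error code. A sketch with the key and column list cut down to two illustrative columns:

```sql
-- Try the UPDATE first; INSERT only when no row matched
UPDATE STAGE1.StageAccessionDetail
SET    ClientID = @ClientID
WHERE  AccessionID = @AccessionID;

IF @@ROWCOUNT = 0
    INSERT INTO STAGE1.StageAccessionDetail (AccessionID, ClientID)
    VALUES (@AccessionID, @ClientID);
```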

In either case, I would COMMIT every 1000 records or even sooner.
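The commit interval can be handled by processing the source in batches, each in its own transaction. A minimal sketch of the idea, with the join key simplified to AccessionID alone and the column list abbreviated for brevity:

```sql
WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;

    -- Insert the next 1000 source rows that are not in the target yet
    INSERT INTO STAGE1.StageAccessionDetail (AccessionID, ClientID)
    SELECT TOP (1000) s.AccessionID, s.ClientID
    FROM   SourceDatabase.dbo.AccessionDetail AS s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   STAGE1.StageAccessionDetail AS t
                       WHERE  t.AccessionID = s.AccessionID);

    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION;
        BREAK;   -- nothing left to copy
    END;

    COMMIT TRANSACTION;  -- keep each transaction (and the log) small
END;
```

Small transactions keep the log from ballooning and mean that a failure part-way through only rolls back the current batch, not the whole load.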

==================================
adaptive uber info galaxies (bigger, better, faster, and more adept than cognitive innovative agile big data clouds)

