
Set Based Thinking 1


wbodger

Programmer
Apr 23, 2007
I have inherited a process that works, but is slow. Currently, we have a stored procedure that starts off by filling a temp table with data. From there it takes a count and then starts a WHILE loop...

Inside the WHILE loop a parent record is inserted into a table with a stored procedure, the id is returned with SCOPE_IDENTITY(), and that id is used to insert the child record. Those records are then deleted from the temp table, the parent table, and the child table, and the next record in the temp table is grabbed. Needless to say, this takes far too long to run and seems to be very memory intensive on the initiating machine. On top of that, it runs in SQL Azure, so I cannot create it as a job.

I am trying to find a way to change this over to a set-based approach to increase efficiency and decrease run time, but I keep running into roadblocks in my thinking. I thought perhaps I could change the parent insert over to a function and then pass that function as a parameter to the child insert stored procedure, but that still has me looping through a recordset one record at a time. Would I need to ditch the temp table, write the child table insert directly, and call the function that inserts the parent record as one of the insert values?

Right now it looks like this:

Code:
INSERT INTO #CoreProcess

	SELECT
		core.CoreRawId
		,child.ChildTransactionRawId
		,core.TxCode
		,child.EntityType_ImportFileTypeId
		,child.BundleId
	FROM dbo.CoreRaw core JOIN dbo.[ChildTransactionRaw] child
		ON (core.Credit = child.Amount OR core.Debit = child.Amount) AND core.BankISN = child.BankISN JOIN dbo.TransactionCode code
		ON core.TXCode = code.TxCode and child.EntityType_ImportFileTypeId = code.EntityType_ImportFileTypeID and code.HasChildData = 1
	WHERE child.Amount > 0
	AND core.BankISN = @BankISN
	AND ABS(DATEDIFF(DD,CONVERT(DATE, core.PacketDate),CONVERT(DATE, child.TransactionDate)))<5

SET @RowCount = (SELECT COUNT(CoreRawId) FROM #CoreProcess)

WHILE(@RowCount > 0)
BEGIN
	
	SELECT TOP 1 @CurrCoreId = CoreRawId, @CurrChildId = ChildTransactionRawId, @BundleId = BundleId FROM #CoreProcess
		
	EXEC @transactId = dbo.CopyCoreToHistoryById @CurrCoreId -- Returns ID of parent insert

	IF (@BundleId = '' or @BundleId is null)
		BEGIN
			EXEC dbo.CopyChildRawToHistoryById @transactId, @CurrChildId
			DELETE FROM #CoreProcess WHERE CoreRawId = @CurrCoreId
			DELETE FROM dbo.CoreRaw WHERE CoreRawId = @CurrCoreId
			DELETE FROM dbo.ChildTransactionRaw WHERE ChildTransactionRawId = @CurrChildId
		END
    --if bundle is not null, then need to add those child items as well..
    ELSE IF (@BundleId <> '' or @BundleId is not null)
        BEGIN
			EXEC dbo.CopyChildRawToHistoryByBundleId @transactId, @BundleId
			DELETE FROM #CoreProcess WHERE BundleId = @BundleId 
			DELETE FROM dbo.CoreRaw WHERE CoreRawId = @CurrCoreId
			DELETE FROM dbo.ChildTransactionRaw WHERE BundleId = @BundleId
        END

	SET @RowCount = (SELECT COUNT(CoreRawId) FROM #CoreProcess)

END

I wonder if perhaps I should do something more like:

Code:
INSERT INTO tablename(TransactionId,otherfields)
SELECT (newfunctiontoinsertparent parentid),otherfields
from (full join information)

It seems to me this *should* achieve my goal of not stepping through the data RBAR and instead doing it as a set-based insert, but I am not sure my thinking is entirely correct: can a function call both insert the parent record and return the id of the inserted record within the calling query, and would this likely be faster than the existing WHILE loop?

Thoughts or suggestions?

Thanks,
Willie
 
First, from a processing point of view, if you are processing the parent and child records in some "restartable" manner, or if you are willing to start from the beginning in case of problems, you can separate the DELETE part from the processing. By putting all your new data in a new table, you can avoid the deletes entirely and instead just drop or rename the work tables once the new "master" table is built. Deletes are relatively expensive from a processing perspective; each DELETE costs roughly as much compute time as an INSERT. Simply delaying the deletes until after the main processing will save beaucoup time.
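
A minimal sketch of the drop-and-rename idea (CoreRaw_Keep is a made-up staging name, and SELECT INTO does not carry over indexes, constraints, or permissions, so those would need to be recreated before the swap):

Code:
-- build a new "master" containing only the rows that are NOT being moved off
SELECT core.*
INTO   dbo.CoreRaw_Keep
FROM   dbo.CoreRaw core
WHERE  NOT EXISTS (SELECT 1 FROM #CoreProcess p WHERE p.CoreRawId = core.CoreRawId);

-- once the new table is built, swap it in; no row-by-row deletes needed
DROP TABLE dbo.CoreRaw;
EXEC sp_rename 'dbo.CoreRaw_Keep', 'CoreRaw';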

Second, there is no substitute for quality testing. If you run the original process and your new process and the results match, you have succeeded. Caution: I would use something like 35 days of parallel testing to catch any weird occurrences that only happen on a monthly cycle. But you know your systems better than I do, so you may be satisfied with fewer days of testing, or even want more. You can make copies of the tables to another database or server so your testing and validation do not impact day-to-day operations.

==================================
advanced cognitive capabilities and other marketing buzzwords explained with sarcastic simplicity


 
I suppose I could wait to do any deletes until after the loop is done; very good point, thank you. However, on to the main point of my question, which was whether it seems plausible to combine the three stored procedures into one stored procedure plus a function: create a temp table to hold the records to be moved, write the child insert statement (which needs the ID of the parent insert), and make the parent insert (which returns its ID) a function that is called within the child insert query itself. However, upon further thinking and reading I am reminded that in SQL Server a function cannot perform an INSERT, and we cannot call a stored procedure from a UDF. So, back to the drawing board, but for now I can speed things up a bit by moving where I do my deletes.
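
For reference, a minimal sketch of that roadblock (dbo.CoreHistory is just a made-up example table); SQL Server refuses to create the function because INSERT is a side-effecting operator, which is not allowed inside a user-defined function:

Code:
CREATE FUNCTION dbo.InsertParentAndReturnId (@CoreRawId INT)
RETURNS INT
AS
BEGIN
    -- CREATE fails here: "Invalid use of a side-effecting operator 'INSERT' within a function."
    INSERT INTO dbo.CoreHistory (CoreRawId) VALUES (@CoreRawId);
    RETURN SCOPE_IDENTITY();
END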

Thanks!
wb
 
In an ideal world, you should be able to remove the loop completely (assuming I understand this correctly). It seems like you want to delete data from a set of tables based on a condition. Before deleting the data, you want to copy data to history.

If the 3 "copy to history" procedures are not too complex, you should be able to make this work.

I do something similar in my database; there, I would write this code as a delete trigger. Actually, I would have 2 delete triggers, one for the parent table and one for the child table.

Triggers can be tricky because it's easy to think of a trigger as affecting just one row, but it doesn't; it fires once per statement, no matter how many rows that statement touches, so you need to be very careful. However, done properly, you should be able to greatly simplify this code.
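
For example, here is a minimal sketch of a multi-row delete trigger; dbo.CoreRawHistory and the column list are just placeholders, not your real schema:

Code:
CREATE TRIGGER trg_CoreRaw_Delete
ON dbo.CoreRaw
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- "deleted" holds every row removed by the triggering statement, so this
    -- stays set based even when one DELETE removes thousands of rows.
    INSERT INTO dbo.CoreRawHistory (CoreRawId, TxCode, BankISN, PacketDate)
    SELECT d.CoreRawId, d.TxCode, d.BankISN, d.PacketDate
    FROM   deleted d;
END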

If you can't incorporate the "copy to history" procedures into set-based code, you can still speed up your loop. When I need to loop, I usually create a temp table with an identity column and then use that to drive the loop. Ex:

Code:
Create 
Table   #CoreProcess (
           RowId Int Identity(1,1) Primary Key,
           CoreRawId Int,
           ChildTransactionRawId Int,
           TxCode (datatype),
           EntityType_ImportFileTypeId int,
           BundleId Int
        )

INSERT 
INTO    #CoreProcess(CoreRawId, ChildTransactionRawId, TxCode, EntityType_ImportFileTypeId, BundleId)

	SELECT
		core.CoreRawId
		,child.ChildTransactionRawId
		,core.TxCode
		,child.EntityType_ImportFileTypeId
		,child.BundleId
	FROM dbo.CoreRaw core JOIN dbo.[ChildTransactionRaw] child
		ON (core.Credit = child.Amount OR core.Debit = child.Amount) AND core.BankISN = child.BankISN JOIN dbo.TransactionCode code
		ON core.TXCode = code.TxCode and child.EntityType_ImportFileTypeId = code.EntityType_ImportFileTypeID and code.HasChildData = 1
	WHERE child.Amount > 0
	AND core.BankISN = @BankISN
	AND ABS(DATEDIFF(DD,CONVERT(DATE, core.PacketDate),CONVERT(DATE, child.TransactionDate)))<5

Declare @i Int,
        @max int;

Select  @i = 1,
        @max = Max(RowId)
From    #CoreProcess

While @i <= @max
  Begin

    SELECT @CurrCoreId = CoreRawId, 
           @CurrChildId = ChildTransactionRawId, 
           @BundleId = BundleId 
    FROM   #CoreProcess
    WHERE  RowId = @i;

    EXEC @transactId = dbo.CopyCoreToHistoryById @CurrCoreId -- Returns ID of parent insert

    IF (@BundleId = '' or @BundleId is null)
      BEGIN
        EXEC dbo.CopyChildRawToHistoryById @transactId, @CurrChildId;
      END
    ELSE
      BEGIN
        EXEC dbo.CopyChildRawToHistoryByBundleId @transactId, @BundleId;
      END

    SET @i = @i + 1;   
  End

Delete ChildTransactionRaw
From   ChildTransactionRaw
       Inner Join #CoreProcess
         On ChildTransactionRaw.ChildTransactionRawId = #CoreProcess.ChildTransactionRawId
Where  (#CoreProcess.BundleId = '' or #CoreProcess.BundleId Is NULL)

Delete ChildTransactionRaw
From   ChildTransactionRaw
       Inner Join #CoreProcess
         On ChildTransactionRaw.BundleId = #CoreProcess.BundleId
Where  #CoreProcess.BundleId <> '' -- bundled rows only; blank/null bundles were handled by the delete above
Delete CoreRaw
From   CoreRaw
       Inner Join #CoreProcess
         On CoreRaw.CoreRawId = #CoreProcess.CoreRawId

Basically, it's this... If you MUST go RBAR, then at least do it as close to set-based as possible.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
If this is all about history of data, take a look at:
1. Auditing (not only about data)
2. Triggers: to build a "manual" audit trail
3. CDC (Change Data Capture)
4. Temporal Tables (a minimal sketch follows below)
I'm not saying this is the preferred order, or the other way around; it depends on your server version and license.
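
A minimal sketch of option 4, with a made-up, simplified table definition (column names and types here are placeholders, not the real CoreRaw schema); SQL Server then maintains the history table automatically:

Code:
CREATE TABLE dbo.CoreRawVersioned
(
    CoreRawId INT IDENTITY(1,1) PRIMARY KEY,
    TxCode    VARCHAR(20) NULL,
    BankISN   INT NULL,
    -- period columns are required for system versioning
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.CoreRawVersionedHistory));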

Bye, Olaf.
 
So, perhaps (or rather apparently) I did not really describe the process well. We have parent and child (detail) tables with no really good match key, so this is one of several sprocs trying to match the data, working down in order of specificity. When we find a match, we do some transformations and further matching in the insert sprocs so that we end up with consolidated data in the history tables (which I am now thinking should really be named MatchedCore and MatchedDetail). If it were merely copying data to a history table, then I would use triggers to audit, but that won't work in this situation, and it is SQL Azure, so CDC is not an option (sorry if I missed that piece of info; I thought I had mentioned it in the opening post).

So, having said that, @gmmastros, is your suggestion basically what @johnherman had suggested, just written out and with a few small tweaks to how I decrement (or actually increment) my way through the dataset I am basing my inserts on?

Thanks,
Willie
 
You could also look into the procs being called and see if you could devise versions of them that use the temp table #CoreProcess, process all required records in one pass, and output into another previously created table along with the required transactid.

Kind of:

create #CoreProcess as is
create #transactid_pairs (corerawid, childtransactionrawid, bundleid, transactid)

EXEC dbo.CopyCoreToHistoryByTempTableEntries -- populates #transactid_pairs with ID of parent insert and the other fields
EXEC dbo.CopyChildRawToHistoryByIdTempTableEntries -- existing process except that instead of a single value it joins to table #transactid_pairs
EXEC dbo.CopyChildRawToHistoryByBundleIdTempTableEntries -- existing process except that instead of a single value it joins to table #transactid_pairs

and then the deletes as per George post

Obviously, without knowing what the called procs are doing, it's impossible to say whether implementing it this way will be easy or nearly impossible.
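
As a rough sketch only (dbo.CoreHistory, its identity column TransactId, and the copied columns are all assumptions about what CopyCoreToHistoryById really does), a set-based CopyCoreToHistoryByTempTableEntries could capture the new parent ids with MERGE ... OUTPUT, because the OUTPUT clause of a MERGE can reference source columns where a plain INSERT ... OUTPUT cannot:

Code:
MERGE dbo.CoreHistory AS tgt
USING (
        SELECT p.CoreRawId, p.ChildTransactionRawId, p.BundleId, core.TxCode
        FROM   #CoreProcess p
               JOIN dbo.CoreRaw core ON core.CoreRawId = p.CoreRawId
      ) AS src
   ON 1 = 0                               -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (CoreRawId, TxCode)            -- plus whatever else the real proc copies
    VALUES (src.CoreRawId, src.TxCode)
OUTPUT src.CoreRawId, src.ChildTransactionRawId, src.BundleId, inserted.TransactId
  INTO #transactid_pairs (corerawid, childtransactionrawid, bundleid, transactid);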

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
@wbodger,

The code that I posted is (I think) basically what @johnherman was talking about.

Looking at it this way...

Originally,

Code:
Loop Start
  Select from temp table
  execute stored procedure

  depending on data
    execute procedure
    delete from 3 tables
  else
    execute procedure
    delete from 3 tables
End Loop

With my suggestion:
Code:
Loop start
  Select from temp table
  execute stored procedure
  depending on data
    Execute procedure
  Else
    Execute procedure
End Loop
3 deletes

Done this way, the entire process should be a lot faster. If you change it to this, and it's not much faster, then you should take a long hard look at the 3 procedures to see if you can make them faster.


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
@gmmastros OK, that is what I thought; I just wanted to make sure I understood what you were both saying. I have implemented the plan you and @johnherman suggested in our dev environment and will test it later this week or early next week.

Thanks!
Willie
 