Having a logic issue with IF @@ROWCOUNT 1

POSAPAY
Hey Everyone,

Here is my code:
Code:
...
SET @CountDeleted = 0
SET ROWCOUNT 500
delete_more1:
UPDATE tblQuarantine SET Expire = 1 WHERE ((DATEDIFF(day, MsgDate, GETDATE()) > 15) AND (Expire <> 1))
SET @CountDeleted = @CountDeleted + 500
IF @@ROWCOUNT < 1 GOTO delete_more1_skip
IF ((@@ROWCOUNT > 0) AND (@CountDeleted < 10001)) GOTO delete_more1
delete_more1_skip:
...

Basically there are about 100,000 records that should expire but are not marked expired yet.

This script should loop until all records with a date stamp more than 15 days old are marked as Expired, or until it has marked 10,000 records.

The issue is that, from what I can tell, it only goes through this once and never loops back to do the next 500.

If I take the skip IF statement out, then it keeps looping to do 10,000 records, even after it has finished the expired ones, and produces zero row counts until it has looped all the way through.
 
Your problem is here:

Code:
SET @CountDeleted = @CountDeleted + 500
IF @@ROWCOUNT < 1 GOTO delete_more1_skip

Setting the value of a variable will affect the row count. In fact, the @@RowCount will ALWAYS be one here.

To fix this problem, declare another variable and select the @@RowCount into this variable.
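
Something along these lines (not tested; the @RowCountHolder variable name is just an example):

Code:
DECLARE @RowCountHolder INT
...
delete_more1:
UPDATE tblQuarantine SET Expire = 1 WHERE ((DATEDIFF(day, MsgDate, GETDATE()) > 15) AND (Expire <> 1))
-- grab @@ROWCOUNT right away, before any SET resets it
SET @RowCountHolder = @@ROWCOUNT
SET @CountDeleted = @CountDeleted + 500
IF @RowCountHolder < 1 GOTO delete_more1_skip
IF ((@RowCountHolder > 0) AND (@CountDeleted < 10001)) GOTO delete_more1
delete_more1_skip: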



-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I would do this in a WHILE loop:
Code:
SET @CountDeleted = 0
SET ROWCOUNT 500

WHILE EXISTS(SELECT *
             FROM   tblQuarantine
             WHERE  DATEDIFF(day, MsgDate, GETDATE()) > 15
                    AND Expire <> 1)
      AND @CountDeleted < 10001
BEGIN
     UPDATE tblQuarantine
     SET    Expire = 1
     WHERE  DATEDIFF(day, MsgDate, GETDATE()) > 15
            AND Expire <> 1

     SET @CountDeleted = @CountDeleted + @@ROWCOUNT
END

NOT TESTED!!!!

Borislav Borissov
VFP9 SP2, SQL Server 2000/2005.
 
Great, thanks! I was not aware that @@ROWCOUNT gets changed by statements that don't even touch the table, like a SET.
I've used an additional variable to grab the value now and use it in multiple places instead of the @@ROWCOUNT directly.

Thanks for your help, that worked perfectly!
 
You know.... I'm glad I was able to help with the @@RowCount thing. It's important to know.

But...

I was thinking about this a little more, and I have another suggestion I would like to make. I suspect the reason you are updating the table in batches is to improve the performance. This is a worthy goal, and I applaud you for it.

When optimizing for performance, there are many considerations to take into account. In this case, you should be able to greatly speed up this update by creating a [google]sargable[/google] where clause. To do this, you need to write the query in such a way that it can effectively use an index.

I'm gonna guess that the Expire column has the bit data type. Bits are usually not a good candidate for an index because of the low selectivity. The msgdate column, however, will probably be a great candidate for an index. In fact, I wouldn't be surprised if there was already an index on this column. If there isn't, there really ought to be.

Run this:

Code:
sp_helpindex 'tblQuarantine'

There will likely be multiple rows returned by the above statement. Each row will have 3 columns. Look at the index_keys column. Some indexes may have just one column, and some may have multiple columns (separated by a comma). If there is NOT an index that has the msgdate column as the FIRST column in the index_keys, then you should create an index for this column. Like this...

Code:
Create Index idx_tblQuarantine_MsgDate On tblQuarantine(MsgDate)

If there is already an index with the msgdate column as the FIRST column in the index, then do NOT create a new index. Duplicate indexes will slow down your performance.

Anyway... now that there is an index on the msgdate column, it will still NOT be used by the query you wrote. But... you can re-write it slightly so that it accomplishes the same thing, but effectively uses the index.

Like this:

Code:
[!]Declare @ExpireDate DateTime
Set @ExpireDate = DateAdd(Day, -15, GETDATE())[/!]

UPDATE tblQuarantine 
SET    Expire = 1 
WHERE  [!]MsgDate < @ExpireDate[/!]
       AND Expire <> 1

If your MsgDate column is a SmallDateTime (and not a DateTime), then you should declare @ExpireDate to be a SmallDateTime instead.

It's possible that modifying the code like this will be fast enough for you so that you don't need to update in batches. Effectively using an index on a huge table can give you huge performance gains.

Make sense?

I strongly encourage you to try this. But can you please let me know the performance difference? I'd like to know how long the overall delete process takes now, and again after you implement the changes I suggested. I'm just curious.



-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Hi George,
Thanks for your suggestion!

The purpose of this SP, which is currently scheduled as a job to run every 3 hours, is to delete records older than 15 days (spam e-mails).
The table gains roughly 100,000 spam e-mails a day.
It currently holds around 1.5 million records.

tblQuarantine has the main fields of the e-mail, and a msgID field links to another table that has the messages.

The application adding records is smart enough to recognize if a msg body matches another e-mail's msg body, so one msgID may be used for multiple quarantine records.


So my SP actually runs three loops:
1) Mark the Expire bit as 1 for all records that are over 15 days old.
2) Delete all records that have Expire set to 1.
3) Look for orphan msgID records in the msg table.

All three go through 500 records at a time because the process can take a while at midnight when all of a sudden 100,000 records need to be deleted, and we want to make sure the tables stay available for the application to add records in the meantime, between two 500-record batches.
I also limited the process to 30,000 records per run, because I don't want the load to run so long that it causes issues.

I have it grab the datetime before and after the SP and e-mail it to me.
So currently, when no records are found to delete at all, it takes 37 seconds to run. Too long!

With 60 records deleted, it takes 61 seconds.

At midnight, deleting 30,000 quarantine records and 30,000 msgID records took 27 minutes!!



So for the sp_helpindex on tblQuarantine I get:
Code:
index_name   index_description                                    index_keys
EmailFrom    nonclustered located on PRIMARY                      EmailFrom
EmailTo      nonclustered located on PRIMARY                      EmailTo
MsgDate      nonclustered located on PRIMARY                      MsgDate
MsgID        nonclustered located on PRIMARY                      MsgID
QuarID       clustered, unique, primary key located on PRIMARY    QuarID
RejectID     nonclustered located on PRIMARY                      RejectID
ServerID     nonclustered located on PRIMARY                      ServerID
Subject      nonclustered located on PRIMARY                      Subject

So for marking expired, I have:
Code:
SET ROWCOUNT 500
SET @RowCountHolder = 0
delete_more1:
-- PRINT 'delete1 processing marking expired ones' + '[marked:'+CAST(@CountDeleted AS VARCHAR(10)) + ']'
SET @msg = @msg +  'D1:'+CAST(@CountDeleted AS VARCHAR(10))
UPDATE tblQuarantine SET Expire = 1 WHERE ((DATEDIFF(day, MsgDate, GETDATE()) > 15) AND (Expire <> 1))
SET @RowCountHolder = @@ROWCOUNT
SET @msg = @msg +  '-Prcsd:' + CAST(@RowCountHolder AS VARCHAR(10)) + '<br>'
SET @CountDeleted = @CountDeleted + 500
IF @RowCountHolder < 1 GOTO delete_more1_skip
IF ((@RowCountHolder > 0) AND (@CountDeleted < 30001)) GOTO delete_more1
delete_more1_skip:
SET ROWCOUNT 0
SET @CountDeleted = 0
SET @RowCountHolder = 0
SET @msg = @msg + 'T1:' + CAST((SELECT CONVERT(VARCHAR,GETDATE(),8)) AS VARCHAR(100)) + '<BR>'

Then to delete actual records:
Code:
-- PRINT 'delete_more2 is next'
SET ROWCOUNT 500

delete_more2:
-- PRINT 'delete2 processing actually deleting from quarantine' + '[marked:'+CAST(@CountDeleted AS VARCHAR(10)) + ']'
SET @msg = @msg +  'D2:'+CAST(@CountDeleted AS VARCHAR(10))
DELETE FROM tblQuarantine WHERE tblQuarantine.Expire <> 0
SET @RowCountHolder = @@ROWCOUNT
SET @msg = @msg +  '-Prcsd:' + CAST(@RowCountHolder AS VARCHAR(10)) + '<br>'
SET @CountDeleted = @CountDeleted + 500
IF @RowCountHolder < 1 GOTO delete_more2_skip
IF ((@RowCountHolder > 0) AND (@CountDeleted < 30001)) GOTO delete_more2
delete_more2_skip:
SET ROWCOUNT 0
SET @CountDeleted = 0
SET @RowCountHolder = 0
SET @msg = @msg + 'T2:' + CAST((SELECT CONVERT(VARCHAR,GETDATE(),8)) AS VARCHAR(100)) + '<BR>'

Lastly to delete the Orphan msgs:
Code:
--===CLEANUP Orphaned Msgs in tblMsgs=====
-- PRINT 'delete_more3 is next - CleanUp Time'
SET ROWCOUNT 500
delete_more3:
-- PRINT 'delete3 processing deleting msgs' + '[marked:'+CAST(@CountDeleted AS VARCHAR(10)) + ']'
SET @msg = @msg +  'D3:'+CAST(@CountDeleted AS VARCHAR(10))
DELETE tblMsgs FROM tblMsgs LEFT JOIN tblQuarantine 
ON tblMsgs.MsgID = tblQuarantine.MsgID WHERE (tblQuarantine.MsgID IS NULL)
SET @RowCountHolder = @@ROWCOUNT
SET @msg = @msg +  '-Prcsd:' + CAST(@RowCountHolder AS VARCHAR(10)) + '<br>'
SET @CountDeleted = @CountDeleted + 500
IF @RowCountHolder < 1 GOTO delete_more3_skip
IF ((@RowCountHolder > 0) AND (@CountDeleted < 20001)) GOTO delete_more3
delete_more3_skip:
SET ROWCOUNT 0
SET @CountDeleted = 0
SET @RowCountHolder = 0

Perhaps with all three in view you may have an even better, more efficient suggestion.

For the 30,000 record delete:
Marking Expired took: 19 seconds
Deleting Expired Quarantine records: 6 min 20 sec
Deleting Orphans took: 29 minutes!!

So the very last part is my biggest concern.

I'll mess with the first part that you suggested, and see if I can get significant improvement.
 
Bummer. Looks like my suggestion will only affect the "Marking Expired" part. You may be able to get that part down to 1 second (from 19 seconds). This will still leave you with approximately 30 minutes for the whole process.

I would encourage you to put transactions into this process. I've seen transactions speed up processes like this because you can better control how things are logged.

It looks like you have 3 string columns in this table. I would encourage you to re-design the table. For example, you could have an EmailAddress table with an EmailAddressID and EMAIL. Then you could change this table to store the EmailAddressId instead of the email address. Of course, this ID should be an integer. This will reduce the size of the table, which will cause it to use less space on disk, and ultimately make everything about this table run faster. I realize this is no easy task because there may be many queries using this table, but I really do think this will make a huge difference.
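
Just to illustrate the idea, something like this (a rough sketch only; the table and column names are just examples):

Code:
-- Lookup table holding each distinct e-mail address once
Create Table EmailAddress
(
    EmailAddressId Int Identity(1,1) Not Null Primary Key,
    Email VarChar(255) Not Null
)

-- tblQuarantine would then store integer ids such as EmailFromId and EmailToId
-- (referencing EmailAddress.EmailAddressId) instead of the address strings.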

Also... you have a lot of indexes on this table. If you dropped the indexes before deleting, and then recreated them after the delete... the delete process will probably be a lot faster.
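
For example (sketch only, using the index names from your sp_helpindex output; on SQL 2005 the drop syntax would be DROP INDEX MsgDate ON tblQuarantine instead):

Code:
-- Drop the nonclustered indexes before the big delete
-- (leave the clustered primary key QuarID alone)
Drop Index tblQuarantine.MsgDate
Drop Index tblQuarantine.EmailFrom
-- ...and so on for the other nonclustered indexes...

-- run the batched delete here

-- Recreate them afterwards
Create Index MsgDate On tblQuarantine(MsgDate)
Create Index EmailFrom On tblQuarantine(EmailFrom)
-- ...and so on...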

Lastly... are you rebuilding your indexes nightly? After this massive delete occurs? If you aren't, you really ought to. Massive deletes can cause lots of fragmentation with your indexes. Fragmented indexes = horrible performance.
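
Something like this after the nightly purge (DBCC DBREINDEX with just the table name rebuilds all of the table's indexes; on SQL 2005 you could use ALTER INDEX ALL ON tblQuarantine REBUILD instead):

Code:
-- Rebuild every index on the table to remove the fragmentation
-- left behind by the massive delete
DBCC DBREINDEX ('tblQuarantine')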




-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George,
The application is a Windows service that uses this database, and this was their recommended table setup.
It is adding records, and checking every 5 seconds for records marked for delivery as well.

I'm a bit nervous about changing the design since it is a compiled app, and I'm not sure what it really does in the background.

The last part, deletion of orphan msgID records, is the one I'd like to make more efficient, since it takes the longest time.
I'm not sure what you mean by "put transactions into this process".
Could you please expand on that, and perhaps give an example?
Thanks,
-Peter
 
You really should read this: faq183-3141

That FAQ shows examples using transactions and also how to use locks to speed up your queries.
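
The general pattern looks roughly like this (just a sketch of the idea, not the FAQ's code; the ROWLOCK hint and batch size are only examples):

Code:
SET ROWCOUNT 500
DECLARE @RowCountHolder INT
SET @RowCountHolder = 1

WHILE @RowCountHolder > 0
BEGIN
    -- each batch gets its own short transaction, so locks are held
    -- only briefly and the log stays easier to manage
    BEGIN TRANSACTION

    DELETE FROM tblQuarantine WITH (ROWLOCK)
    WHERE Expire <> 0

    -- capture the count before COMMIT resets @@ROWCOUNT
    SET @RowCountHolder = @@ROWCOUNT

    COMMIT TRANSACTION
END

SET ROWCOUNT 0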


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George,
Thanks! I'm kinda swamped right now, but I'll try to create a second SP to test the recommendations.
Once I've done that, I'll report back here with the improvements.
-Peter
 