
Cursor question--a faster way?

Status
Not open for further replies.

jsteph

Technical User
Oct 24, 2002
2,562
US
Hi all,
Yes, I know 'cursors are bad'...blah blah. Yet I have a set of data in a large table whose processing must happen in order--for example, the logic for record 100 absolutely depends on what record 99 was and what record 101 is. (This is just an example; it's slightly more complex than that.)

So, I'm sure there's probably some guru who could come up with a single cursor-free SQL statement, chock full of CASE and IF/AND/OR logic, that might do the job but would be completely unreadable by normal humans. (I applaud you; take your gold star. I am but a mere mortal.) The process I have requires top-down procedural processing, and my question is this:

The set of 'source' data is about 40 million rows. The interdependent records can themselves be broken down into smaller sets of about 1,000 records each (based on field values, keys, etc.). So, is it more efficient to use a cursor and do the logic row-by (yes...agonizing) row? Or is the memory overhead, etc., of a cursor so much that a 'hand-made' cursor would be better--i.e., load each set of data into a table variable (or temp table) with an identity key, then use a WHILE loop on the identity (or whatever ordered sequential row-id I choose)? I don't want to do the processing directly on the large table because of the possible multiple reads, checking the previous row, etc. It just seems more efficient to break it down into smaller chunks, and that ripping back and forth through copied data in a table variable would beat reading that giant table back and forth.
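For reference, the 'hand-made cursor' approach described above might be sketched like this (all table, column, and variable names are placeholders, not from the actual system):

```sql
-- Sketch: copy one logical set into a table variable with an
-- IDENTITY key, then walk it in order with a WHILE loop.
DECLARE @work TABLE (
    RowId   int IDENTITY(1,1) PRIMARY KEY,
    SetKey  int,
    Payload varchar(8000)
);

INSERT INTO @work (SetKey, Payload)
SELECT SetKey, Payload
FROM   dbo.SourceTable            -- hypothetical source
WHERE  SetKey = @CurrentSet       -- one ~1,000-row set at a time
ORDER  BY SeqNo;                  -- the order the logic depends on

DECLARE @i int, @max int;
SELECT @i = 1, @max = MAX(RowId) FROM @work;

WHILE @i <= @max
BEGIN
    -- row-by-row logic goes here; the previous and next rows are
    -- cheap keyed lookups against @work (RowId - 1, RowId + 1)
    SET @i = @i + 1;
END
```

Because the set is small and keyed, looking back at the previous row is an indexed seek rather than another scan of the 40-million-row table.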
Thanks very much,
--Jim
 
For the most part, there is very little difference in performance between a cursor and a WHILE loop. There is some black voodoo magic about cursors that I am not entirely familiar with, because I haven't used a cursor in many years, but I do encourage you to read up on them. There are options like "forward-only" and "read-only" cursors that are supposed to speed them up a little. They may help a bit, but I would only be guessing.

If you choose to use a temp table or table variable, please be aware that space is reserved in tempdb for them. This means your tempdb may grow in size a little, but if you are only loading 1,000 or so rows into a temp table, the extra space will be minimal.

I understand your concerns regarding readability. Please remember that every "guru" started off a mere "mortal". That being said, you may be better off with some "hybrid" approach. You said there are 40 million rows, at approximately 1,000 rows per set. Perhaps you could cursor over the "sets of rows" instead of the individual rows. That means approximately 40,000 loop iterations instead of 40 million, which will make a big difference in performance.
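A minimal sketch of that hybrid: a fast-forward (forward-only, read-only) cursor over just the distinct set keys, with the per-set work done inside the loop (names here are placeholders):

```sql
-- Cursor over ~40,000 set keys instead of 40,000,000 rows.
-- FAST_FORWARD = FORWARD_ONLY + READ_ONLY, the cheap kind of cursor.
DECLARE @key int;

DECLARE set_cursor CURSOR FAST_FORWARD FOR
    SELECT DISTINCT SetKey
    FROM   dbo.SourceTable;       -- hypothetical source table

OPEN set_cursor;
FETCH NEXT FROM set_cursor INTO @key;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- process the whole ~1,000-row set for @key here,
    -- set-based where possible (e.g. into a table variable)
    FETCH NEXT FROM set_cursor INTO @key;
END

CLOSE set_cursor;
DEALLOCATE set_cursor;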

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great thing about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I was just doing a little reading about cursors. Apparently there is a flag called INSENSITIVE that you can use when you declare the cursor. Per the documentation:

INSENSITIVE
Defines a cursor that makes a temporary copy of the data to be used by the cursor. All requests to the cursor are answered from this temporary table in tempdb; therefore, modifications made to base tables are not reflected in the data returned by fetches made to this cursor, and this cursor does not allow modifications. When SQL-92 syntax is used, if INSENSITIVE is omitted, committed deletes and updates made to the underlying tables (by any user) are reflected in subsequent fetches.

It sounds like the INSENSITIVE flag essentially does the same thing that you were getting at with the table variable/temp table.
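For illustration, the flag goes right in the DECLARE (the audit table and columns here are hypothetical):

```sql
-- All fetches are answered from a snapshot copied into tempdb,
-- so later changes to dbo.AuditTable are not seen by this cursor,
-- and the cursor itself is not updatable.
DECLARE audit_cursor INSENSITIVE CURSOR FOR
    SELECT AuditId, OldValues, NewValues
    FROM   dbo.AuditTable
    ORDER  BY AuditId;
```

In effect SQL Server does the "copy the data somewhere private first" step for you, which is what the table-variable idea does by hand.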

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great thing about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks everyone...I wasn't aware of the INSENSITIVE flag, though since the table I'm reading from is not in use by any process but mine while I'm running this, I'm not sure whether there's still overhead in a cursor without that flag. I ended up going with a hybrid approach, loading the logical sets of data into a table variable, since what's needed, among other things, are self-joins on these sets of data.

For academic purposes, the core reason I wanted to use a cursor: I'm working from an 'audit' table that, instead of being verticalized into fieldname/oldvalue/newvalue rows, was designed so that a single audit record has (among a few other record-id and seq fields) three ntext fields:

The first has the entire set of column names flattened into a single ntext field using special 4-char delimiters;
The second has the 'old value' of the entire audited row, flattened into an ntext field with the same special delimiters;
The third has the 'new value' of the row...again, all flattened with the special delimiters into a single ntext field.

So my task is to break each record into a set of vertical fieldname/oldvalue/newvalue records. That means using a homemade Split() function (written inline so as not to incur a function call each time) to parse the ntext fields, while also incorporating logic to skip audits where only certain fields changed. For example, the IIS application made discrete updates for every field, so one 'logical' update consisted of, say, 12 single-field updates *including* an explicit update of the 'datelastmodified' field and a few other non-critical fields. If I know 'order-number' was updated at time x, I want that update--but I don't need a separate record telling me that, by the way, LastModified was also set to x, since I already have the audit time on the record that said 'order-number' changed. Parsing all this out--comparing a single 4,000+-char line of old values against the 4,000+-char line of new values, deciding which values at which charindex to keep (dozens of different tables were audited, so the delimiters and number of 'fields' in each ntext field varied row by row), and then writing the 'good' audits into a vertical table--simply proved too much of a challenge to do in a single SQL statement.
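The inline-split part of that job is commonly done against a numbers (tally) table rather than a scalar function. A simplified sketch with a one-character delimiter (the real data uses 4-char delimiters and ntext, which would need SUBSTRING/DATALENGTH handling; `dbo.Numbers` is assumed to be a table of sequential integers):

```sql
-- Split a delimited list into ordered rows using a numbers table.
DECLARE @list varchar(8000);
SET @list = 'OrderNumber|CustomerId|LastModified';

SELECT  Position = ROW_NUMBER() OVER (ORDER BY n.Number),
        Value    = SUBSTRING(@list, n.Number,
                       CHARINDEX('|', @list + '|', n.Number) - n.Number)
FROM    dbo.Numbers n
WHERE   n.Number <= LEN(@list)
        -- a token starts at n.Number when the preceding char is a delimiter
  AND   SUBSTRING('|' + @list, n.Number, 1) = '|';
```

Splitting the field-name, old-value, and new-value strings this way and joining the three result sets on Position yields the vertical rows directly, and the "skip LastModified-only updates" filter becomes a simple WHERE clause on the field name.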

As I'd said, I'm sure it's 'possible'; I'm just not sure what the code would look like or how easy it would be to adjust, debug, etc.
But the job is done, and it ran reasonably fast (<12 hours), so I am good to go. Thanks very much,
--Jim
 
Hum.... maybe a table function to split the field updates, resulting in a table like 'table_name', 'field_name', 'value_before', 'value_after', plus a bit more info to allow you to process them in the required order... said function could eventually also remove the unneeded fields.
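As a shape, that table function might look like this (everything here is a hypothetical sketch; the actual split logic would go where the comment is, and multi-statement TVFs like this need SQL Server 2000 or later):

```sql
CREATE FUNCTION dbo.fn_SplitAudit
(
    @TableName  sysname,
    @FieldNames varchar(8000),   -- delimited column names
    @OldValues  varchar(8000),   -- delimited pre-update values
    @NewValues  varchar(8000)    -- delimited post-update values
)
RETURNS @result TABLE
(
    table_name   sysname,
    field_name   varchar(128),
    value_before varchar(8000),
    value_after  varchar(8000)
)
AS
BEGIN
    -- Split the three delimited strings in parallel (e.g. via a
    -- numbers table), pairing the Nth field name with the Nth
    -- old/new value, and filter out the non-critical fields
    -- (LastModified, etc.) before inserting into @result.
    RETURN;
END
```

One audit row then becomes a set of vertical rows with a single call, and the non-critical-field filtering lives in one place inside the function.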

Something along those lines may (or may not) help you remove some of the cursory type of processing.

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 