Olaf Doschke
Programmer
Since we had a side discussion about Rushmore recently, I thought I'd redo some tests, and I found interesting results:
I tested DO WHILE vs SCAN with different scopes, and therefore different strategies: the automatic Rushmore optimization VFP does, direct index usage, and a combination of both. I set up the task of finding a portion of data scattered across very many records (2 million). One observation upfront: what takes longest is creating the data. Even the least performant DO WHILE loop, which only makes use of the order the index sets, takes only a few seconds, and this is surely not a case matching the 10-record case we came from.
I guess there is something to learn here, and I got some assumptions confirmed, but I'm happy to be criticized and corrected, no matter by whom.
I needed to use high-performance counters, as the SECONDS() value only has a precision of about 0.015 seconds, the tick frequency of the BIOS clock. That causes results to be "pixelated": you only see multiples of 1-3 ticks, which doesn't make the results meaningful or comparable. OK, that upfront; here's the code, followed by a little explanation and interpretation of the results:
By the way, thanks to Craig Boyd, from whom I took the API calls.
Code:
Public lcOutput
lcOutput=""
Set Textmerge On
Set Textmerge To Memvar lcOutput Additive
Create Cursor crsTest (cEmpty c(1), cValue c(10) default sys(2015))
Append Blank
For lnI = 1 to 21 && means 2^21 records
    Append From Dbf("crsTest") Fields cEmpty
EndFor
Index On Right(cValue,1) tag xTest
LOCAL lcBuffer, lnTicksPerSecond, t0, t1, t2
* Both API calls write a 64-bit integer into an 8-byte string buffer passed by reference
DECLARE INTEGER QueryPerformanceFrequency IN kernel32 STRING @lpFrequency
DECLARE INTEGER QueryPerformanceCounter IN kernel32 STRING @lpPerformanceCount
lcBuffer = SPACE(8)
=QueryPerformanceFrequency(@lcBuffer)
lnTicksPerSecond = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32 && ticks per second from low and high DWORD
Clear
Local lnCount, lcValue
Store 0 To lnCount
Select crsTest
Locate
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Locate For Right(cValue,1)="F"
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Do While Not Eof("crsTest") AND Right(cValue,1)="F"
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
    Skip 1
EndDo
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Locate time <<Transform((t1-t0)/lnTicksPerSecond)>>
\ Do While time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
Store 0 To lnCount
Select crsTest
Locate
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Seek "F"
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Scan All && Scan All ignores the Seek and starts from the top of the index order
    If Right(cValue,1)="F"
        Exit && stops at the first "F" record, so this loop counts the records sorting before "F"
    Endif
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
EndScan
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Seek time <<Transform((t1-t0)/lnTicksPerSecond)>>
\ Scan All time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
Store 0 To lnCount
Select crsTest
Set Order To && no direct Index usage, let Rushmore find it
Locate
Select crsTest
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Scan For Right(cValue,1)="F"
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
EndScan
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Scan For time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
Store 0 To lnCount
Select crsTest
Set Order To XTEST && back to ordered cursor
Locate
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Seek "F"
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Scan While Right(cValue,1)="F"
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
EndScan
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Seek time <<Transform((t1-t0)/lnTicksPerSecond)>>
\ Scan While time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
Store 0 To lnCount
Select crsTest
Locate
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Seek "F"
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Scan Rest For Right(cValue,1)="F"
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
EndScan
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Seek time <<Transform((t1-t0)/lnTicksPerSecond)>>
\ Scan Rest For time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Select cValue from crsTest Where Right(cValue,1)="F" Into Array laResult
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<Alen(laResult,1)>> records
\ SQL query time <<Transform((t1-t0)/lnTicksPerSecond)>>
Set Textmerge To
Set Textmerge Off
_cliptext = lcOutput
FUNCTION buf2num(tcBuffer)
************************
* Convert the first 4 bytes of a little-endian buffer to a number
RETURN ASC(SUBSTR(tcBuffer, 1,1)) + ;
    ASC(SUBSTR(tcBuffer, 2,1)) * 2^8 + ;
    ASC(SUBSTR(tcBuffer, 3,1)) * 2^16 + ;
    ASC(SUBSTR(tcBuffer, 4,1)) * 2^24
ENDFUNC
To explain the cEmpty field: creating 2^21 records takes much longer with that many APPEND BLANKs than by appending the cursor to itself. But to get new SYS(2015) default values for all records, you have to append only fields other than cValue, so there needed to be another field in the cursor. Another solution would be to first create only empty rows and then update them with SYS(2015). In the SQL query I only select the cValue field, for fairness.
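As a side note, here is a minimal, untimed sketch of that alternative creation approach (crsTest2 and xTest2 are just placeholder names I made up for this illustration):
Code:
* Untimed sketch: create empty rows first, then fill cValue afterwards
Create Cursor crsTest2 (cValue c(10))
Append Blank
For lnI = 1 To 21 && doubling 21 times gives 2^21 rows
    Append From Dbf("crsTest2")
EndFor
* REPLACE ALL evaluates SYS(2015) once per record, so every row gets its own value
Replace All cValue With Sys(2015)
Index On Right(cValue,1) Tag xTest2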
Now here are the results from my circa four-year-old PC with an Intel i7 CPU, which even then was only above average, not top notch:
[pre] 58254 records
Locate time 0,0396
Do While time 2,6528
-------------
873818 records
Seek time 0
Scan All time 0,4775
-------------
58254 records
Scan For time 0,0328
-------------
58254 records
Seek time 0
Scan While time 0,0356
-------------
58254 records
Seek time 0
Scan Rest For time 0,0986
-------------
58254 records
SQL query time 0,0429[/pre]
First, notice that for fairness I put reading the cValue column into the loop body, so it's not just the record pointer moving; even in the cases where Rushmore optimizes on the index expression, the indexed field is actually read, just as SQL reads and copies it.
First observation: both the DO WHILE and the SCAN ALL are obviously not optimized and take much longer. It's easy to understand that all the things SCAN does automatically already make it faster, even when no further optimization happens.
Second observation: surprise, SCAN FOR is actually comparably fast to the combination of SEEK and SCAN WHILE. Someone else predicted the SEEK/WHILE combination to be best, with WHILE not being optimized and just making use of the index order, so that the main act is the fast SEEK to the first result record. Well, if WHILE really only profited from the index order and nothing else, you'd expect about the same time as the SCAN ALL with the Exit test, which is in effect the corresponding "SCAN UNTIL". So the first "surprise": I'd judge this as Rushmore optimization of WHILE, even though the VFP help does not specify that such a thing is done.
The next surprise: even though I explicitly turned off the index order for the SCAN FOR test, it loops over all wanted records almost as fast as the SEEK/WHILE combination does. The surprise is not that Rushmore does that (it's normal for Rushmore to determine a usable index); the surprise is rather that even though Rushmore does this over and over again for each next row it determines, it's still that fast.
The biggest surprise for some is perhaps that SQL is not very fast. A (mainly) single-field cursor with a single index is not yet a situation that profits from SQL's better Rushmore optimization, and the first disadvantage is that SQL not only reads but also writes all result data into a new work area. If you allow a filter cursor, that's not the case: you get the DBF reopened with a SET FILTER, which is created faster, but it doesn't really create a result. The result only materializes when you use that filter cursor, and it has many disadvantages, like having the RECCOUNT() of the underlying DBF instead of _TALLY. That speed is also deceiving, as you'd have to take the time spent reading from the filtered cursor into consideration; just as a USE doesn't read any data yet, a filter cursor isn't really executing the query. I prevented this anyway by querying into an array: picking the relevant rows is optimized, but the records determined are then read and written.
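To make the filter cursor point more tangible, here's a hedged sketch; whether you actually get a filter cursor depends on the query, and the NOFILTER clause is what forces a real result cursor:
Code:
* Sketch: a fully optimizable single-table query may come back as a filter cursor,
* which is why RECCOUNT() can differ from _TALLY as described above
Select cValue From crsTest Where Right(cValue,1)="F" Into Cursor crsMaybeFiltered
? Reccount("crsMaybeFiltered"), _Tally
* NOFILTER always writes a real result cursor into its own temp file
Select cValue From crsTest Where Right(cValue,1)="F" Into Cursor crsReal Nofilter
? Reccount("crsReal"), _Tally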
You could now add filling an array to any of the loops and make them take longer again; that would surely also be fair, but in the situation we initially talked about, that array then also needs to be copied once more with ACOPY into a combobox/listbox list array.
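For that combobox/listbox scenario, a hypothetical sketch (the listbox lstValues and the array property aList are made-up names, and this is assumed to run in a form method):
Code:
* Hypothetical: query into an array, then hand a copy to a listbox via an array property
Select cValue From crsTest Where Right(cValue,1)="F" Into Array laResult
If _Tally > 0
    Dimension Thisform.aList(_Tally)
    =ACopy(laResult, Thisform.aList)
    Thisform.lstValues.RowSourceType = 5 && 5 = array
    Thisform.lstValues.RowSource = "Thisform.aList"
    Thisform.lstValues.Requery()
Endif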
In all fairness, SQL is the winner in most queries, even in still simple queries joining just two tables. The more joins and WHERE conditions you have, the more SQL can profit from using multiple indexes and from building up a bitmap of all record numbers to fetch before fetching them. A scope condition, in contrast, is only optimized for fetching the next matching record, so it is applied and reapplied again and again, while SQL does its Rushmore job ahead of creating the result and then has one final result.
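As a generic, made-up example of the kind of query where that pays off (the orders/customers tables are hypothetical, assuming index tags on the join and filter fields):
Code:
* Hypothetical two-table join: Rushmore can combine the indexes on iCustId,
* dOrdered and cCountry into bitmaps of record numbers before fetching any row
Select ord.iOrderId, cus.cName, ord.yAmount ;
    From orders ord ;
    Inner Join customers cus On cus.iCustId = ord.iCustId ;
    Where ord.dOrdered >= Date(2023,1,1) ;
    And cus.cCountry = "DE" ;
    Into Cursor crsReport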
Now, what I guessed wrong is that WHILE compares to REST FOR, but that misses the point that the results could, in general, differ. A WHILE scope ends with the first non-matching record; a REST FOR just starts at the current row but still goes through to EOF, so it finds all matching records in the rest of the data. Both results are the same here, as the data is in index order when using WHILE and REST FOR, but the SCAN REST FOR also has to go through the majority of the data following all the 'F' rows.
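As a quick sketch of that general difference (not part of the timed tests above): with the index order switched off, the same two scopes give very different counts.
Code:
* WHILE stops at the first non-matching record, REST FOR keeps testing up to EOF
Select crsTest
Set Order To && no order: the "F" rows are scattered in record order
Locate For Right(cValue,1)="F"
Count Rest While Right(cValue,1)="F" To lnWhileCount && only the consecutive run from here
Locate For Right(cValue,1)="F"
Count Rest For Right(cValue,1)="F" To lnRestCount && checks every remaining record
? lnWhileCount, lnRestCount
Set Order To xTest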
What's more important to me is to see that a SCAN WHILE doesn't take as long as a DO WHILE or a SCAN ALL checking the condition for each record. Comparing the times to the unoptimized loops, I'd say that proves Rushmore optimization is involved in more than just the FOR conditions/scoping. The help is, as it often is, maybe not wrong, but not complete.
There's much more to learn about the mechanism of Rushmore: its dependency on the correct supply of foreground memory via SYS(3050), for example; how it differs between commands with scopes, which only optimize getting the next record, and SQL queries, where it optimizes finding the list of all record numbers to fetch; how Grid.Optimize changes the way a grid fetches rows; why LOCATE is better than GO TOP; etc.
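For the SYS(3050) part, a small example (the sizes are just illustrations, not recommendations):
Code:
? Sys(3050, 1) && current foreground buffer memory in bytes
? Sys(3050, 2) && current background buffer memory in bytes
=Sys(3050, 1, 256*1024*1024) && give Rushmore 256 MB of foreground memory to work with
=Sys(3050, 2, 128*1024*1024) && 128 MB of background memory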
Now I am open for discussion, code changes, or further tests. I'll just repeat my conclusion from elsewhere: before you start making micro-optimizations, consider that you perhaps waste more of your own time than you save for all your users together. Stop thinking in terms of a handful of records; just notice once again how I needed to produce a large DBF to even measure the differences with enough precision. You can always introduce errors or problems, so you risk something for a doubtful optimization; I'd always concentrate on the problematic cases. When you think of rewriting something in a more modern way, you could even step outside the view of your code and VFP and think of a rewrite in another programming language or development stack. When you write in VFP, it's okay to know when you can use name expressions instead of macro substitution, and it's okay to apply mdots and all those micro-optimizations you can always apply without needing to consider any circumstances. But it counteracts a very valid and important principle about premature optimization: even with all these micro-optimizations applied, choosing the wrong overall strategy and algorithm can easily still cost you 2, 10, or even more times the execution time compared with a cleverer solution or with introducing a third-party component that does the task with the embedded knowledge of an expert in the field. Besides performance, there are also topics such as security, stability, and maintainability, and they don't always go hand in hand with the best practices performance-wise, even in terms of readability. Therefore - just as one example - a precompiler step automatically introducing mdots to variable reads or automatically changing assignments to STOREs would be nice, but it's hard to write such a thing for the general case without it ever failing on edge cases.
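And to make the name expression / mdots remark concrete, a tiny sketch (the table name is just a placeholder):
Code:
lcTable = "customers" && placeholder table name
Use &lcTable In 0 Shared Again && macro substitution: the line is compiled at runtime
Use (m.lcTable) In 0 Shared Again && name expression: just evaluated, not recompiled
lnCount = 0
lnCount = m.lnCount + 1 && mdot on the read skips the field-name lookup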
Bye, Olaf.
Olaf Doschke Software Engineering