Olaf Doschke
Programmer
Since we had a side discussion about Rushmore recently, I thought I'd redo some tests, and I found interesting results:
I tested DO WHILE vs SCAN with different scopes, and therefore different strategies: the automatic Rushmore optimization VFP does, direct index usage, and a combination of both. I set up the task of finding a portion of data scattered across very many records (2 million). One observation upfront: what takes longest is creating the data. Even the least performant DO WHILE loop, which only makes use of the order the index sets, takes only a few seconds, and this is surely not a case matching the 10-record case we came from.
I guess there is something to learn here, and I got some assumptions confirmed, but I'm happy to be criticized and corrected, no matter by whom.
I needed to use high-performance counters, as the SECONDS() value only has a precision of about 0.015 seconds, the tick frequency of the BIOS clock. That causes results to be "pixelated": you only see multiples of 1-3 ticks, which doesn't make the results meaningful or comparable. OK, that upfront; here's the code, followed by a little explanation and interpretation of the results:
By the way, thanks to Craig Boyd, from whom I took the API calls.
Code:
Public lcOutput
lcOutput=""
Set Textmerge On
Set Textmerge To Memvar lcOutput Additive
Create Cursor crsTest (cEmpty c(1), cValue c(10) default sys(2015))
Append Blank
For lnI = 1 to 21 && means 2^21 records
    Append From Dbf("crsTest") Fields cEmpty
EndFor
Index On Right(cValue,1) tag xTest
LOCAL lcBuffer, lnTicksPerSecond, t0, t1, t2
* Both API calls write a 64-bit integer into an 8-byte string buffer passed by reference
DECLARE INTEGER QueryPerformanceFrequency IN kernel32 STRING @lpFrequency
DECLARE INTEGER QueryPerformanceCounter IN kernel32 STRING @lpPerformanceCount
lcBuffer = SPACE(8)
=QueryPerformanceFrequency(@lcBuffer)
lnTicksPerSecond = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32 && ticks per second from low and high DWORD
Clear
Local lnCount, lcValue
Store 0 To lnCount
Select crsTest
Locate
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Locate For Right(cValue,1)="F"
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Do While Not Eof("crsTest") AND Right(cValue,1)="F"
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
    Skip 1
EndDo
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Locate time <<Transform((t1-t0)/lnTicksPerSecond)>>
\ Do While time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
Store 0 To lnCount
Select crsTest
Locate
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Seek "F"
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Scan All && Scan All ignores the Seek and starts from the top of the index order
    If Right(cValue,1)="F"
        Exit && stops at the first "F" record, so this loop counts the records sorting before "F"
    Endif
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
EndScan
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Seek time <<Transform((t1-t0)/lnTicksPerSecond)>>
\ Scan All time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
Store 0 To lnCount
Select crsTest
Set Order To && no direct Index usage, let Rushmore find it
Locate
Select crsTest
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Scan For Right(cValue,1)="F"
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
EndScan
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Scan For time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
Store 0 To lnCount
Select crsTest
Set Order To XTEST && back to ordered cursor
Locate
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Seek "F"
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Scan While Right(cValue,1)="F"
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
EndScan
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Seek time <<Transform((t1-t0)/lnTicksPerSecond)>>
\ Scan While time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
Store 0 To lnCount
Select crsTest
Locate
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Seek "F"
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Scan Rest For Right(cValue,1)="F"
    lcValue = cValue
    Store m.lnCount + 1 To lnCount
EndScan
=QueryPerformanceCounter(@lcBuffer)
t2 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<lnCount>> records
\ Seek time <<Transform((t1-t0)/lnTicksPerSecond)>>
\ Scan Rest For time <<Transform((t2-t1)/lnTicksPerSecond)>>
\ -------------
=QueryPerformanceCounter(@lcBuffer)
t0 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
Select cValue from crsTest Where Right(cValue,1)="F" Into Array laResult
=QueryPerformanceCounter(@lcBuffer)
t1 = buf2num(SUBSTR(lcBuffer, 1,4)) + buf2num(SUBSTR(lcBuffer, 5,4)) * 2^32
\ <<Alen(laResult,1)>> records
\ SQL query time <<Transform((t1-t0)/lnTicksPerSecond)>>
Set Textmerge To
Set Textmerge Off
_cliptext = lcOutput
FUNCTION buf2num(tcBuffer)
************************
* Convert the first 4 bytes of a little-endian buffer to a number
RETURN ASC(SUBSTR(tcBuffer, 1,1)) + ;
    ASC(SUBSTR(tcBuffer, 2,1)) * 2^8 + ;
    ASC(SUBSTR(tcBuffer, 3,1)) * 2^16 + ;
    ASC(SUBSTR(tcBuffer, 4,1)) * 2^24
ENDFUNC
To explain the cEmpty field: creating 2^21 records takes much longer with that many APPEND BLANKs than by appending the cursor to itself. But to get new SYS(2015) default values for all records, you have to append only fields other than cValue, so there needed to be another field in the cursor. Another solution would be to first create only empty rows and then update them with SYS(2015). In the SQL query I only select the cValue field, for fairness.
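As a side note, here is a minimal, untimed sketch of that alternative creation approach (crsTest2 and xTest2 are just placeholder names I made up for this illustration):
Code:
* Untimed sketch: create empty rows first, then fill cValue afterwards
Create Cursor crsTest2 (cValue c(10))
Append Blank
For lnI = 1 To 21 && doubling 21 times gives 2^21 rows
    Append From Dbf("crsTest2")
EndFor
* REPLACE ALL evaluates SYS(2015) once per record, so every row gets its own value
Replace All cValue With Sys(2015)
Index On Right(cValue,1) Tag xTest2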
Now here are the results from my circa four-year-old PC with an Intel i7 CPU, which even then was only above average, not top notch:
[pre] 58254 records
Locate time 0,0396
Do While time 2,6528
-------------
873818 records
Seek time 0
Scan All time 0,4775
-------------
58254 records
Scan For time 0,0328
-------------
58254 records
Seek time 0
Scan While time 0,0356
-------------
58254 records
Seek time 0
Scan Rest For time 0,0986
-------------
58254 records
SQL query time 0,0429[/pre]
First, notice that for fairness I put reading the cValue column into the loop body, so it's not just the record pointer moving; even in the cases where Rushmore optimizes on the index expression, the indexed field is actually read, just as SQL reads and copies it.
First observation: both the DO WHILE and the SCAN ALL are obviously not optimized and take much longer. It's easy to understand that all the things SCAN does automatically already make it faster, even when no further optimization happens.
Second observation: surprise, SCAN FOR is actually comparably fast to the combination of SEEK and SCAN WHILE. Someone else predicted the SEEK/WHILE combination to be best, with WHILE not being optimized and just making use of the index order, so that the main act is the fast SEEK to the first result record. Well, if WHILE really only profited from the index order and nothing else, you'd expect about the same time as the SCAN ALL with the Exit test, which is in effect the corresponding "SCAN UNTIL". So the first "surprise": I'd judge this as Rushmore optimization of WHILE, even though the VFP help does not specify that such a thing is done.
The next surprise: even though I explicitly turned off the index order for the SCAN FOR test, it loops over all wanted records almost as fast as the SEEK/WHILE combination does. The surprise is not that Rushmore does that (it's normal for Rushmore to determine a usable index); the surprise is rather that even though Rushmore does this over and over again for each next row it determines, it's still that fast.
The biggest surprise for some is perhaps that SQL is not very fast. A (mainly) single-field cursor with a single index is not yet a situation that profits from SQL's better Rushmore optimization, and the first disadvantage is that SQL not only reads but also writes all result data into a new work area. If you allow a filter cursor, that's not the case: you get the DBF reopened with a SET FILTER, which is created faster, but it doesn't really create a result. The result only materializes when you use that filter cursor, and it has many disadvantages, like having the RECCOUNT() of the underlying DBF instead of _TALLY. That speed is also deceiving, as you'd have to take the time spent reading from the filtered cursor into consideration; just as a USE doesn't read any data yet, a filter cursor isn't really executing the query. I prevented this anyway by querying into an array: picking the relevant rows is optimized, but the records determined are then read and written.
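To make the filter cursor point more tangible, here's a hedged sketch; whether you actually get a filter cursor depends on the query, and the NOFILTER clause is what forces a real result cursor:
Code:
* Sketch: a fully optimizable single-table query may come back as a filter cursor,
* which is why RECCOUNT() can differ from _TALLY as described above
Select cValue From crsTest Where Right(cValue,1)="F" Into Cursor crsMaybeFiltered
? Reccount("crsMaybeFiltered"), _Tally
* NOFILTER always writes a real result cursor into its own temp file
Select cValue From crsTest Where Right(cValue,1)="F" Into Cursor crsReal Nofilter
? Reccount("crsReal"), _Tally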
You could now add filling an array to any of the loops and make them take longer again; that would surely also be fair, but in the situation we initially talked about, that array then also needs to be copied once more with ACOPY into a combobox/listbox list array.
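For that combobox/listbox scenario, a hypothetical sketch (the listbox lstValues and the array property aList are made-up names, and this is assumed to run in a form method):
Code:
* Hypothetical: query into an array, then hand a copy to a listbox via an array property
Select cValue From crsTest Where Right(cValue,1)="F" Into Array laResult
If _Tally > 0
    Dimension Thisform.aList(_Tally)
    =ACopy(laResult, Thisform.aList)
    Thisform.lstValues.RowSourceType = 5 && 5 = array
    Thisform.lstValues.RowSource = "Thisform.aList"
    Thisform.lstValues.Requery()
Endif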
In all fairness, SQL is the winner in most queries, even in still simple queries joining just two tables. The more joins and WHERE conditions you have, the more SQL can profit from using multiple indexes and from building up a bitmap of all record numbers to fetch before fetching them. A scope condition, in contrast, is only optimized for fetching the next matching record, so it is applied and reapplied again and again, while SQL does its Rushmore job ahead of creating the result and then has one final result.
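As a generic, made-up example of the kind of query where that pays off (the orders/customers tables are hypothetical, assuming index tags on the join and filter fields):
Code:
* Hypothetical two-table join: Rushmore can combine the indexes on iCustId,
* dOrdered and cCountry into bitmaps of record numbers before fetching any row
Select ord.iOrderId, cus.cName, ord.yAmount ;
    From orders ord ;
    Inner Join customers cus On cus.iCustId = ord.iCustId ;
    Where ord.dOrdered >= Date(2023,1,1) ;
    And cus.cCountry = "DE" ;
    Into Cursor crsReport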
Now, what I guessed wrong is that WHILE compares to REST FOR, but that misses the point that the results could, in general, differ. A WHILE scope ends with the first non-matching record; a REST FOR just starts at the current row but still goes through to EOF, so it finds all matching records in the rest of the data. Both results are the same here, as the data is in index order when using WHILE and REST FOR, but the SCAN REST FOR also has to go through the majority of the data following all the 'F' rows.
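As a quick sketch of that general difference (not part of the timed tests above): with the index order switched off, the same two scopes give very different counts.
Code:
* WHILE stops at the first non-matching record, REST FOR keeps testing up to EOF
Select crsTest
Set Order To && no order: the "F" rows are scattered in record order
Locate For Right(cValue,1)="F"
Count Rest While Right(cValue,1)="F" To lnWhileCount && only the consecutive run from here
Locate For Right(cValue,1)="F"
Count Rest For Right(cValue,1)="F" To lnRestCount && checks every remaining record
? lnWhileCount, lnRestCount
Set Order To xTest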
What's more important to me is to see that a SCAN WHILE doesn't take as long as a DO WHILE or a SCAN ALL checking the condition for each record. Comparing the times to the unoptimized loops, I'd say that proves Rushmore optimization is involved in more than just the FOR conditions/scoping. The help is, as it often is, maybe not wrong, but not complete.
There's much more to learn about the mechanism of Rushmore: its dependency on the correct supply of foreground memory via SYS(3050), for example; how it differs between commands with scopes, which only optimize getting the next record, and SQL queries, where it optimizes finding the list of all record numbers to fetch; how Grid.Optimize changes the way a grid fetches rows; why LOCATE is better than GO TOP; etc.
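For the SYS(3050) part, a small example (the sizes are just illustrations, not recommendations):
Code:
? Sys(3050, 1) && current foreground buffer memory in bytes
? Sys(3050, 2) && current background buffer memory in bytes
=Sys(3050, 1, 256*1024*1024) && give Rushmore 256 MB of foreground memory to work with
=Sys(3050, 2, 128*1024*1024) && 128 MB of background memory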
Now I am open for discussion, code changes, or further tests. I'll just repeat my conclusion from elsewhere: before you start making micro-optimizations, consider that you perhaps waste more of your own time than you save for all your users together. Stop thinking in terms of a handful of records; just notice once again how I needed to produce a large DBF to even measure the differences with enough precision. You can always introduce errors or problems, so you risk something for a doubtful optimization; I'd always concentrate on the problematic cases. When you think of rewriting something in a more modern way, you could even step outside the view of your code and VFP and think of a rewrite in another programming language or development stack. When you write in VFP, it's okay to know when you can use name expressions instead of macro substitution, and it's okay to apply mdots and all those micro-optimizations you can always apply without needing to consider any circumstances. But it counteracts a very valid and important principle about premature optimization: even with all these micro-optimizations applied, choosing the wrong overall strategy and algorithm can easily still cost you 2, 10, or even more times the execution time compared with a cleverer solution or with introducing a third-party component that does the task with the embedded knowledge of an expert in the field. Besides performance, there are also topics such as security, stability, and maintainability, and they don't always go hand in hand with the best practices performance-wise, even in terms of readability. Therefore - just as one example - a precompiler step automatically introducing mdots to variable reads or automatically changing assignments to STOREs would be nice, but it's hard to write such a thing for the general case without it ever failing on edge cases.
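And to make the name expression / mdots remark concrete, a tiny sketch (the table name is just a placeholder):
Code:
lcTable = "customers" && placeholder table name
Use &lcTable In 0 Shared Again && macro substitution: the line is compiled at runtime
Use (m.lcTable) In 0 Shared Again && name expression: just evaluated, not recompiled
lnCount = 0
lnCount = m.lnCount + 1 && mdot on the read skips the field-name lookup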
Bye, Olaf.
Olaf Doschke Software Engineering