Limitations of rushmore in sql style queries

steve4king · Aug 8, 2012

This is mostly an academic question as my application serves it's purpose fast enough..

I'm doing a process to kill some sensitive data that we don't need.
One of the fields I just shrink, truncating the bad data before updating.
Next:

Code:

Wait Window "This may take a few minutes - TRANLOG Step 2/5" Nowait
REPL SENT  WITH "MYNUMS#"+ALLTRIM(INVOICE),ACCOUNT  WITH 'XXXXXXXXXXXX'+RIGHT(ALLTRIM(ACCOUNT ),4), ACCT_EXP  WITH 'XXXX' FOR !(LEFT(ACCOUNT ,1)$"X*") AND !EMPTY(ACCOUNT )
					
Wait Window "This may take a few minutes - TRANLOG Step 3/5" Nowait
REPL SENT1 WITH "MYNUMS#"+ALLTRIM(INVOICE),ACCOUNT1 WITH 'XXXXXXXXXXXX'+RIGHT(ALLTRIM(ACCOUNT1),4), ACCT_EXP1 WITH 'XXXX' FOR !(LEFT(ACCOUNT1,1)$"X*") AND !EMPTY(ACCOUNT1)
					
Wait Window "This may take a few minutes - TRANLOG Step 4/5" Nowait
REPL SENT  WITH "MYNUMS#"+ALLTRIM(INVOICE),ACCOUNT  WITH IIF(!EMPTY(account ),'XXXXXXXXXXXX'+RIGHT(ACCOUNT ,4),''), ACCT_EXP  WITH IIF(!EMPTY(ACCT_EXP ),'XXXX','') FOR SENT  != "MYNUMS" and LEN(ALLTRIM(SENT ))>15
					
Wait Window "This may take a few minutes - TRANLOG Step 5/5" Nowait
REPL SENT1 WITH "MYNUMS#"+ALLTRIM(INVOICE),ACCOUNT1 WITH IIF(!EMPTY(account1),'XXXXXXXXXXXX'+RIGHT(ACCOUNT1,4),''), ACCT_EXP1 WITH IIF(!EMPTY(ACCT_EXP1),'XXXX','') FOR SENT1 != "MYNUMS" and LEN(ALLTRIM(SENT1))>15

This actually runs fairly quickly in a table with about 1mil records.

Afterwards I want to verify a few things.
I check the structure easily enough, then:

Code:

select count(*) from mytable where ((!LEFT(ACCOUNT ,1)$"X*") AND !EMPTY(ACCOUNT )) or ((!LEFT(ACCOUNT1 ,1)$"X*") AND !EMPTY(ACCOUNT1 ))

This also runs fairly quickly with my test data. But I was curious if there is a better way to form this.
(Note that there are no indexes on the table, and creating an index just for this task takes as long as the task without an index and provides negligible results.)

steve4king · Aug 8, 2012

I see that I didn't actually ask a question here haha..

The question is.. what are the limitations of Rushmore, what kinds of filters work well and which do not.

I only recently discovered that the == comparative does not play well. What other kinds of functions will slow down my application?

Olaf Doschke · Aug 8, 2012

Well, your missing the point about indexes, you build an index once and for all time, the only reason to rebuild it is, it's broken. Besides your statements are REPLACES, not SQL "style". SQL is SQL-SELECT, SQL-INSERT (as opposed to the INSERT command) and SQL-DELETE (as oppesd to DELETE FOR).

If you count the time to build an index into your measurements, it's like counting in the time to install windows into the boot time. This is just a one time effort. An index of course is updated, as the data is updated, and that takes time, as a windows boot also takes time in the analogon. But the main index generation of course scans through all data to put the index expression into the new index tag of the cdx file. And this takes lots of time, but updating and using the index to optimise query WHERE clauses or FOR clauses of xbase style commands does not take much time, in contrast it gains from rushmore optimisation.

With your kind of FOR expressions to optimize you'd need indexes on LEFT(ACCOUNT,1)$"X*", which is a less desirable index, as it's very special for this purpose. You better change your code to test FOR Account="X" OR Account="*", which makes use of VFPs way of string comparison, it will find Accounts starting with X oir *. An index on Account would optimise that. And of course the same goes for Account1.

SET ANSI OFF for that to work, and it will work, and it will work faster and it's the default for ANSI anyway.

The other main expression is EMPTY(field). If you want an optimised check for an empty char field you better test for field==space(len(field)), for example that is field==" " for char(4).

You could also index on empty(Account), but As empty() is a boolean function it only has two results and that gives a bad index tree structure. If you do a BINARY index, that's fine to resolve that unbalenced index tree problem and create a small footprint tag, but as you already would do an index on the field itself, you can reuse that for the empty test.

Bye, Olaf.

Olaf Doschke · Aug 8, 2012

You're also wrong about ==: The special meaning of it will make it only find data for field == value, if the value has the exact field length, that's the only impact of it.
An index is used for expression with the index expression on one side and operators can be = , <, > or == and also if NOT is involved, that doesn't stop rushmore from using an index and inverting the found results in the way needed.

There is a good chapter about rushmore and how it works and you don't have to guess what it does.

Bye, Olaf.

Dan Freeman · Aug 9, 2012

To add to what Olaf says about indexes, you may as well build permanent indexes that optimize your common queries because there's a dirty little secret:

If VFP needs an index and doesn't have one, IT BUILDS ONE ON THE FLY and deletes it after use. So a common query will build an internal index every time the query executes. That's wildly unnecessary overhead.

In the REPLACE statements you've posted, I'd carry an index on Left(Account,1) and use the = comparison as Olaf says.

Just as a general rule, if you want to invoke Rushmore you should avoid Trim(), Alltrim(), and most substring operations (like $) unless you have an index expression that exactly matches your operational expression.

Olaf Doschke · Aug 9, 2012

Well, an Index on LEFT(Account,1) is of course shorter. But it's not as versatile as an index on Account, the full field. The Left(Account,1) index could still also cover the empty check, if no accounts but empty accounts begin with ' ' (a single space). Then of course you can also use LEFT(Account,1)=' ' instead of Empty(Account) and make use of the same index as when looking for first char 'X' or '*'. That would indeed be sufficient. Still you will also surely profit from an index on Account, when looking up an account in other places of your code.

And there you have a main thought you should also always consider: How versatile can I make use of my indexes? It doesn't make an index much faster, if an index expression is shorter, the cdx is not read in full, not even on the tag level, but VFP navigates in an index binay tree from a root node along the expression you search to the matching index tree node(s) to lookup the recno(s) it needs for the dbf, and the index structure needs a few bytes per index tree node anyway, additional to the index expression stored.

Bye, Olaf.

steve4king · Aug 10, 2012

Thanks guys.
The other side of indexing, is it does create some overhead on inserts.
The select operation I'm referring to will happen during every upgrade, but not during normal use. If VFP will actually create a temp index when I do this.. it seems there would be no point in defining one myself. (also explains why performance comparisons were a wash)

Now coming back to Rushmore:
I came across a posting that stated '==' should be used sparingly, and that If speed is of concern, it is better to set ANSI on and just use '='. Since I didn't see anything about that in the docs I wanted to get y'all's thoughts on that one.

Thanks.

(Off topic.. how do I change my profile title on Tek-Tips?)

Dan Freeman · Aug 10, 2012

I came across a posting that stated '==' should be used sparingly, and that If speed is of concern, it is better to set ANSI on and just use '='. Since I didn't see anything about that in the docs I wanted to get y'all's thoughts on that one.

Beware of over-generalizations. You should use "==" exactly ;-) where it should be used. Neither using it nor avoiding it give you a one-size-fits-all solution.

If you're going to mess with SET ANSI, note that it has complicated and sometimes bizarre interactions in concert with other settings (SET EXACT, collating sequences, and other settings) so there isn't a one-shot answer to that either.

It boils down to your best answer being "it depends". The only way you'll know what works best for your situation is testing.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Limitations of rushmore in sql style queries

steve4king

IS-IT--Management

steve4king

IS-IT--Management

Olaf Doschke

Programmer

Olaf Doschke

Programmer

Dan Freeman

Programmer

Olaf Doschke

Programmer

steve4king

IS-IT--Management

Dan Freeman

Programmer

Similar threads

Part and Inventory Search

Sponsor