Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Optimising a search 1

Status
Not open for further replies.

AndrewMozley

Programmer
Oct 15, 2005
621
GB
There is a table with an index tag MyTag1, key is MyBatch.

MyTaG1 is just a regular index, although it happens that Mybatch had been unique within the table. So a program could say :

Code:
SELECT MyDBF
SET ORDER TO TAG MyTag1
SEEK Thisbatch

. . . and swiftly retrieve the desired record.

It now happens that Thisbatch may not be unique in the table. So to retrieve a particular record, the code now says :
. . .
LOCATE FOR Mybatch = Thisbatch .AND. MyCode = Thiscode

I had rather hoped that the VFP search engine would optimise this, and swiftly retrieve the desired record. But the code runs slowly, so I suspect this is not the case.

I can modify the earlier code (SEEK . . ) and loop round till I get the desired record. But is there a way of getting the VFP engine to do the work for me, using the index?

Thanks.
 
You conclude from observations, but you should just look into the help and it'll tell you LOCATE does make use of Rushmore optimizations. Rule of thumb: Anything with FOR does rushmore optimize. And Rushmore can even make use of two indexes on the Mybatch and MyCode fields. Why is it slower? Your index tags could be suboptimal or you have too few memory allocated for optimizations with SYS(3050), just to name two reasons.

There is one thing the LOCATE first needs to do, it needs to determine what indexes to use. If you SET ORDER, that is predefined and so SEEK does not need that step. Also with a SET ORDER the data gets sorted by index, a LOCATE does not sort data, it just optimizes the single LOCATE to whatever record fulfills your condition. An advantage is, you can sort data by one index and still use others for finding a certain row.

Do you have two IDXes? You should put the two indexes on the two fields in the primary CDX index file of a DBF, that's best for rushmore to discover all possible optimaztion helpers, all TAGs in there. Otherwise you'd need to SET INDEX TO listofindexfiles.

There are further things limiting the use of indexes for rushmore optimization: Their collation. If current collation is GENERAL, index tags on other collations or the MACHINE collation, which actually is no collation (as it sorts by the mere binary values of the bytes and does not eg sort á and à near a) are not used, because collation translations are not easily done on the fly and indexes work for one collation only.

Last not least: The Rushmore optimization of SQL queries rather compares to that of LOCATE than SEEK, but is superior to both. Why? Because what differs from LOCATE is, a query determines ALL rows matching the criteria, not just the first, so all optimization bitmaps done to find a match are not only built up to find one row, all resulting rows are fetched.

All three forms of SEEK, optimized LOCATE, and SQL query, have their different purpose, pros and cons and can be used. If you can do without one, it certainly rather is LOCATE than the other two, but in itself LOCATE isn't that bad to find onw row without first needing to knwo and set index. There is one further thing often overlooked: SET KEY. Compares to a very specific single table query using one index to find a group of records, eg all orderdetail rows with the same orderid foreign KEY (therefore SET KEY). And then you can also use a RELATION to find data, which compares to a live JOIN.

Your LOCATE surely could go faster, but without hands on sample data and indexes it's hard to tell what's wrong.

Bye, Olaf.
 
Hi Andrew,

Olaf has already given you the theory, so I'll try to provide you with its practical implementation.
1. You can index on more than one field, e.g. (like in your case)

Code:
INDEX ON MyBatch+MyCode TAG Batch_Code

Note that it's better to give your tags meaningful names rather than [Index]1, [Index]A ([thumbsdown]) or its likes.

2. It is not necessary to change the work area and/or set the order on to perform a search. In your case, and providing you have that "complex" [wink] index in p. 1 above, it would suffice just to have

Code:
IF SEEK(ThisBatch+ThisCode, "MyDBF", "Batch_Code")
** Your code here
ELSE
** Your other code here
ENDIF

3. The SQL SELECT statement (and I agree with Olaf 157.75% :) ) is usually the fastest way to find the needed records. Then, you can have something like

Code:
SELECT T.* FROM MyDBF T ;
   WHERE T.BathcNo = m.ThisBatch AND T.BatchCode = m.ThisCode ;
   ORDER BY BatchNo, BatchCode ;
   INTO CURSOR C_BATCH_CODE

IF _TALLY == 0 && Not found
** Some code to notify the User or to write to a process log, or whatever best suits your needs
   RETURN
ENDIF

IF _TALLY > 1
** Your code to handle this situation, e.g. notify the User and return,
** or take the 1st or the last records, etc.
ENDIF

HTH.



Regards,

Ilya
 
In short, you can use SEEK to find the first record in your that matches your criteria, then either do a SCAN REST WHILE ... or LOCATE REST FOR ...

SELECT MyDBF
SET ORDER TO TAG MyTag1
SEEK Thisbatch
LOCATE REST FOR Mybatch = Thisbatch .AND. MyCode = Thiscode

That tells VFP to start at the current record and continue the search, and the current index will already be active.
Now if it doesn't find a match, it will be slow. That's when SCAN REST WHILE Mybatch = Thisbatch may be better. When the criteria no longer matches, you would be at EOF().


-Dave Summers-
[cheers]
Even more Fox stuff at:
 
Dear Olaf. Thank you very much for your prompt reply.
The help will tell you LOCATE does make use of Rushmore optimizations

This is quite correct. There is a section about Rushmore which specifically refers to its use from the LOCATE command.

In the matter of multiple indexes, there are indeed several indexes to this table. The table however is not under my control - my program is working with a table in an existing application – difficult to change.

Your note about Collation was important. Hadn’t come across that before. However SET(“COLLATE”) returns ‘MACHINE’. Can change that if you feel it would help. Thank you also for the suggestion of SYS(3050). SYS(3050,1) return 1465MB

Out of interest, on (another) table of about 100,000 records, I ran this test :
Code:
USE xarcn.dbf ALIAS xarcn
SELECT XARCN
LOCAL lTime1, lTime2, lTime3

lTime = SECONDS()
SET ORDER TO TAG xFolio
SEEK "W15"
lTime1= SECONDS() – lTime    && 0.001 seconds

lTime = SECONDS()
LOCATE FOR Folio = "W14873" .AND. Account = "*SLC"
lTime2 = SECONDS() – lTime    && 1.556 seconds

lTime = SECONDS()
SELECT T.* FROM xarcn.dbf T ;
   WHERE T.Folio = "PC8063" AND T.Account = "*BNK" ;
   INTO CURSOR C_Folio
lTime3 = SECONDS() – lTime    && 0.080 seconds


I had certainly hoped that Rushmore would improve the search using LOCATE, but must admit defeat on this occasion.

Andrew.

DSummZZZ : Since I first wrote, I had already improved matters by doing a SEEK, then a SCAN for appropriate values, along the lines that you have suggested; and your code (LOCATE REST) improves what I had written. Thank you.

 
What collation has to be used depends on the collation of the index tags. If the indexes are in GENERAL collation, you SET COLLATE TO 'GENERAL'.

Open the table and find out the collations of the index tags via:
Code:
USE xarcn.dbf
For lnCount = 1 To  ATagInfo(laIndexTags)
   ? Textmerge("Tag: <<laIndexTags[lnCount,1]>>, Expression:<<laIndexTags[lnCount,3]>>, Collation:<<laIndexTags[lnCount,6]>>")
Endfor

It's likely the indexes on Folio and Account are not in MACHINE collation and thus not used to optimize the LOCATE. What makes me wonder though, is the query is quite fast. Maybe just because data is already cached at that stage of previously doing a SEEK and LOCATE.

Last not least: Even if you had control over the table structure and indexing, I wouldn't recommend creating composite indexes for every occasion. Every index you add permanently adds to the load of inserting and updating data and an index should be used often to justify that. And only temporarily adding an index consumes too much time itself, Rushmore seldom creates temporary indexes.

You could use SYS(3054) to see at least a bit of rushmore decisions, but not for LOCATEs, only for queries.

Bye, Olaf.
 
Thank you Olaf and TBL.

The cpcurrent() and cpdbf() are both 1252 - but worth looking in to. The collation sequence for table xarcn is also MACHINE.

You may have pointed me, Olaf, towards a misunderstanding on my part. The first field Folio, is indeed the subject of an index, but the second, Account, is not.

I had thought that (During the LOCATE) cunning old Rushmore would identify the first part of the 'Filter' (Folio) as a searchable field in the index, and then further refine the record selection by hard graft. But perhaps it is the case that Rushmore requires both parts to be searchable using the Index.

The SQL SELECT approach certainly seems to be able to optimise the search even though only one clause in the WHERE statement is searchable.

Andrew
 
Andrew,

that's a very plausible explanation for how SQL can still be fast with partial optimization, although also a LOCATE should be able to make a partial optimization (well, I recently also err'd on the assumption a buffered field value should not cause an index update).

The base procedure of Rushmore is creating a bitmap (in the literal sense of the word a map of bits) with 1 for match and 0 for non match. All the partial clauses optimized by single indexes are creating a bitmap and all these bitmaps are added bitwise via the operations used for the clauses, eg AND or OR. The resulting bitmap contains 1s at the records fulfilling the whole condition, so only those are fetched in a query. And only the first 1 bit is specifying the positioning of a LOCATE. As a side note: A costly operation can be using many indexes and creating many bitmaps leading to the final one, while only one of them, a partial optimization, already limits the result candidate records so much, that you can fetch them and check all further criteria on the data directly without peeking into further indexes. One famous example is, that an index on DELETED() can sometimes work wonder and enable a full rushmore optimization, while on other occasions the performance is also good without it, especially in a table regularly PACKed.

LOCATEs task is only finding the first 1 bit of the resulting bitmap, nevertheless I don't have good ideas how you could shortcut the computation of the single bitmaps knowing you only need the first 1 bit. As the single clauses can be ORed, a 0 you have from a previous computation step can still turn into a 1 in the end. What really is puzzling about this is, that because of that similarity, the single LOCATE should at least be as fast as a query is and perhaps faster, as it finally only needs to position on the first result record a SELECT-query would need to fetch besides all further result rows.

You could perhaps turn your test order around to see what happens, if you first do a query and then LOCATE.

Maybe I'm under a wrong impression and the optimization of FOR clauses happens totally different to the query optimization, but a developer I know just recently also said raising SYS(3050) memory buffer worked wonders for optimizing a LOCATE for him in a legacy DOS app. And this memory is used for those Rushmore bitmaps besides many other things.

Very concrete: A 100.000 rows table means 100.000 bits per condition, that's 200.000 bits for two conditions you have, meaning 25.000 Byte or less than 25KB of memory need for the Rushmore bitmaps. Nothing needing a 1.5GB buffer. There's more need for other things on the way of computing such bitmaps, but you could do the illogical thing and reduce the buffer, to see what happens, maybe shrink it to 8MB, something in that magnitude.

Bye, Olaf.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top