Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to use part of Concatenated Index ?

Status
Not open for further replies.

vazagothic

Programmer
May 19, 2004
32
US
Hello again,
in one my tables (1.6 million records) I have defined a concatenaded index:

INDEX ON county+owner TAG cty_owner

Now, I try to use a partial index value (just the county) and scan through the entire table looking for given county:

SET ORDER TO cty_owner
SET EXACT OFF
SCAN WHILE county+owner = "CountyNameHere"
[...] Code goes here [...]
ENDSCAN

The entire procedure took 984 seconds and found 324'000 matches.

Then I've added an additional index:

INDEX ON county TAG county

And I run the following program:

SET ORDER TO county
SCAN WHILE county = "CountyNameHere"
[...] Code goes here [...]
ENDSCAN

This procedure took ONLY 36 seconds to finish.

Now is my question:
How I can use the concatenated indexes (county+owner), without adding another index (county) to the file ?

I thought the concatenated indexes would run almost as fast as single value ones, but it was almost 30 times slower.

I believe the Rushmore optimization wasn't used at all, even after using the concatenated index structure.
 
I haven't tried this myself, but you might set the index and do a seek command, then use scan rest while to run through the rest of the records.
 
If you PADR() county in your index, I believe the 1st index would work for both routines.

Just make make sure your pointer is is the right spot (i.e. .t.) when you start a SCAN WHILE.

Brian
 
OhioBill, I've modified the code to look as follow:


INDEX ON county+owner TAG cty_owner

SET ORDER TO cty_owner

SEEK "CountyNameHere"

SET EXACT OFF
SCAN REST WHILE county+owner = "CountyNameHere"
[...] Code goes here [...]
ENDSCAN


It seems as slow as the previous one .. barely 30k matches in 1 minute (still 270k matches to go)

Baltmane

I don't actually understand what you mean:

INDEX ON PADR(county, 20)+PADR(Owner, 30)

or

should I use the PADR(county, 20) within the SCAN ?


SCAN WHILE county+owner = PADR("CountyNameHere", 20)

?
 
Simply put, your concatenated tag with county 1st should have your data ordered the same was as if it is country only, you just want to make sure that the county keeps it's right spaces so that the order isn't upset by adding other characters to the right of it.
 
Baltman,
I have tried to follow your suggestion, and this is the result:


SET ORDER TO cty_owner

lcSearch = PADR("CountyNameHere", LEN(county))

SEEK lcSearch
SET EXACT OFF
SCAN WHILE county+owner = lcSearch
[...] Code goes here [...]
ENDSCAN


Unfortunatelly this code is as slow as the two previous slow examples :(


PS
sorry for making a mistake in your name in my previous post.
 
It sounds like you're going for speed to do **code** to particular records. Maybe there's another approach...

Are you trying to update, select or identify a subset of records?

Just so I have an idea, what vintage PC, VFP and size of table and record count are we talking about?

There are different solutions for different circumstances [smile]
 
When you say "the entire procedure took 984 seconds" does this include the creation of the index? It seems to me that scanning 1.6 million records should take only a few seconds with or without optimization. Are there network considerations here?

Jim
 
Well, I try the code to be as fast as possible :)
And according to VFP documentation the partial usage of concatenated indexes should be almost as fast as using single value indexes.

I'm trying to select all records matching given county value, without the need of adding new index to the table. I thought that since I do have a concatenated index with the field I need I can just use that one.

The PC I use is: P4 1.8 GHz 256MB RAM
Version of VFP: VFP 8 SP1
Table is 494MB big and has 1'674'864 records.
 
HI vazagothic

The problem you notice is because of the size of the index and simply not because of concatenation. When the index size increases, the time taken increases. SO when you add the name which is more in length.. for every record, the two fields are concatenated and then the comparison goes on left to right. That is the cause of delay.

I suggest, you can try the following

SCAN FOR county+owner = "CountyNameHere"
[...] Code goes here [...]
ENDSCAN

in effect you are changing WHILE condition to a FOR condition. I am not sure but would be interested in knowing your results. FOR isefinetely rushmore optimized and so might bring results faster. I am not confident if WHILE is optimized that way as I think. There are reasons for that, but I cannot air my views vis a vis realities.

:)


____________________________________________
ramani - (Subramanian.G) :)
 
The index file was created manually before and the time I take into consideration is the time to execute the part of the code I've shown above.

The file is located on the hard drive (it would probably take much, much longer to run it over the network)
 
I'm trying to select all records matching given county value

How about this?

Code:
SELECT * FROM MyTable WHERE county = "CountyNameHere"

Brian
 
Ramani,
still slow :( it took 1.5 minutes to find a little bit over 50k records.

Here's the real code I use:


* First, close all open tables
CLOSE TABLES ALL

* Open the table we're using for testing
USE c:\tmp\test001.dbf

* Use the concatenated index (County + OwnerLName)
SET ORDER TO cnty_owner

* Let's start the counter
ltStart = SECONDS()

* Here we store the number of found records
lnNumber = 0

SET EXACT OFF

* Find first record matching our search value
SEEK "MINNEHAHA"

* We're scanning for MINNEHAHA (in SD)
SCAN FOR County+OwnerLName = "MINNEHAHA"
* Count found records
lnNumber = lnNumber + 1
* Show value on the screen
IF (lnNumber %250 = 0)
WAIT WINDOW ALLTRIM(STR(lnNumber)) NOWAIT
ENDIF
ENDSCAN

* Get the finish time
ltEnd = SECONDS()

* Print results
? "Time:" + ALLTRIM(STR(ltEnd - ltStart))
? "Found:" + ALLTRIM(STR(lnNumber))


That code takes over 900 seconds to finish.

The above code does not compare (in the meaning of time used) to the one below:


* First, close all open tables
CLOSE TABLES ALL

* Open the table we're using for testing
USE c:\tmp\test001.dbf

* Use the Single-Field index (County)
SET ORDER TO county

* Let's start the counter
ltStart = SECONDS()

* Here we store the number of found records
lnNumber = 0

SET EXACT OFF

* Find first record matching our search value
SEEK "MINNEHAHA"

* We're scanning for MINNEHAHA (in SD)
SCAN FOR County = "MINNEHAHA"
* Count found records
lnNumber = lnNumber + 1
* Show value on the screen
IF (lnNumber %250 = 0)
WAIT WINDOW ALLTRIM(STR(lnNumber)) NOWAIT
ENDIF
ENDSCAN

* Get the finish time
ltEnd = SECONDS()

* Print results
? "Time:" + ALLTRIM(STR(ltEnd - ltStart))
? "Found:" + ALLTRIM(STR(lnNumber))


This code takes around 30 seconds to finish.
 
Also, based on the low RAM, try restricting the memory VFP uses faq184-3504.

I'd add more RAM... the more the better. I think that's your main problem.

Brian
 
Brian,
so why does the INDEX ON county TAG county works fine ?

I know that the VFP has to perform more operations while using concatenated indexes, but I wouldn't suspect it takes that much time.

According to Special Edition Using VFP 6:

it should be just a little bit slower ... but it is not :)

And about using SELECT ..
well, it's quite useful sometimes, but it is hard to control it (you cannot actually interrupt the querry, or maybe I do not know the way :) and it's quite hard to do anything with the selected records on-the-fly. You have to process the cursor later, which adds to the time.

I do not want to dismiss it completly, but I was hoping for a neat solution with SCAN .. ENDSCAN
 
Some points on postings in this thread:

- vazgothic: In your original post, Rushmore won't be used AT ALL as it isn't used in a SCAN WHILE condition. Also later you do a SEEK PADR("CountyNameHere", LEN(County)).

- SCAN WHILE will always work while the condition exists. You should set the index, SEEK the desired record, then SCAN WHILE the condition is .T.

- I don't see how PADR on the index will help any. If anything, it will just increase the size of the index tag and complicate the key, which will actually slow things down.

- "The actual code used" includes the following:

* Find first record matching our search value
SEEK "MINNEHAHA"

* We're scanning for MINNEHAHA (in SD)
SCAN FOR County+OwnerLName = "MINNEHAHA"
* Count found records
lnNumber = lnNumber + 1
* Show value on the screen
IF (lnNumber %250 = 0)
WAIT WINDOW ALLTRIM(STR(lnNumber)) NOWAIT
ENDIF
ENDSCAN

This is a common mistake. You don't want to SEEK, then SCAN FOR. The SCAN FOR will move the record pointer back to the beginning of the table or set an index to use if Rushmore finds one that is appropriate. As I stated below, use SEEK and SCAN WHILE. In this case, SCAN WHILE County="MINNEHAHA".

- SCAN FOR is generally slower than SCAN WHILE because Rushmore has to determine if there is an index that can be used. In my testing, the technique above works faster. This is contrary to what Ramani stated.

- SELECT can be interrupted. You need to SET ESCAPE ON.

- The fastest code using SCAN would be:

USE Table ORDER TAG cnty_owner
SEEK "MINNEHAHA"
SCAN WHILE County = "MINNEHAHA"
ENDSCAN

- The tests you showed have some artificial things that will cause the tests to run slower. First, don't do any screen I/O inside the loop. Also, make sure you completly shutdown the PC between the two tests. If you run the first test, then immediately the second, there could be some caching happening that will cause the second test to be faster.



Craig Berntson
MCSD, Visual FoxPro MVP, Author, CrysDev: A Developer's Guide to Integrating Crystal Reports"
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top