Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations strongm on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Index Suggestion

Status
Not open for further replies.

gregjohn

Technical User
Aug 13, 2001
29
US
I am not a programmer, but have had to use vfp for a while. I'm hoping someone could help me with this indexing question.

I have a table of 7,000 or so records.

I start with a for loop (for x = 1 to 12)

Within the for loop, I create the following index:
index on expected_return tag er for sector_code = x

set order to er
...other code

delete tag er
endfor

The problem is the delete tag takes a long time to delete and go to the next for loop.

I have been trying to figure out a way to "manually" create an index (within an open table)that would have a filter that could change.

For example:
order name: er
type: regular
expression: expected_return
filter: sector_code = x

Problem is since x is not defined it will not save.
Any way to do this?

I suppose since I only have 12 sectors I could create 12 index tags (1 for each sector) but I was hoping there might be another way around doing that.

Also, for a relatively small table, why does it take so long to delete the tag? I am running over a gigabit network.

Thanks
 
Its not at all clear what you're trying to accomplish here, but creating an index with a FOR clause and then deleting it is going to be a slow process.

OK, you've got a field called sector_code with the numbers 1 through 12 and you want to do something to all the records in each sector_code sequentially?


INDEX ON Sector-Code TAG Sector_Code
LOCATE
nSector = Sector_Code
SCAN
IF Sector_Code <> nSector
nSector = Sector_Code
* Do whatever you need to do when the code changes
ENDIF
* Do whatever you do in your processing loop
ENDSCAN


Also, once you've generated the index, you don't need to delete it or regenerate it.

Regards,
Jim
 
Why create and then delete tags? If you need an index, you need it. Create it and leave it there. Now you never have to create it again. Deleting a tag takes forever, so why do it. I typically create an index on every field in the table plus an index on deleted(). VFP will optimize just about anything you do with that table then.

As to including an index filter, you don't really need to. Set up an index (no filter) on Sector_Code and try this code:

lcMySectorCode = "Enter an appropriate sector code rather than this text"
SELECT * FROM MyTable WHERE Sector_Code = lcMySectorCode

See how fast it runs. The idea is to get the data quickly. Do you really care how fast you can create and delete index tags?

Ken
 

GregJohn,

You are correct in your diagnosis. X is a variable that is in scope only while you are creating the index. When you come to use the index subsequently, X will no longer be available, so the index won't work.

Having said that, I can't imagine any circumstances where you would want to create and delete an index within a loop. If you could explain exactly what you want to achieve, we might be able to suggest an alternative approach.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

My sites:
Visual FoxPro (www.ml-consult.demon.co.uk)
Crystal Reports (www.ml-crystal.com)
 
Gentlemen: Thank you for your replies. Let me try to be more specific.

I index for a specific sector -- lets say all technology stocks. I want to eliminate outliers (values that are gt 100 or lt -100). Based on the number of records, I skip down a certain number to find and store an upper and lower limit for a specific field (say price/earnings). I then calculate average, standard deviation, ect for technology stocks whose price/earnings ratio is between the upper and lower limit. I then scan through and replace fields with various calculations.

I then move on to another sector (but the number of records will be different and the upper and lower limits must be recalculated).

Of course, the price/earnings is just one variable. There are many others as well. Each variable calculated goes through this process for each sector.

In case you are interested here is the code I have been using. Again, thanks for your help.


SELECT j
USE n:\research\stocks\stocksel\sector_codes
SET ORDER TO sec_code
SCAN
STORE sector_code TO sector_code_a
SELECT i
INDEX ON &var1 TAG sec_per FOR sector_code = sector_code_a descending

CALCULATE cnt() TO total_records_s FOR sector_code = sector_code_a
CALCULATE cnt() TO na_s FOR &var1 = -999.99 .and. sector_code = sector_code_a
CALCULATE cnt() TO above_100_s FOR &var1 >100 .and. sector_code = sector_code_a
CALCULATE cnt() TO below_100_s FOR &var1 <-100 .and. &var1 <> -999.99 .and. sector_code = sector_code_a
CALCULATE cnt() to number_null_s FOR ISNULL(&var1) = .T. .and. sector_code = sector_code_a
var1cnt_s = total_records_s - na_s - above_100_s - below_100_s - number_null_s

store round(.10 * var1cnt_s,0) to upper
store round(.90 * var1cnt_s,0) to lower

go top
skip (above_100_s + upper)
store &var1 to var1_upper

go top
SKIP (above_100_s + lower)
store &var1 to var1_lower

calculate avg(&var1) for &var1 <> -999.99 .and. &var1 < var1_upper .and. &var1 > var1_lower .and. sector_code = sector_code_a to var1avg_s
calculate std(&var1) for &var1 <> -999.99 .and. &var1 < var1_upper .and. &var1 > var1_lower .and. sector_code = sector_code_a to var1std_s


scan
if &var1 <> -999.99 .and. ISNULL(&var1) = .F. .and. var1std_s <> 0
replace &var2 with (&var1-var1avg_s)/var1std_s
else
replace &var2 with -999.99
endif
endscan
DELETE TAG sec_per

SELECT j
endscan
endproc



 

GregJohn,

At a quick glance, your code looks basically OK ... except for those indexes.

Just create an index on Sector_Code, and then forget it. The index will be saved to disk, and will be available whenever you need to access the table.

Then, remove all the INDEX ON and DELETE TAG commands. Everything should still work the same, but will be much faster. Try it.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

My sites:
Visual FoxPro (www.ml-consult.demon.co.uk)
Crystal Reports (www.ml-crystal.com)
 
Sorry Mike, I'm not really sure I follow.

I need to index by variable (say price/earnings) but for only those records whose sector code matches sector_code_a.

Is your suggestion to create an index on sector_code for each one of the 12 sectors?

By creating an index on sector_code, I would get the right records but they would not be in descending order of the variable I am evaluating(which is important).

I suppose I could create an index for price/earnings for sector_code 1,2,...12 but that would be 12 indexes for each variable. If I have 100 variables that is a lot of indexes. Is that the best way -- to combine the variable with a specific sector code?


 
For temporary indexes that you're just going to delete, why not use stand-alone IDX files? Deleting index tags from a CDX is only going to bloat the file size and slow down the operation.

But I agree with Mike.
 

GregJohn,

As Mike noted, you don't need to create separate index tags for each of your sector codes. One per index expression/variable is enough (descending, if you need so). I assume, your variables are your fields or they are more complex expressions?

Say, you issue
INDEX ON field1 TAG field1 DESCENDING
INDEX ON field2 TAG field2 DESCENDING
INDEX ON field3 TAG field3 DESCENDING
once - and leave it at this.

Then, in your code, when you need to use one of them, just issue
SET ORDER TO tag1

As for your sector codes, you can limit your scope/filter your records by using FOR clause in other commands. Say, your command

CALCULATE cnt() TO total_records_s FOR sector_code = sector_code_a

already takes care of the limiting calculations to the sectore code you need. (And you can use in your SCAN where applicable.)
 
Thanks for your input. Stella740pl, I understand what you are suggesting but there is one more component. I have created the index files and I can calculate how many records match a certain sector code with the for command. No problem.

But at this point I still have my complete table of 7000 records, and I know that x amount of records are equal to to the sector code I am working with. The next step I am trying to do is to "skip" down a certain number of records in order to eliminate some outliers.

If I issue a skip command at this point in the pgm, it is going to skip down x records (regardless of sector_code) when I want it to skip down x records only for those records that have a specific sector code.

As far as I can tell there is no "for" to use with the skip command. Perhaps there is another command?

Thanks
 

GregJohn,

You asked: Is your suggestion to create an index on sector_code for each one of the 12 sectors?

Basically, yes. In fact, Stella gave the answer to this better than I could.

Reading back through this thread, I think you (we?) are getting too hung up on indexes. You should keep in mind that you don't have to use indexes at all to achieve you want. With a relatively small table like yours, you could remove the indexes, and your code would still run. In fact, it would probably run faster than when you are creating and deleting indexes on the fly.

However, you mentioned that you want the results to be in descending order of a particular variable. For that reason, you do need indexes, on the "variables" in question (in fact, the index would be based on a field, not a variable [in the programming sense]).

But, again, you don't need to create those indexes on the fly. Just create them once, then use SET ORDER TO to invoke the appropriate index when required.

Finally, you asked As far as I can tell there is no "for" to use with the skip command. That's true. But, when you have an index order set (via SET ORDER), SKIP will take you to the next record in index order, which I think is what you want.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

My sites:
Visual FoxPro (www.ml-consult.demon.co.uk)
Crystal Reports (www.ml-crystal.com)
 

GregJohn,

As far as I can tell there is no "for" to use with the skip command. Perhaps there is another command?

Possibly.
I can suggest couple of approaches now.
Both require you to have index tag on sector_code, in addition to all the index tags I mentioned above:
INDEX ON sector_code TAG sector
This step will make these commands Rushmore optimizable.

Then you might use LOCATE FOR...CONTINUE instead of SKIP.

There is another approach (even though I don't use this one for years myself) that might work fast enough on a table of 7000 records with all the indexes - SET FILTER.

SET FILTER TO sector_code = sector_code_a
GO TOP && or another pointer-moving command
....
SET FILTER TO

This one you place where you currently create and delete index tags, and your SKIPS will move only between the records included in the filter. SET FILTER is slow one of itself, but it is Rushmore optimizable if index on its expression (sector_code in your case) exists; and your table is not that big. I expect (not sure, though) that filtering should work faster than creating and deleting index tags.

Otherwise, as Mike Lewis and Mike Yearwood suggested, you might consider different solutions, as yours is not the only one possible to get your job done. You should try SELECT-SQL way, as in Mike Yearwood's code.

 
GregJohn,

Another option would be a series of views. With indexes on sector_code plus the various fields you want to analyse, these would execute extremely quickly. The basic query to create the views would have a where clause with a sector_code parameter and an order clause on the field you want to analyze.

Regards,
Jim
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top