More efficient / speedier way to do this? 8

keepingbusy · Dec 1, 2008

Hi

We have a process that carries out the following:

There are two tables that store information that has been imported from txt files. One table contains current information (Today) and the other contains previous information (Yesterday).

On a daily basis, new information is imported into the current table and the current tables data is sent to yesterdays table depending on conditions.

These conditions include a match on a unique number (called an SKU number) and also whether or not the price is < the previous one.

We have some code below which does the job perfectly and when you process about 68000 records in the tables, it takes about 30 minutes.

However, when the table increase to over 200,000 we are now talking many hours to process (this is done overnight but not really the best way to do it).

I am looking for some suggestions (not the sollution) on how to speed this up.

To try and simplify things, TODAY stores the SELL price and SKU number, it then looks for it in TPREVIOUS (Yesterdays data) and if it finds a match, deletes the record. Here is what we have so far:

Code:

USE TODAY  && CURRENT / NEW LIST
GO TOP
	
DO WHILE NOT EOF()
  mrecno=0
  mrecno=RECNO()
		
  STORE " "		TO mdelete
  STORE SPACE(15)	TO mcolumn5
  STORE SPACE(16)	TO mcolumn6
  STORE SELL		TO mcolumn5
  STORE SKU		TO mcolumn6
	
* NOW START A SEARCH TO SEE IF TODAY SKU NUMBER
* AND PRICE MATCH THAT OF YESTERDAYS
* IF IT DOES, DELETE IT FROM CURRENT

  USE TPREVIOUS
  GO TOP
		
  LOCATE FOR ALLTRIM(COLUMN5)=ALLTRIM(mcolumn5) ;
    AND ALLTRIM(COLUMN6)=ALLTRIM(mcolumn6)

* IF WE FIND A MATCH IN THE CURRENT TABLE
* THAT WAS IN YESTERDAYS, DELETE IT

  IF FOUND()
    REPLACE DELETEIT WITH "Y"
    STORE "Y" TO mdelete
  ENDI
		
  CLOSE DATABASES
		
  USE TODAY SHARED
  GO mrecno
  IF mdelete="Y"
    REPLACE DELETEIT WITH "Y"
  ENDI
  SKIP
ENDDO

I would appreciate your thoughts on this.

Many thanks

jimstarr · Dec 1, 2008

Very likely the bulk of the time is spent in the LOCATE statement. I'm guessing that you don't have TPREVIOUS indexed on ALLTRIM(COLUMN5)and ALLTRIM(COLUMN6). Create these indexes and you may speed things up drastically.

Jim

Olaf Doschke · Dec 1, 2008

You're opening and closing TPREVIOUS for each record you process. Open your tables ONCE outisde of the loop. Use a scanloop. And yes, use INDEXes.

Why do you set mrecno to 0 and right afterward to recno()? same goes to mcolumn5 and 6, those are useless.

You can do without ALLTRIM if column5 is c(15) and column6 is C(16) then the match will be done on the whole string, doing alltrim on both sides will just cost more time than comparing the untrimmed values, as foxpro will read and copy both values twice, once for the trim, once for the comparison.

You can and should index on the simple fields then, not on alltrim(field). Better yet index on column5+column6 then make a RELATION once outside the loop and you would need no locate, no seek, simply by scanning through TODAY the relation will automatically position on a matchin TPREVIOUS and on EOF("TPREVIOUS"), if there is no match. Don't close databases, keep tables open during the processing.

Then I don't understand why you set a column deleteit to Y in *both* TODAY and TPREVIOUS, if you find a match.

Bye, Olaf.

mmerlinn · Dec 1, 2008

[ ]
Using LOCATE on large tables is very slow primarily because every LOCATE always starts at the top of the table. Doing 200000 LOCATEs on a 200000 record table means that the average LOCATE is looking at 100000 records.

I would do something like this:

[tt][blue]
USE TODAY
USE TPREVIOUS
INDEX ON COLUMN5 + COLUMN6

SELECT TODAY && CURRENT / NEW LIST
SCAN

msearch = SELL + SKU

SELECT TPREVIOUS
IF SEEK(msearch)
REPLACE DELETEIT WITH "Y"
ENDIF

SELECT TODAY
IF FOUND()
REPLACE DELETEIT WITH "Y"
ENDIF

ENDSCAN

USE

SELECT TPREVIOUS
USE
[/blue][/tt]

Something like this should speed your processing up by several orders of magnitude.

Note that this is UNTESTED. Also note the the syntax may not be entirely correct as I have no way to check the syntax on this computer.

mmerlinn

http://mmerlinn.com

"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Steven Raymond

mmerlinn · Dec 2, 2008

[ ]
After reading Olaf's comments I realized that you don't need to set DELETEIT in both tables.

Assuming that you are deleting from the TODAY table, my code can be simplified to:

[blue][tt]
USE TODAY
USE TPREVIOUS
INDEX ON COLUMN5 + COLUMN6

SELECT TODAY && CURRENT / NEW LIST
SCAN

msearch = SELL + SKU

SELECT TPREVIOUS
SEEK(msearch)

SELECT TODAY
IF FOUND()
REPLACE DELETEIT WITH "Y"
ENDIF

ENDSCAN

USE

SELECT TPREVIOUS
USE
[/tt][/blue]

mmerlinn

http://mmerlinn.com

"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Steven Raymond

keepingbusy · Dec 2, 2008

Hello guys

Thank you for the very speedy responses which I will try over the next day or so and post back with my findings/results.

KB

mmerlinn · Dec 2, 2008

[ ]

LOCATE vs SEEK on a 200,000 record table:

LOCATE will look at a minimum of 1 record and a maximum of 200,000 records with the average LOCATE looking at 100,000 records.

SEEK will look at a minimum of 1 record and a maximum of 18 records with the average SEEK looking at 17 records.

Looking at EIGHTEEN records as opposed to ONE HUNDRED THOUSAND records makes SEEK by far the faster of the two.

In my view LOCATE should NEVER be used in a loop and NEVER be used on large tables.

mmerlinn

http://mmerlinn.com

"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Steven Raymond

Mike Lewis · Dec 2, 2008

Mmerlinn,

A couple of minor comments on your excellent code:

- I think your IF FOUND() is misplaced. Because it has no argument, it will test the current work area, which is TODAY. But your Seek is on TPREVIOUS. You either need to SELECT TODAY after the IF FOUND(), or pass the appropriate alias to IF FOUND().

- It's worth pointing out that you don't need to do INDEX ON COLUMN5 + COLUMN6 every time you run the code. Once the index has been created, you just need to SET ORDER to the appropriate tag each time. Given the size of the table, that will save a measurable amount of time.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

My Visual FoxPro site: www.ml-consult.co.uk

craigber · Dec 2, 2008

mmerlin,

Your assumptions regarding SEEK and LOCATE are flawed just a bit. SEEK never looks at the record. It looks at the Key. That can speed things up significantly as VFP will only need to transfer the key across the network, not the entire record.

Craig Berntson
MCSD, Visual FoxPro MVP,

http://www.craigberntson.com/blog

keepingbusy · Dec 2, 2008

Ok, I ran a few tests with the second code suggestion mmerlinn posted and have a few questions:

Can I assume that msearch=SELL+PRICE is looking for both the fields? An example of the two character fields together look something like 18.5824681325678 A

With regards to INDEX ON COLUMN5 + COLUMN6, is there supposed to be a TAG (I have added TAG COLUMN5 if this is correct)

I added a record to the table TPREVIOUS that is a duplicate from the table TODAY to see if the process finds this record. It appears it doesn't.

When you issue the command SELECT tablename, does the record pointer stay at the last point it left the table? I accept SCAN and ENDSCAN will move it, but what about the under the SEEK(msearch) command? If it doesn't find an appropriate match, will the pointer be at the end of the file?

When I moved:

Code:

SELECT TPREVIOUS
SEEK(msearch)
[b]SELECT TODAY[/b]
IF FOUND()
  REPLACE DELETEIT WITH "Y"
ENDIF

to

Code:

SELECT TPREVIOUS
SEEK(msearch)
IF FOUND()
  [b]SELECT TODAY[/b]
  REPLACE DELETEIT WITH "Y"
ENDIF

this seemed to find the duplicate record.

I would be grateful for some confirmation on the above.

Thank you all

mmerlinn · Dec 2, 2008

[ ]

Mike

This code was put together off the top of my head without any access to any documentation to verify that I got it correct.

1) In my experience FOUND() does not change when you move the table pointer nor does it need an argument. When I get to where I have documentation handy I will verify. However to be safe you could do this instead:

[tt][blue]
IF SEEK(msearch)
SELECT TODAY
REPLACE DELETEIT WITH "Y"
ENDIF

SELECT TODAY
[/blue][/tt]

Note that the second SELECT is required if the SEEK fails.

2) I have only INDEXed one time - note that the index is outside of the loop. However, I did fail to add a TAG and failded to SET ORDER.

craigber

I know. I put those numbers up as a raw comparison to show how much difference in speed there is between LOCATE and SEEK. By looking at the index there is a miniscule increase in speed over looking at the record RELATIVE to the MAJOR difference between the two. In this case I just plain did not think I should muddy the waters.

keepingbusy

1) Yes, you are looking for the concatentated string.

2) TAG mtag should have been included as well as SET ORDER TO mtag

3) The record pointer does not move unless you move it. SELECT does not move it. In this code only SCAN and SEEK() move the record pointer.

4) SEEK() moves the record pointer to the correct record OR EOF() for the CURRENT table. SEEK() does not move any other pointer. EOF() is set to .T. or .F.

5) SEEK() sets FOUND() to either .T. or .F. making an additional variable to hold the result superfluous.

6) Probably the reason the SEEK failed was that I had not SET ORDER.

7) Using TAG [red]COLUMN5[/red] will introduce errors since it is being used for two different things. Use a UNIQUE name instead like TAG mtag or whatever.

**********

ALL

This the kind of garbled code you get a 2 am in the morning when you have no way to run it nor verify it. Sorry if I confused anyone.

Also note that I use FP2.6 and not VFP, so it is possible that MS could have changed the way the commands work in later versions.

**********

I think the following will work better. I still can't verify anything though, so use with CAUTION.

[tt][blue]USE TODAY
USE TPREVIOUS
INDEX ON COLUMN5 + COLUMN6 [red]TAG mtag
SET ORDER TO mtag[/red]

SELECT TODAY && CURRENT / NEW LIST
SCAN

msearch = SELL + SKU

SELECT TPREVIOUS
SEEK(msearch)

SELECT TODAY
IF FOUND()
REPLACE DELETEIT WITH "Y"
ENDIF

ENDSCAN

USE

SELECT TPREVIOUS
USE
[/blue][/tt]

mmerlinn

http://mmerlinn.com

"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Steven Raymond

Stella740pl · Dec 2, 2008

There is no need to switch work areas back and forth.

Specifying the alias should do the job without extra SELECTs
(as in SEEK(msearch, "TPREVIOUS"),
IF FOUND("TPREVIOUS"),
REPLACE DeleteIt WITH "Y" IN TODAY,
USE IN TPREVIOUS, etc.)

It probably won't be much faster, but it sure will be shorter and more readable.

Mike Lewis · Dec 2, 2008

Mmerlinn,

I appreciate that you wrote the code off the top of your head, and didn't mean it be polished or ready to run. My understanding of FOUND() differs from yours, but by all means I'll accept the results of your findings.

As far as the indexing is concerned, I didn't mean that you only need to create the index once within the code. I meant that you only need to create it once ever. Create the index once only, for example, when you create the table. The index file will remain on the disk, and will be updated as the table is updated. If you recreate it each time you run the program, you are simply overwriting the file, for no good reason.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

My Visual FoxPro site: www.ml-consult.co.uk

keepingbusy · Dec 2, 2008

Stella740pl

Thank you for the post but I'm not really sure how this would be used within the coding I have.

mmerlinn

I have managed to use some of the code you posted (thank you)

We are going to run a few more tests so I'll post back when we have a final result.

My sincerest thanks to all those who posted info on this thread.

Lee

Stella740pl · Dec 2, 2008

Hi Mike (Lewis),

While Mmerlin may be right in the way that, in practice, FOUND() would probably hold the result intact in some cases and for some time, I think, you are formally right about how it generally works in most cases, and I wouldn't rely on that empirical finding.

According to VFP help (well, VFP6, but I don't think this part changed that much),

If you omit the optional arguments, FOUND( ) returns a value indicating the success of the last CONTINUE, FIND, INDEXSEEK( ), LOCATE, or SEEK for the table open in the currently selected work area.

Underlining mine.

Stella740pl · Dec 2, 2008

Lee, something like this (not tested):

Code:

USE TODAY
USE TPREVIOUS
SET ORDER TO TAG COLUMN5 && assuming it already exist

SELECT TODAY            && CURRENT / NEW LIST
SCAN
    msearch = SELL + SKU
    IF SEEK(msearch, "TPREVIOUS", "COLUMN5" ) 
        REPLACE DeleteIt WITH "Y" IN TODAY
    ENDIF

ENDSCAN

USE IN TODAY
USE IN TPREVIOUS

Stella740pl · Dec 2, 2008

I would also replace the first three lines with this two:

Code:

USE TODAY IN 0
USE TPREVIOUS IN 0 TAG COLUMN5

mmerlinn · Dec 2, 2008

[ ]

According to what Stella posted from help, using FOUND() like I did is NOT reliable.

Not sure that I have used it in that way in any of my programming, but I will definitely be watching and if I find any occurances, I will be changing them so that reliability is NOT an issue.

mmerlinn

http://mmerlinn.com

"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Steven Raymond

keepingbusy · Dec 2, 2008

Hi Stella

Part of your suggestion reads:

Code:

msearch = SELL + SKU
IF SEEK(msearch, "TPREVIOUS", "COLUMN5" )
  REPLACE DeleteIt WITH "Y" IN TODAY
ENDIF

The variable msearch has been explained (contains two fields in the table) but the line

Code:

IF SEEK(msearch, "TPREVIOUS", "COLUMN5" )

does this not just look for a match in field column5 or is it just the way this is written it will handle looking for a match on the two fields?

Also the line:

Code:

USE TPREVIOUS IN 0 TAG COLUMN5

shows an error. Am I right in saying this should be:

Code:

USE TPREVIOUS IN 0 ORDER COLUMN5

Thank you for your post

Lee

Olaf Doschke · Dec 2, 2008

Well, it all can be boiled down to this:

Code:

USE TPrevious IN 0 ORDER SKUSELL && index on SKU+SELL TAG SKUSELL must exist
USE Today In 0
Select Today
Set Relation TO SKU+SELL INTO TPrevious
REPLACE DELETEIT WITH "Y" FOR !EOF("TPrevious")
* or simply DELETE FOR !EOF("TPrevious") IN Today

Because if the Relation will find a record in TPrevious EOF("TPrevious") will be .F. and if no corresponding record is found in TPrevious EOF() is .T.

Like I said, no LOCATE, No SEEK needed, simply a relation.

Bye, Olaf.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

More efficient / speedier way to do this? 8

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Programmer

Similar threads

Log in

Part and Inventory Search

Sponsor