Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations sizbut on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

More efficient / speedier way to do this? 8

Status
Not open for further replies.

keepingbusy

Programmer
Apr 9, 2000
1,470
GB

Hi

We have a process that carries out the following:

There are two tables that store information that has been imported from txt files. One table contains current information (Today) and the other contains previous information (Yesterday).

On a daily basis, new information is imported into the current table and the current tables data is sent to yesterdays table depending on conditions.

These conditions include a match on a unique number (called an SKU number) and also whether or not the price is < the previous one.

We have some code below which does the job perfectly and when you process about 68000 records in the tables, it takes about 30 minutes.

However, when the table increase to over 200,000 we are now talking many hours to process (this is done overnight but not really the best way to do it).

I am looking for some suggestions (not the sollution) on how to speed this up.

To try and simplify things, TODAY stores the SELL price and SKU number, it then looks for it in TPREVIOUS (Yesterdays data) and if it finds a match, deletes the record. Here is what we have so far:
Code:
USE TODAY  && CURRENT / NEW LIST
GO TOP
	
DO WHILE NOT EOF()
  mrecno=0
  mrecno=RECNO()
		
  STORE " "		TO mdelete
  STORE SPACE(15)	TO mcolumn5
  STORE SPACE(16)	TO mcolumn6
  STORE SELL		TO mcolumn5
  STORE SKU		TO mcolumn6
	
* NOW START A SEARCH TO SEE IF TODAY SKU NUMBER
* AND PRICE MATCH THAT OF YESTERDAYS
* IF IT DOES, DELETE IT FROM CURRENT

  USE TPREVIOUS
  GO TOP
		
  LOCATE FOR ALLTRIM(COLUMN5)=ALLTRIM(mcolumn5) ;
    AND ALLTRIM(COLUMN6)=ALLTRIM(mcolumn6)

* IF WE FIND A MATCH IN THE CURRENT TABLE
* THAT WAS IN YESTERDAYS, DELETE IT

  IF FOUND()
    REPLACE DELETEIT WITH "Y"
    STORE "Y" TO mdelete
  ENDI
		
  CLOSE DATABASES
		
  USE TODAY SHARED
  GO mrecno
  IF mdelete="Y"
    REPLACE DELETEIT WITH "Y"
  ENDI
  SKIP
ENDDO
I would appreciate your thoughts on this.

Many thanks
 

Lee,

keepingbusy said:
The variable msearch has been explained (contains two fields in the table) but the line
Code:
IF SEEK(msearch, "TPREVIOUS", "COLUMN5" )
does this not just look for a match in field column5 or is it just the way this is written it will handle looking for a match on the two fields?

The best way to understand what a command does would be to look it up in Help. The third argument here is for index tag you are performing a search on, not a field name. You mentioned before,
keepingbusy said:
With regards to INDEX ON COLUMN5 + COLUMN6, is there supposed to be a TAG (I have added TAG COLUMN5 if this is correct)
so I used that name. Of course, it would be much less confusing if you named your tag something more self-explanatory, like TAG SELL_SKU.

But I really am a big fan of Olaf's solution. Somehow, SET RELATION types of solutions never occur to me (more often, I think in the SELECT(SQL) direction, and sometimes, SCAN), so I really appreciate this code and will keep it in mind.

Star to Olaf.
 
Hi Stella,

Well, SQL could even be faster, if it's only a few records in TODAY that are also in TPREVIOUS. The REPLACE will still make a full table scan over the TODAY table, sql might not needd that, if both tables are indexed on the join condition fields. Nevertheless rushmore will still only make use of one index for a join.

Bye, Olaf.
 

Olaf, you think SQL may be faster?
Not sure, but possible.

OK, then here is my solution:
Code:
*!* Index tags on Sell+Sku should already exist in both tables.

SELECT * ;
   FROM Today ;
   WHERE Sell+Sku NOT IN ;
        (SELECT Sell+Sku FROM TPrevious) ;
   INTO CURSOR OnlyNewData    && or INTO TABLE, if you need it

Now you don't need to delete anything at all, or even keep the DeleteIt field. You just select only new and unique data from the Today table into cursor, and you can append the cursor right to your TPrevious table.

I re-read the initial table and was horrified.
68,000 records = 30 minutes processing?
200,000 records = overnight?
I process tables with millions of records, and I never needed an overnight process. Even a 30 or 45 minutes processes are rare and far between.

Lee, you should try all of them, the SCAN code, SQL code, and SET RELATION code, and time-test them. Would you mind coming back and letting us know the results? Thanks.

Stella

 

Hi Stella

Some good suggestions on this thread. At the moment I have been using yours as posted 2 Dec 08 15:22

You mentioned:
I re-read the initial table and was horrified.
68,000 records = 30 minutes processing?
200,000 records = overnight?
I process tables with millions of records, and I never needed an overnight process. Even a 30 or 45 minutes processes are rare and far between.
Well I suppose this is down to experience and knowledge and as always, we are on that learning curve where forums like this build up that knowledge (for which I'm grateful).

I know that the regulars at Tek-Tips (You, OlafDoschke, MikeLewis and Craigber to name a few), give us less experienced Fox users that extra "Bit of code" to assist.

For the time being, we will probably run with the above as 220,000 + records are processed in under 8 minutes (much better than overnight!)

However, when time permits, we will also try out the other suggestions listed.

My thanks again to all who posted.

Lee
 

Which one ran under 8 minutes?
The SCAN loop? Not bad at all (but I am sure could be better).
 

Hi Stella, yes the scan loop.

(Not sure about the second start though)

Lee
 
Lee - if 8 minutes is good enough, great. But I would expect a process on 220,000 records to run in seconds, not minutes.

You may want to explore some of the other suggestions in this thread.

Tamar
 
Well, Lee specifically asked
I am looking for some suggestions (not the sollution) on how to speed this up.

Therefore let him go step by step. I also didn't post the relation right away, although I could have, but made several suggestions. You never learn if the solution is that far away from the initial code.

Bye, Olaf.
 

Olaf

You have made me think about this but perhaps I am going against the grain and asking for an addition to your code as I have made a slight change to what Stella posted.

There was a slight change in requirement in that when a match record was found it was deleted (that was ok) then I had to delete the existing record. So in other words:

Scan through table 1
Look for a match in table 2
Found one
Delete it
Then go back to table 1 and delete that record

That was no problem by adding and additional line:
Code:
SELECT TODAY
SCAN
  msearch = SELL + SKU
  SELECT TPREVIOUS
  SEEK(msearch)
  IF FOUND()
    REPLACE DELETEIT WITH "Y"
    SELECT TODAY
    [b]REPLACE DELETEIT WITH "Y"[/b]
  ENDIF
  SELECT TODAY
ENDSCAN
What do I need to add to your code to achieve the same result as the above code that will allow both matching records to be deleted?

My reasoning here is try the actual timing since this has been discussed a lot throughout this thread.

Code:
USE TPrevious IN 0 ORDER SKUSELL
USE Today In 0
Select Today
Set Relation TO SKU+SELL INTO TPrevious
REPLACE DELETEIT WITH "Y" FOR !EOF("TPrevious")
* or simply DELETE FOR !EOF("TPrevious") IN Today
Thank you

Lee
 
Replace can work on both aliases at the same time:
Code:
Replace TPrevious.DeleteIt With "Y", Today.DeleteIt "Y" FOR !EOF("TPrevious")
[code]

Also do SET TALK OFF before this, that can also save much time.

Bye, Olaf.
 

Hi Olaf
Code:
Replace TPrevious.DeleteIt With "Y", Today.DeleteIt "Y" FOR !EOF("TPrevious")
is showing a syntax error and as I am not familiar with this, I'm not sure what the error is.

Amongst all this, I didn't mention I'm using Version 9.

Lee

 
Sorry, a WITH is missing.

Code:
Replace TPrevious.DeleteIt With "Y", Today.DeleteIt WITH "Y" FOR !EOF("TPrevious")

 
[&nbsp;]

8 minutes sure is a LOT faster than overnight. That LOCATE statement was absolutely decimating your production time.

As you are going through the code, you might want to look for other instances of LOCATE statements used on large tables or in loops, and if you find any, rewrite the code to eliminate them.

And thanks for the star. Been a long time since I saw something that I could help with.

mmerlinn


"We've found by experience that people who are careless and sloppy writers are usually also careless and sloppy at thinking and coding. Answering questions for careless and sloppy thinkers is not rewarding." - Eric Steven Raymond
 
Hi Mike, when using REPLACE on two aliases, the IN clause makes no sense!!

Bye, Olaf.
 
when using REPLACE on two aliases, the IN clause makes no sense!!

Which is why I think no REPLACE command should operate on more than one alias at a time. Every single REPLACE should include the IN clause; it's the only way to be sure that you're doing what you think you're doing.

In this example, I'd use two REPLACE commands, one for each alias.

Tamar
 
Sorry, Tamar, I know exactly what I'm doing here, using the feature of the Replace to work on both aliases.

Code:
Select Today
Set Relation TO SKU+SELL INTO TPrevious
Replace TPrevious.DeleteIt With "Y", Today.DeleteIt WITH "Y" FOR !EOF("TPrevious")

"Today" is the active workarea, controlling via the relation whih records are selected in both Today and TPrevious. The For clause !EOF("TPrevious") makes sure I found a matching record and so I'm updating both tables DeleteIt field with "Y" at the same pass, no second Replace is needed.

I think you and Mike haven't understood the context of that Replace.

Bye, Olaf.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top