
How to reduce the processing time?

Status
Not open for further replies.

timmappa (Programmer), Jan 21, 2001, AU
Please help

I have a file with more than 1.5 million transactions (T). It needs to be compared with a master (M) file, and when I find a match I replace a field in the transaction (T) file with something. An ID number (numeric) is the matching field in both files. I use the VFP 6.0 SEEK command to find the matches. This works fine, but it takes a long time, about 8 to 10 hours, to process the file. What is the best way to do this? How can I reduce the time?

Thanks
Tim
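
From the description, the loop presumably looks something like this (a minimal sketch; the table and field names trans, master, id, and descr are hypothetical):

Code:
* A guess at the matching loop described above.
USE master IN 0 ORDER id   && master indexed on the ID field
USE trans IN 0
SELECT trans
SCAN
    IF SEEK(trans.id, 'master')   && seek in master without leaving trans
        REPLACE trans.descr WITH master.descr
    ENDIF
ENDSCAN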
 
Depends entirely on your actual code, hardware, and set-up. Try looking at How do I use Coverage to determine the slow parts of my application? faq184-63

Brian
 
Since you're using SEEK, you must already have the file indexed. Here are some ideas:

1) Optimize the index: don't have more tags than you need, and keep the index key as short and simple as possible (so that building the SEEK expression goes faster).
E.g., for the "ID number (numeric)", use the Integer field type instead of Numeric.
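
As an illustration of item 1 (table and field names are assumptions): an Integer key is a fixed 4 bytes, so its index tag is more compact than one built on a wide Numeric field, and the SEEK expression is trivial to build.

Code:
* Hypothetical master table with a compact Integer key.
CREATE TABLE master ( id I, descr C(30) )
INDEX ON id TAG id   && a single, short tag for SEEK to use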

2) Use only Rushmore-optimizable operations, or use SEEK only to move the record pointer (you seem to be doing this, but I can't really tell...)

3) Get rid of any index on the transaction file: only the master file (the one you're looking for matches in) needs an index.
Perhaps load the master file into memory and use ASCAN().
Or use INDEXSEEK() instead of SEEK.
See this page for a comparison between ASCAN and SEEK:
Probably, use the BinaryASCAN mentioned on:
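
A minimal sketch of the INDEXSEEK() variant from item 3 (table, tag, and key values are assumptions; note that INDEXSEEK() requires VFP 7 or later). Unlike SEEK, it can test for a key without moving the record pointer:

Code:
USE master IN 0 ORDER id
* Probe for a key; .F. means don't move the record pointer.
IF INDEXSEEK(12345, .F., 'master', 'id')
    * A match exists; this call actually moves to it.
    = INDEXSEEK(12345, .T., 'master', 'id')
ENDIF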
4) Use "SET COVERAGE TO coveragefile" to determine which lines in your program are the bottleneck. Hint: this setting greatly slows down your program and creates a HUGE log file, so you may want to let the program run for only 20-30 seconds and then stop it, so that the log file stays a manageable size. That still provides a good enough profile of what is taking the time in the process. I have a small hacked-together analyzer for coverage files that I'll post at the end (I found VFP's coverage analyzer to be much too slow).
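
For item 4, a hedged sketch of a profiling session (the program name and log path are hypothetical; LoadCoverage is the analyzer posted below):

Code:
SET COVERAGE TO c:\temp\cov.log
DO MyMatchJob            && hypothetical: run the slow process for ~30 seconds, then cancel
SET COVERAGE TO          && stop logging and close the file
DO LoadCoverage WITH 'c:\temp\cov.log', .T.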

5) Use "INSERT INTO"s instead of "APPEND BLANK, GATHER"s...

6) Better yet (I think): build an array in memory (up to 65,000 rows), then "APPEND FROM ARRAY arrayvar" every now and then, instead of millions of "INSERT INTO"s or "APPEND BLANK, GATHER"s.
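
A sketch of that batching idea (table and field names are hypothetical): fill an array, flush it in one APPEND FROM ARRAY, and shrink the array with DIMENSION to flush the final partial batch.

Code:
LOCAL laBuf[5000, 2], lnRow
lnRow = 0
SELECT source
SCAN
    lnRow = lnRow + 1
    laBuf[lnRow, 1] = source.id
    laBuf[lnRow, 2] = source.descr
    IF lnRow = 5000
        SELECT target
        APPEND FROM ARRAY laBuf    && one command inserts 5,000 rows
        lnRow = 0
        SELECT source
    ENDIF
ENDSCAN
IF lnRow > 0
    DIMENSION laBuf[lnRow, 2]      && shrink to flush the partial batch
    SELECT target
    APPEND FROM ARRAY laBuf
ENDIF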

To really help you optimize, I'd need to see much more detail of your algorithm... such as: What happens when each match in the master is found?


Code:
* Quick-and-dirty analyzer for SET COVERAGE log files.
* pcFile     - coverage log to load (ignored if the cCoverage cursor is already open)
* plLoadFile - .T. to also load the source text of each line from the PRG/VCX files
PROCEDURE LoadCoverage
LPARAMETERS pcFile, plLoadFile
if used('cCoverage') and empty(pcFile)
  ?"Using existing cCoverage Cursor"
  SELECT cCoverage
else
  CREATE CURSOR cCoverage ( time N(10,6), obj c(50), ;
                            meth c(70), line n(6,0), ;
                            file c(100), deep n(3,0), ;
                            cLine c(100) )
  INDEX ON time TAG time
  INDEX ON obj TAG obj
  INDEX ON meth TAG meth
  INDEX ON line TAG line
  INDEX ON file TAG file
  ?"Appending from "+pcFile
  APPEND FROM (pcFile) TYPE CSV
  ?"Finished Appending"
endif

* VFP8 SELECT syntax fix:
lcEng = SET("EngineBehavior")
SET ENGINEBEHAVIOR 70

SELECT sum(time) as totTime, cnt(1) as Execs, 0000.000 as AvgTime, ;
  obj, meth, line, file, cLine ;
  GROUP BY obj, meth, line, cLine ;
  ORDER BY totTime DESC ;
  FROM cCoverage ;
  INTO CURSOR cLineExecution READWRITE 
SET ENGINEBEHAVIOR &lcEng 

if plLoadFile
  INDEX ON file    TAG file
  set order to file
  lcFile = ''
  lcFileText = ''
  lnFileText = 0
  LOCAL laFileText[1], marker
  marker = 0
  SCAN FOR Line>0
    WAIT WINDOW NOWAIT TimeLeft(@marker,recno(),RecCount(),'psr')  && TimeLeft() is a separate progress helper, not shown here
    if empty(cLineExecution.file)
      ?"  -empty(file)"
    endif
    if NOT INLIST( lower(justext(cLineExecution.file)),'fxp','vct') 
      ?"  -not a recognized file type"
    endif
    if not file(cLineExecution.file) 
      ?"  -file doesn't exist"
    ENDIF

    IF cLineExecution.Line=0 or empty(cLineExecution.file) ;
       or NOT INLIST( lower(justext(cLineExecution.file)),'fxp','vct') ;
       or not file(cLineExecution.file) 
      ??"  A"
      LOOP
    ENDIF
    
    DO CASE 
      CASE lcFile==cLineExecution.file+ALLTRIM(obj)
        * Already loaded
      CASE LOWER(JUSTEXT(cLineExecution.file))='vct'
        if not file( forceExt(cLineExecution.file,'vcx') )
          ?"  -vcx file doesn't exist"
          LOOP
        ENDIF
        lcFile=FORCEEXT(cLineExecution.file,'vcx')
        ? "Loading "+lcFile
        SELECT methods FROM ( lcFile ) ;
          WHERE ALLTRIM(LOWER(objName))==ALLTRIM(cLineExecution.obj) ;
                AND NOT EMPTY(methods) ;
          INTO ARRAY laMethods
        USE IN ( SELECT(JUSTSTEM(lcFile)) )
        lnAt = ATC('PROCEDURE '+ALLTRIM(JUSTEXT(cLineExecution.Meth)), laMethods[1])
        IF lnAt>0
          lcFileText = SUBSTR(laMethods[1],lnAt)
          lcFileText = SUBSTR(lcFileText, AT(CHR(13)+CHR(10),lcFileText)+2 )
          lnFileText = aLines(laFileText,lcFileText)
          lcFile = cLineExecution.file+ALLTRIM(obj)
          ? "Finished Loading"
        ELSE
          lcFile = 'Not Loaded'
          ? "Couldn't find the right Method"
          LOOP
        ENDIF

        
      CASE LOWER(JUSTEXT(cLineExecution.file))='prg'
        if not file( forceExt(cLineExecution.file,'prg') )
          ?"  -prg file doesn't exist"
          LOOP
        endif
        lcFile=cLineExecution.file
        ? "Loading "+lcFile
        lcFileText = FileToStr( forceExt(lcFile,'prg') )
        lnFileText = aLines(laFileText,lcFileText)
        ? "Finished Loading"
    ENDCASE
    
    if between( cLineExecution.Line, 1, lnFileText )
      REPLACE cLine WITH laFileText[ cLineExecution.Line ]
    endif
  ENDSCAN
endif

* SELECT 0  
* USE dbf('cLineExecution') EXCL AGAIN
REPLACE ALL AvgTime WITH totTime/Execs 
INDEX ON avgTime TAG avgTime DESC
INDEX ON totTime TAG totTime DESC
RELEASE xxxCover
PUBLIC xxxCover
xxxCover = ''
BROW NORM NOWAIT NAME xxxCover
for each oCol in XXXCover.Columns
  oCol.Width = 80
endfor
 
My twopennyworth!

Are you running with 'local' data? Because if it's sitting on a network, that will slow it down...

Another thought: can you open the tables exclusively? That would speed things up.

HTH

Regards

Griff
Keep [Smile]ing
 