
How to determine number of rows...

Status
Not open for further replies.

Neil Toulouse

Programmer
Mar 18, 2002
882
GB
Hi guys!

I have a process that strips out data from a large text file which contains around 28 million records, each record in the text file terminated by a carriage return. I am using low-level file functions to strip out the data.

Is there a quick way of determining how many rows exactly are in the text file, so I can create a progress indicator?

Thanks
Neil

I like work. It fascinates me. I can sit and look at it for hours...
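The quickest general approach is to skip parsing entirely: read the file in fixed-size binary chunks and count the record terminators (carriage returns here). This is language-agnostic; a minimal Python sketch of the idea (file name and chunk size are illustrative, not from the thread):

```python
# Count records in a large text file by reading fixed-size
# binary chunks and counting carriage-return terminators.
def count_records(path, chunk_size=8192):
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:          # end of file reached
                break
            count += chunk.count(b"\r")
    return count

# Example: a small file with three CR-terminated records
with open("demo.txt", "wb") as f:
    f.write(b"rec1\rrec2\rrec3\r")
print(count_records("demo.txt"))  # -> 3
```

Because it never splits the data into lines, this pass is close to raw I/O speed, so it is cheap enough to run once up front just to size a progress bar.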
 
Hmm,
35 seconds for 20,000 records is slow. Something is wrong there, unless the server itself is slow.

Cetin Basoz
MS Foxpro MVP, MCP
 
Here's some sample code that generates a random 512 MB file and reads it back in, filtering records to insert into a cursor named curImport (statistically ~1/26 of total lines).

Code:
#Define GENERIC_ALL              0x10000000
#Define GENERIC_READ             0x80000000
#Define GENERIC_WRITE            0x40000000
#Define GENERIC_EXECUTE          0x20000000
#Define CREATE_ALWAYS                     2
#Define OPEN_EXISTING                     3
#Define FILE_ATTRIBUTE_NORMAL          0x80

#Define OF_READ                  0x00000000


DECLARE INTEGER CreateFile IN kernel32;
    STRING  lpFileName,;
    INTEGER dwDesiredAccess,;
    INTEGER dwShareMode,;
    INTEGER lpSecurityAttributes,;
    INTEGER dwCreationDisposition,;
    INTEGER dwFlagsAndAttributes,;
    INTEGER hTemplateFile

DECLARE INTEGER WriteFile IN kernel32;
    INTEGER   hFile,;
    STRING  @ lpBuffer,;
    INTEGER   nBt2Write,;
    INTEGER @ lpBtWritten,;
    INTEGER   lpOverlapped

DECLARE INTEGER GetFileSize IN kernel32;
    INTEGER   hFile,;
    INTEGER @ lpFileSizeHigh

DECLARE INTEGER ReadFile IN kernel32;
    INTEGER   hFile,;
    STRING  @ lpBuffer,;
    INTEGER   nNumberOfBytesToRead,;
    INTEGER @ lpNumberOfBytesRead,;
    INTEGER   lpOverlapped
    
DECLARE INTEGER CloseHandle IN kernel32;
    INTEGER hObject

#Define cnPageSize 8192

Local lcFilename, lnHandle, lnLen, lcLine, lnBytes, lnBytesToRead, lcBuffer, lcRest

Rand(-1) && initialise the random number generator with a random seed

* create file
lnStartTime = Seconds()

lnBytes = 0
lcFilename = Addbs(GetEnv('TEMP'))+"largefiletest"+Sys(2015)+".txt" 
lnHandle = CreateFile(lcFilename, GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0)

Do While lnBytes < 512*1024*1024 && 3*1024*1024*1024
   lnLen = Int(Rand()*200)+200 && record length 200-400 Bytes
   lcLine = Replicate(Chr(Rand()*26+65),lnLen)+Chr(13)+Chr(10) && records consist of one char only, repeated lnLen times.
   WriteFile(lnHandle, lcLine, lnLen+2, @lnLen, 0)
   lnBytes = lnBytes + lnLen
EndDo

CloseHandle(lnHandle)
? "Time (sec) to create large file:", Seconds()-lnStartTime


* read file
lnStartTime =Seconds()

Public gnCountTotalLines
gnCountTotalLines = 0
Create Cursor curImport (mRecord M)

* As the file is read chunkwise (in 8 KB chunks), we
* don't necessarily read in complete lines. Therefore
* each chunk can end with a remainder, which is the
* beginning of the next line/record. This remainder is
* stored in lcRest and is initially empty:

lcRest = ""

lnHandle = CreateFile(lcFilename, GENERIC_READ, 0, 0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0)

* a lesson learned along the way: OpenFile is deprecated
* (it dates from 16-bit Windows); CreateFile also opens
* existing files via the OPEN_EXISTING flag.

lnBytesRead = 0
lnBytesToRead = 1

Do While lnBytesToRead > 0
   lnBytesToRead = cnPageSize
   lcBuffer = Space(lnBytesToRead)
    
   ReadFile(lnHandle,@lcBuffer,lnBytesToRead,@lnBytesToRead,0)
   
   lcRest = ExtractLines(lcRest+Left(lcBuffer,lnBytesToRead))
   lnBytesRead = lnBytesRead + lnBytesToRead
EndDo
CloseHandle(lnHandle)

? "Time (sec) to read in large file:", Seconds()-lnStartTime
? Reccount("curImport")," records imported from", gnCountTotalLines, " total records."

Erase (Addbs(GetEnv('TEMP'))+"largefiletest*.txt")

Procedure ExtractLines(tcLines)
   Local Array laLines(1)
   Local lnCount, lnLines, lnFullLines
   lnLines = ALines(laLines,tcLines)
   lnFullLines = Occurs(Chr(13),tcLines)
   gnCountTotalLines = gnCountTotalLines + lnFullLines
   
   * only process complete lines (the number of Chr(13)s within the chunk read in)
   For lnCount = 1 To lnFullLines
       If "A" $ laLines(lnCount)
          Insert into curImport values (laLines(lnCount))
       EndIf
   EndFor
   
   * return the remainder - an incomplete line/record - to be processed with the next chunk of the file
   Return Iif(lnFullLines<lnLines,laLines(lnLines),"")
EndProc

This takes about 20 seconds to generate and another 20 to reread the file on my computer.
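The carry-the-remainder pattern in ExtractLines is the key trick: each chunk may end mid-record, so the trailing partial line is held back and prepended to the next chunk before splitting. The same idea sketched in Python (the cursor insert becomes a list append; all names here are illustrative):

```python
# Chunked read with a carried-over remainder: split each chunk on
# the record terminator, keep the incomplete trailing piece, and
# prepend it to the next chunk.
def filter_records(path, wanted_char="A", chunk_size=8192):
    matches = []
    total = 0
    rest = b""                       # partial record from the previous chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parts = (rest + chunk).split(b"\r\n")
            rest = parts.pop()       # incomplete trailing record (or b"")
            total += len(parts)      # only complete records are counted
            matches += [p for p in parts if wanted_char.encode() in p]
    return matches, total

# Example: records crossing chunk boundaries are still reassembled
with open("sample.txt", "wb") as f:
    f.write(b"AAAA\r\nBBBB\r\nABAB\r\n")
print(filter_records("sample.txt", "A", chunk_size=5))
# -> ([b'AAAA', b'ABAB'], 3)
```

Choosing a chunk size near the filesystem block size (8 KB here, as in the VFP code) keeps the number of read calls low without holding much in memory.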

Bye, Olaf.
 
Also tested with 3 GB; it works.

It scales well: the 3 GB file took 130 seconds to generate and the same to read.

Bye, Olaf.
 
Nice one Olaf!

Something else for me to play with :)

I like work. It fascinates me. I can sit and look at it for hours...
 