How to determine number of rows...


Neil Toulouse

Programmer
Mar 18, 2002
Hi guys!

I have a process that strips out data from a large text file which contains around 28 million records, each record in the text file terminated by a carriage return. I am using low-level file functions to strip out the data.

Is there a quick way of determining how many rows exactly are in the text file, so I can create a progress indicator?

Thanks
Neil

I like work. It fascinates me. I can sit and look at it for hours...
 
I reckon it depends on whether the records are all fixed length or not. If so, you could use something like:
Code:
nHandle   = FOPEN('somefile.txt')
cString   = FGETS(nHandle)                  &&... get first line (FGETS strips the line terminator)
nLastByte = FSEEK(nHandle, 0, 2)            &&... go to EOF to get the file size in bytes
nRecCount = nLastByte / (LEN(cString) + 2)  &&... record count (+2 for CR/LF; use +1 for a bare CR)
= FSEEK(nHandle, 0, 0)                      &&... go back to the beginning

If they're not, you could use something like:
Code:
cString  = FILETOSTR('somefile.txt')
nRecords = OCCURS(CHR(13), cString)   &&... count the carriage returns
Of course, the second method on a large file could be troublesome.


-Dave Summers-
[cheers]
Even more Fox stuff at:
 
Hi Dave!

Thanks for the response. Unfortunately the records are not fixed length, and FILETOSTR() blows up with a memory error.

Oh well, I will have to go without the progress bar :)

Neil

I like work. It fascinates me. I can sit and look at it for hours...
 
I figgered as much. When you said it was 28 million records, well, that's just too big for a string.

However, you could use the first example and modify it a bit to use bytes read rather than records...


-Dave Summers-
[cheers]
Even more Fox stuff at:
 
Neil,

I can't think of a quick way of determining the number of lines.

But why not use the number of bytes instead? As you work through the file with your low-level functions, it should be easy to keep track of the number of bytes processed so far, and therefore the percentage of the work done.

Just a thought.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro tips, advice, training, consultancy
Custom software for your business
 
Hi Mike,

I was thinking the same.

Using lcLine = FGETS(nFileHandle, 8192) to read line by line, you get the number of bytes read from LEN(lcLine) (plus the line terminator that FGETS strips off). Add that to a byte counter and compare it with the file length, which you get from pnSize = FSEEK(nFileHandle, 0, 2) if done right after FOPEN(); then FSEEK(nFileHandle, 0, 0) to go back to the beginning of the file.

Also, you can count the lines/records read so far and work out an average bytes per record, so you can then extrapolate the total number of records with totalrecords = numrecordsread * totalsize / numbytesread.
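
A minimal sketch of how that could look (untested; the +2 assumes CR/LF line endings, use +1 for a bare CR, and the progress message is just a placeholder):
Code:
nHandle   = FOPEN("somefile.txt")
nFileSize = FSEEK(nHandle, 0, 2)      &&... file size in bytes (seek to EOF)
= FSEEK(nHandle, 0, 0)                &&... back to the beginning

nBytesRead = 0
nLines     = 0
DO WHILE NOT FEOF(nHandle)
   lcLine     = FGETS(nHandle, 8192)
   nLines     = nLines + 1
   nBytesRead = nBytesRead + LEN(lcLine) + 2   &&... +2 for the CR/LF stripped by FGETS

   IF nLines % 10000 = 0                       &&... don't repaint on every single line
      nPercent   = nBytesRead / nFileSize * 100
      nEstimated = nLines * nFileSize / nBytesRead   &&... extrapolated total record count
      lcMsg      = TRANSFORM(INT(nPercent)) + "% done, ~" + ;
                   TRANSFORM(INT(nEstimated)) + " records expected"
      WAIT WINDOW lcMsg NOWAIT
   ENDIF

   * ... process lcLine here ...
ENDDO
= FCLOSE(nHandle)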

Bye, Olaf.
 
Olaf,

Yes, I was thinking along exactly those lines.

Using the average line-length so far also sounds like a good possibility. The further along in the process you are, the more accurate it will become.

Mike


__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro tips, advice, training, consultancy
Custom software for your business
 
Here is a simple trick to get the row count quickly, though even this would take some time for millions of records:

Code:
create cursor dummy (dummy L)     && a cursor with a single one-byte logical field
append from myText.txt type sdf   && appends one (essentially empty) record per line of the text file
lnRows = reccount()
use



Cetin Basoz
MS Foxpro MVP, MCP
 
Thanks guys for the further input!

Mike/Olaf - good idea! I never thought about the problem in terms of file size/bytes, that's certainly doable!

Cetin - the resultant file would be greater than the 2GB limit (I have been down this route) and the overhead in time makes it unusable, but thanks!

Bytes it is then :)

I like work. It fascinates me. I can sit and look at it for hours...
 
Neil said: "Cetin - the resultant file would be greater than the 2GB limit (I have been down this route) and the overhead in time makes it unusable, but thanks!"

How come? My math wasn't that bad. As far as I know:

2 GB = 2 * 1024 MB = 2 * 1024 * 1024 KB = 2 * 1024 * 1024 * 1024 bytes

That makes roughly 2 billion bytes. To exceed that limit you would need approximately 1 billion lines in your text file, and I thought you said 28 million only. That makes less than 1/35 of 2 GB. Now I'm scratching my head: where did I go wrong?
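
Spelling the arithmetic out (assuming the dummy cursor stores roughly 2 bytes per appended row, which comes up again further down the thread):
Code:
? 2 * 1024 * 1024 * 1024        && 2147483648 bytes in 2 GB
? 2147483648 / 2                && ~1.07 billion rows of ~2 bytes each to hit the limit
? 28000000 / (2147483648 / 2)   && 28 million rows is only about 2.6% of that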

Cetin Basoz
MS Foxpro MVP, MCP
 
Cetin,

Your maths is correct (of course). But is it possible that the 2 GB limit would apply to the file that you are appending from? I'm not saying that's the case. I've never tested it. But I know that the 2 GB limit doesn't just apply to DBFs in VFP.

Just a thought.

Mike


__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro tips, advice, training, consultancy
Custom software for your business
 
Guys

Cetin/Mike - The text file is just under 3GB. Mike is right - I get an error message stating "the file is too large" when I try to append from it. I assumed this was blowing the 2GB limit when I got the error, so I apologise as that was misleading :~/

For further info, I am pulling data from various sources, manipulating it with VFP, then populating a SQL Server database to avoid any size limits (the resultant SQL DB is just short of 14GB).

The 28 million record file is the biggest and is therefore used as the driving force of the process. I couldn't find or think of a simple, quick method of counting the records so I could display a progress bar, but I am going with the file size method mentioned above :)

Thanks
Neil

I like work. It fascinates me. I can sit and look at it for hours...
 
Neil,

Yes, I think that's the best solution. Although I like Cetin's quick method of counting the records, it would inevitably add overhead to the process, even without the 2 GB issue. Basing your progress indicator on bytes rather than records should be faster and more accurate.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro tips, advice, training, consultancy
Custom software for your business
 
Neil, Mike,

Cetin is using a trick: he's appending the data to a cursor which has only a dummy logical field. This would not really import the 3GB of data in the file, but it would create as many records as there are lines.

Cetin's import cursor would use 1 Byte per line from the txt file.

But it would fail on 3GB anyway, because APPEND can't handle files that big, even if the cursor wouldn't grow the read in of more than 2GB does. Also, FOPEN and the other low-level file functions of FoxPro fail beyond 2GB; you'll need to use API calls to handle files larger than 2GB.

The relevant Windows API functions are OpenFile, GetFileSize and ReadFile.

You can forget APPEND, FOPEN, FGETS, FREAD and such on files larger than 2GB.
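
For what it's worth, a rough, untested sketch of the kind of API approach meant here (it uses CreateFile rather than the older OpenFile; the 64 KB buffer and the error handling are just placeholders):
Code:
#DEFINE GENERIC_READ     0x80000000
#DEFINE FILE_SHARE_READ  1
#DEFINE OPEN_EXISTING    3

DECLARE INTEGER CreateFile IN kernel32 ;
   STRING lpFileName, INTEGER dwDesiredAccess, INTEGER dwShareMode, ;
   INTEGER lpSecurityAttributes, INTEGER dwCreationDisposition, ;
   INTEGER dwFlagsAndAttributes, INTEGER hTemplateFile
DECLARE INTEGER GetFileSize IN kernel32 INTEGER hFile, INTEGER @lpFileSizeHigh
DECLARE INTEGER ReadFile IN kernel32 ;
   INTEGER hFile, STRING @lpBuffer, INTEGER nBytesToRead, ;
   INTEGER @lpBytesRead, INTEGER lpOverlapped
DECLARE INTEGER CloseHandle IN kernel32 INTEGER hObject

LOCAL hFile, nSizeLow, nSizeHigh, nFileSize, cBuffer, nBytesRead, nTotalRead
hFile = CreateFile("c:\temp\somefile.txt", GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, 0, 0)
IF hFile = -1                    && INVALID_HANDLE_VALUE
   WAIT WINDOW "Could not open the file" NOWAIT
   RETURN
ENDIF

* The file size comes back as two 32-bit halves; combine them (the low half may be negative in VFP)
nSizeHigh = 0
nSizeLow  = GetFileSize(hFile, @nSizeHigh)
nFileSize = nSizeHigh * 2^32 + IIF(nSizeLow < 0, nSizeLow + 2^32, nSizeLow)

nTotalRead = 0
DO WHILE .T.
   cBuffer    = REPLICATE(CHR(0), 65536)     && 64 KB chunk
   nBytesRead = 0
   IF ReadFile(hFile, @cBuffer, LEN(cBuffer), @nBytesRead, 0) = 0 OR nBytesRead = 0
      EXIT
   ENDIF
   nTotalRead = nTotalRead + nBytesRead
   * Split LEFT(cBuffer, nBytesRead) into lines here, carrying any partial last line
   * over to the next chunk; progress is nTotalRead / nFileSize
ENDDO
= CloseHandle(hFile)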

Bye, Olaf.
 
"even if the cursor wouldn't grow the read in of more than 2GB does"

That should have read: even if the cursor wouldn't grow too big and fail, the read-in of more than 2GB does fail.

Bye, Olaf.
 
Guys

Just spotted another misleading typo in my last post!! Arrgh! The text file is just under 2GB, not 3GB as I wrote.

So to clarify the position - The low level file functions work, but the append doesn't.

sorry - it's already been a long week and it's only Tuesday :)

I like work. It fascinates me. I can sit and look at it for hours...
 
Olaf,

Yes, I realise Cetin's code doesn't import the whole file - although it took me a few moments to pick that up.

You said: "Cetin's import cursor would use 1 Byte per line from the txt file."

In fact, it's two bytes per line, because you have to allow for the Delete flag. But your main point is still good.
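
A quick way to check that from VFP, if I'm remembering RECSIZE() correctly (it reports the stored record width including the delete flag):
Code:
CREATE CURSOR dummy (dummy L)
? RECSIZE()      && 2 - one byte for the logical field, one for the delete flag
USE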

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro tips, advice, training, consultancy
Custom software for your business
 
OK, then it seems what I was doing wasn't clear. Anyway, with such a big file, and since you say you are also using SQL Server, a question comes to mind: why don't you create a temp table in SQL Server (if a permanent one doesn't already exist) and do a bulk load into it using "bulk copy"? It would be fast (relative to what you're asking :). Something like this:
Code:
SQLExec(m.lnHandle, ;
  "create table #myTable"+;
  " (field1 varchar(10), field2 varchar(20), ...)")

Text to m.cSQL textmerge noshow
BULK INSERT #myTable
  FROM '<< fullpath("MyBigFile.txt") >>'
      WITH (DATAFILETYPE = 'char', 
           FIELDTERMINATOR = ',', 
           ROWTERMINATOR = '\n')
EndText

SQLExec(m.lnHandle, m.cSQL)
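
Once the bulk load has finished, the original row-count question answers itself with a quick follow-up query (the cursor name crsCount below is just an example):
Code:
SQLExec(m.lnHandle, "select count(*) as nRows from #myTable", "crsCount")
? crsCount.nRows      && the number of lines loaded from the text file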



Cetin Basoz
MS Foxpro MVP, MCP
 
Hi Cetin!

Yes, I am beginning to think the same thing. I ran a test of this process earlier and, although I am doing some manipulation of the data, it was taking about 35 seconds to import 20,000 records. That works out to well over 12 hours for the whole process!
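
Back-of-the-envelope, assuming that test rate holds for the whole 28-million-record file:
Code:
? 28000000 / 20000 * 35 / 3600   && roughly 13.6 hours at 35 seconds per 20,000 records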

So it looks like I should get the data in first and mess about with it afterwards :)

Neil

I like work. It fascinates me. I can sit and look at it for hours...
 