
Count total number of lines in a text file


suoirotciv
Programmer
Dec 3, 2002
PH
Sorry, but I can't find a forum regarding this problem.
"How can I count the total number of lines in a text file?"

I need to read a log file and upload it into an .mdb file using Visual Basic. I have already written the uploading code, but my actual problem is that there are so many records in the log file that when I run my system it seems to hang, though it actually hasn't. So what I want to do is add a progress bar that shows the actual progress, but I can't set the ProgressBar1.Max property, because I don't know the total number of lines.

Can anyone help me with this?

Please pardon the grammar.
Not good in English.
 
There is no way to count the number of lines in a text file without actually reading the entire file.

However, one approach that you might consider is to first get the size of the file using the GetFileSize API. Then keep a running total of each line's length, and from that ratio you can set the progress bar values appropriately.
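For example, here is a minimal sketch of that idea (it uses VB's built-in FileLen in place of the GetFileSize API, since the principle is the same; the file name is just for illustration):
Code:
Dim lTotal As Long, lRead As Long, sLine As String
lTotal = FileLen("C:\my.log")       'total size of the file in bytes
Open "C:\my.log" For Input As #1
Do While Not EOF(1)
    Line Input #1, sLine
    lRead = lRead + Len(sLine) + 2  '+2 for the vbCrLf that Line Input strips
    ProgressBar1.Value = Int(100 * lRead / lTotal)
Loop
Close #1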

Good Luck
--------------
As a circle of light increases so does the circumference of darkness around it. - Albert Einstein
 
As an alternative path, search these fora for a routine which "loads" an entire file as a single string ("basGrab" something?). There are some related posts which go a step (or two) further and show how to use Split to separate delimited files into lines (records) and elements (fields). It is generally quite a bit faster to get information into an app this way than with the cumbersome (and SLoooooooooooooooooW) traditional method of Line Input.
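For instance, a rough sketch of that two-level Split (the file name and comma delimiter are only assumptions for illustration):
Code:
Dim sWhole As String, vLines As Variant, vFields As Variant, i As Long
Open "C:\data.csv" For Binary As #1
sWhole = Space$(LOF(1))             'buffer sized to the entire file
Get #1, , sWhole                    'grab the whole file in one read
Close #1
vLines = Split(sWhole, vbCrLf)      'break into records (lines)
For i = 0 To UBound(vLines)
    vFields = Split(vLines(i), ",") 'break each record into fields
Next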






MichaelRed
m.red@att.net

Searching for employment in all the wrong places
 
Try this function...
___
Function CountofLines(Filename As String) As Long
    'Long, not Integer: line counts can easily exceed 32,767
    Dim FileNum As Integer, S As String
    FileNum = FreeFile
    Open Filename For Binary As #FileNum
    S = Space$(LOF(FileNum))    'buffer sized to the whole file
    Get #FileNum, , S           'read the entire file in one shot
    Close #FileNum
    CountofLines = UBound(Split(S, vbCrLf)) + 1
End Function

Private Sub Form_Load()
    MsgBox CountofLines("C:\scandisk.log")
End Sub
 
Hypetia - This is great . . . and I really want to THANK YOU for that . . . though it took some time, about 4 mins, to read 1,252,869 lines . . . but for the meantime this will be useful for me . . . again, THANK YOU.

Please pardon the grammar.
Not good in English.
 
For files that big I would recommend you read in the file
in chunks instead of all at once. Unless you have A LOT of
RAM, that could actually prove to be faster.
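For example, a rough sketch of what chunked reading can look like (the 64 KB buffer size and file name are arbitrary choices, and it assumes every line, including the last, ends with vbCrLf):
Code:
Const CHUNK As Long = 65536         'arbitrary block size
Dim sBuf As String, lPos As Long, lSize As Long, lCount As Long
Open "C:\my.log" For Binary As #1
lSize = LOF(1)
Do While lPos < lSize
    'shrink the buffer for the final partial block
    sBuf = Space$(IIf(lSize - lPos < CHUNK, lSize - lPos, CHUNK))
    Get #1, , sBuf                  'Get reads exactly Len(sBuf) bytes
    'count single-byte line feeds so a vbCrLf split across two
    'blocks cannot be miscounted
    lCount = lCount + (Len(sBuf) - Len(Replace(sBuf, vbLf, vbNullString)))
    lPos = lPos + Len(sBuf)
Loop
Close #1
MsgBox lCount & " lines"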
 
chunks . . . chunks . . . chunks . . .

Robse . . . that's new to my ear . . . what are chunks?

Please pardon the grammar.
Not good in English.
 
The reason for the long execution time is that VB has to create those 1,252,869 Variants when it does the Split. Also, loading the entire thing into memory and then duplicating it . . . not sure that is a great idea; you are talking about a 20-50 meg file? Double that . . . You might be a lot faster if you just used the Scripting FileSystemObject and TextStream, read one line in at a time, and just counted while not EOF.
 
You're right there, SemperFiDownUnda; actually it was 43,854,363 KB.

And I notice that when I run my code, it's impossible for me to open other applications because it's using most of my resources.

I already used the FileSystemObject in my code, but what is "TextStream"?

Please pardon the grammar.
Not good in English.
 
suoirotciv - I have to wonder if you're asking the right question. If I understand your objective correctly, then you really don't care how many lines are in the file. I think that you want to maintain a progress bar which accurately reflects the file processing activity. If that's the case, then why read the entire file simply to count the lines, and then create an array? I would suspect that your machine is thrashing itself to death given the size of the file you're processing. Hence the 4 minutes.

I would suggest that you get the total length of the file, then keep a running total of how many bytes you've processed, and set the progress bar value based on the ratio of bytes read to total bytes. Since this is based on byte counts, it has the added advantage of not being skewed by variable line lengths (although for a log file in the sizes that you have, this is not likely to be a big issue, if at all).
Code:
Dim iFileHand As Integer, lFileLeng As Long
Dim lBytesRead As Long, sFileLine As String, nProgress As Single

iFileHand = FreeFile
Open sFileName For Binary As #iFileHand
lFileLeng = LOF(iFileHand)
Close #iFileHand

iFileHand = FreeFile
Open sFileName For Input As #iFileHand
lBytesRead = 0
Do While Not EOF(iFileHand)
   Line Input #iFileHand, sFileLine
   lBytesRead = lBytesRead + Len(sFileLine)
   nProgress = (lBytesRead / lFileLeng) * 100
   prgBar.Value = Int(nProgress + 0.5)
Loop
Close #iFileHand


Good Luck
--------------
As a circle of light increases so does the circumference of darkness around it. - Albert Einstein
 
Try this code:
Dim fso As Scripting.FileSystemObject
Set fso = New Scripting.FileSystemObject
Dim ts As Scripting.TextStream
Set ts = fso.OpenTextFile(Text1.Text, ForReading, False)
Dim sData As String
Dim lLC As Long
Do Until ts.AtEndOfStream
    lLC = lLC + 1       'count each line as it is read
    sData = ts.ReadLine
Loop
ts.Close
Set ts = Nothing
Set fso = Nothing

Just set a reference to the Microsoft Scripting Runtime.
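If setting a reference isn't convenient, a late-bound sketch of the same loop also works (the literal 1 stands in for ForReading, which is only defined when the reference is set; file name is illustrative):
Code:
Dim fso As Object, ts As Object, lLC As Long
Set fso = CreateObject("Scripting.FileSystemObject")
Set ts = fso.OpenTextFile("C:\my.log", 1)   '1 = ForReading
Do Until ts.AtEndOfStream
    ts.SkipLine                             'skip the text; we only count
    lLC = lLC + 1
Loop
ts.Close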
 
CajunCenturion - this is much better; I already tested it in my code . . . and you really got my point here . . . THANK YOU very much . .

Please pardon the grammar.
Not good in English.
 
Isn't it great that there are guys like you who are always here to help . . . SemperFiDownUnda's suggestion is a different approach but does the same thing that I want to happen when reading a file . . . while CajunCenturion's code automatically sets my progress bar . . .

THANK YOU all

Please pardon the grammar.
Not good in English.
 
From my own experience, taking 4 minutes to read/split the (43 MByte) text file is quite long; it could be an indication of an older system, or one with inadequate memory resources or disc storage (availability OR fragmentation). I have used the read/split approach on files of ~50 MBytes and 'finished' the process in a few seconds. A further advantage of the approach is that the entire content of the file is, even after the first breakdown, separated into records. If the source file is a CSV (or any regularly delimited) file, one loop with Split can further break the records into fields.

A general rule of processing is to 'observe' the requirements of each process and select ones which are 'suitable' to the task. In most cases, I/O processing is an order of magnitude SLOWER than memory processing. A large part of the I/O cost of routines like "Line Input" is the delay while the disc seeks to the information. Since, in the 'Line Input' method, this occurs once for each record, it usually becomes the dominant factor in the time required to read a large file. Reading large amounts of 'data' into memory often results in the use of virtual memory (which is actually "disc" storage); however, MS / Win appears to have this facility nicely optimized, and the impact on operations is significantly less than with the 'brute force' I/O methods.

My suggestion of Hypetia's code was not intended to provide information for the progress bar, but to generally avoid the need for it. Adding one additional level of processing to the array of strings provided by the routine yields the entire recordset broken down into individual fields, all ready for further verification and validation before adding to a recordset.

If, on the other hand, the system requires FOUR minutes to read a mere 43 MByte file, I think there are additional issues to be investigated, and suggest that a minor variation of CajunCenturion's code is appropriate for the progress bar while the further investigation is being conducted.

Almost done here - then I went back to check: he is saying 43,854,363 KB. That's not in the range of MBytes, but GBytes! I created a 50 MByte string, requiring (by a crude DateDiff calc) ~1 second. Attempts to create a "string" of 50 GBytes using either String or Space fail, so either the file size is WAY larger than advertised (by * 1000?) or reading the whole thing into a string created for it is a no-go! Similarly perplexing is the apparent record size. Using the given 43 GBytes and the ~1.2 M lines would result in rather HUGE records (~35 KBytes each); using 43 MBytes instead gives rather small ones (~35 bytes each). Of course, these would be the 'average' for a CSV file, but they would still represent some additional challenges in recordset processing.

In closing, I must admit to not knowing much about the inner workings of the 'FSO' object, so I cannot objectively comment on its relative efficiency in I/O processing. It may well be more appropriate than the method I proposed; on the other hand, my experience with MS is that every convenience comes at a price, and USUALLY at least part of the price is in the time required to accomplish the task. FSO does add some convenience. I still do not know the 'price'. Fortunately, I do not need to deal with 43 GByte files this week. In my present state of unemployment, I do not even need to deal with 43 MBytes of anything, and do not actually have any such file lying about to do any 'research' with, so the points are - for me - at best academic.



MichaelRed
m.red@att.net

Searching for employment in all the wrong places
 
>FSO does add some convenience. I still do not know the 'price'

The main 'price' aspects (IMHO) are:

a) it only really handles text files (although there are certain features that don't care about this)
b) as a consequence of the above limitation you can't pick and choose where to put the file pointer
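By contrast, native binary I/O does let you put the file pointer wherever you like; a small sketch (the offsets are purely illustrative):
Code:
Dim s As String
Open "C:\my.log" For Binary As #1
s = Space$(100)
Get #1, 5001, s     'read 100 bytes starting at byte 5001 (positions are 1-based)
Close #1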
 
Try this if your file is delimited by CRLF.

Open TempFile$ For Binary As #1
TempStr$ = Space$(1024)                       'buffer for the first block of the file
Get #1, , TempStr$
RecordSize = InStr(1, TempStr$, vbCrLf) + 1   'line length including the 2-byte CRLF
Close #1
Open TempFile$ For Binary As #1
NumOfRecords = LOF(1) \ RecordSize
Close #1
MsgBox NumOfRecords & " lines!", vbInformation

Swi
 
other than the obvious use of constants and literals (presumably because it is " ... just for illustration ... ") -WHY- would you:

open the file,

[tab]find the record length,

close the file,

[tab]Re-OPEN it in the same MODE

to then get the file size?

and then close the file all over again?


Isn't that more-or-less (actually just MORE) of the long way 'round ye olde barn?

MichaelRed
m.red@att.net

Searching for employment in all the wrong places
 
(Adding to MichaelRed's comment...)

Not to mention the unsupported assumption that one is dealing with fixed-width records, which not only makes the process inefficient, but would tend to yield erroneous results.
 
MichaelRed is right. My code was not intended for reading files as large as 50 MB. I did not think that it would be used for that purpose. It was intended for text files of small size, like scandisk.log or some readme.txt.

If your only intention is to display the progress while reading the file, try CajunCenturion's approach. But there is a small bug in Cajun's code which needs to be rectified.
While reading the file, the actual number of bytes read is tracked using:

[tt]lBytesRead = lBytesRead + Len(sFileLine)[/tt]

In fact the total number of bytes read are:

[tt]lBytesRead = lBytesRead + Len(sFileLine) + 2[/tt]

2 extra bytes for vbCrLf read at the end of each line.

When I tried this code on a 28 MB file, EOF was reached at 92% because the number of bytes read reported by lBytesRead was smaller than the actual count.

The code should look something like this. It uses the built-in Seek function to get the total number of bytes read from the file (including the vbCrLfs).
___
Option Explicit

Private Sub Form_Load()
    Dim iFileHand As Integer, sFileName As String
    Dim lFileLeng As Long, sFileLine As String
    Dim lLastPercent As Long, lPercent As Long
    Dim sTimer As Single
    sTimer = Timer
    iFileHand = FreeFile
    sFileName = "C:\some large file.txt"
    Open sFileName For Input As #iFileHand
    lFileLeng = LOF(iFileHand)

    Do Until EOF(iFileHand)
        'read line
        Line Input #iFileHand, sFileLine
        '%age
        lPercent = 100# * Seek(iFileHand) / lFileLeng
        If lLastPercent <> lPercent Then 'only if changed
            lLastPercent = lPercent
            Debug.Print lPercent 'display new %
            DoEvents 'refresh vb
        End If
    Loop
    Close #iFileHand
    Debug.Print Timer - sTimer
    End
End Sub
___
This code took less than 10 seconds to read the same 28 MB file, and EOF was reached at 100%.

 
A 100 meg file on my P4 1800 MHz, 512 meg W2K Server actually hangs when reading the file via a TextStream. Could be a problem with my machine . . . as I don't seem to have Enterprise Manager for SQL Server working anymore 8). My point is that if you read a file into a buffer then you are using ~50 meg of RAM. If you then Split it you are also using another 50 meg. After testing a 10 meg file, both got a result in under a second . . . only that my method didn't actually take up 100 meg of memory.

I suggested the FSO because it was said to be a text file that was being processed. The FSO is pretty efficient and easy to use.

Swi, one problem with your solution: if the first line is only 4 characters long, the second line is 100,000 characters long, and there are only 2 lines, then your solution will report about 16,668 lines when there are really only 2.

When dealing with text log files I'd make the assumption that not every line is the same length.
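To put numbers on that, here is a hypothetical two-line file, built in code just to check the arithmetic:
Code:
Dim sTest As String, RecordSize As Long
'hypothetical file contents: a 4-char line then a 100,000-char line
sTest = "abcd" & vbCrLf & String$(100000, "x") & vbCrLf
RecordSize = InStr(1, sTest, vbCrLf) + 1    '6 bytes, measured from line 1
Debug.Print Len(sTest)                      '100,008 bytes total
Debug.Print Len(sTest) \ RecordSize         'prints 16668 "lines", not 2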
 