
Get the line number of every occurrence of a string in an ASCII file.

Status
Not open for further replies.

tbtcust

Programmer
Oct 26, 2004
214
US
How do I get the line number of every occurrence of a string in an ASCII text file with minimal reads and counting each line?

I have an ASCII text file with close to 8 million records in it. I need to scan the file and provide the line number for every occurrence of a string.

Thanks in advance for any help.
 
Open the text file with the FileSystemObject.

Use FSO.ReadLine to loop through the file and increment a counter on each iteration.

Check each line for a match and store the count somewhere.



Or use Notepad++ to save re-inventing the wheel :)


Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 

Something like this if you are looking for XYZ in your file:
Code:
Dim strTextLine As String
Dim l As Long
Dim f As Integer

f = FreeFile                       ' next unused file number
Open "C:\Temp\MyFile.txt" For Input As #f
Do While Not EOF(f)
   l = l + 1                       ' count every line read
   Line Input #f, strTextLine
   If InStr(strTextLine, "XYZ") > 0 Then
       MsgBox "XYZ is on line " & l
   End If
Loop
Close #f

Have fun.

---- Andy
 
Have you considered just using the DOS find command?

find /N "your_string" your_file


In order to understand recursion, you must first understand recursion.
 
FSO is the slowest way to read files in VB6, but if you use that you may as well use the TextStream object's Line property.

Note that it reports the next line, i.e. before first ReadLine it is 1, after first ReadLine it is 2, etc.
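
For anyone who wants to see the Line property in use, here is a minimal sketch. It assumes late binding via CreateObject; FindWithFSO, Path and Target are hypothetical names, and printing to the Immediate window just stands in for whatever you do with the results.

Code:
Option Explicit

' Hypothetical helper: prints the number of every line containing Target.
Public Sub FindWithFSO(ByVal Path As String, ByVal Target As String)
    Dim fso As Object, ts As Object
    Dim LineNo As Long, TextLine As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(Path, 1)   ' 1 = ForReading
    Do Until ts.AtEndOfStream
        LineNo = ts.Line                 ' number of the line about to be read
        TextLine = ts.ReadLine
        If InStr(TextLine, Target) > 0 Then Debug.Print LineNo
    Loop
    ts.Close
End Sub

Capturing ts.Line before the ReadLine avoids the off-by-one, since after the read it already points at the next line.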
 
> close to 8 million records in it. I need to scan the file and provide the line number for every occurrence of a string

How much data in each record (approx)?
 
The results list could be quite long too.

Did you want the results as a file or a giant array or what?
 
You could read the entire file into a single string variable and then use Split() to divide it up into an array of strings - one per line - which you could then search for the key you are looking for. However, VB has limits to the maximum size of strings it can handle - which is why, I assume, strongm was asking about the record size.

If the record size were fixed, you could do it even faster by avoiding Split() and simply using InStr on the whole string. At each occurrence found you could calculate the line number from the byte position.
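
A rough sketch of that calculation, assuming every record really is the same length and the whole file is already in a string. ReportFixed, Buffer and RecLen are made-up names, and RecLen has to include the line terminator (e.g. 80 data characters + CRLF = 82).

Code:
Option Explicit

' Hypothetical helper: Buffer holds the whole file, RecLen is the full
' record length in characters including the line terminator.
Public Sub ReportFixed(ByRef Buffer As String, ByVal Target As String, _
                       ByVal RecLen As Long)
    Dim Pos As Long, LineNo As Long

    Pos = InStr(1, Buffer, Target)
    Do While Pos > 0
        LineNo = (Pos - 1) \ RecLen + 1      ' 1-based line from 1-based position
        Debug.Print LineNo
        ' Resume at the start of the next record so each line is reported once.
        Pos = InStr(LineNo * RecLen + 1, Buffer, Target)
    Loop
End Sub

With RecLen = 82, a match at position 83 works out as (83 - 1) \ 82 + 1 = 2, i.e. the first character of line 2.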

The two main problems with this approach would be that, as I say, VB has limits on the size of strings it can handle (I'm not sure what the limit is - perhaps strongm can shed some light?) and the time VB takes to handle strings seems to grow almost exponentially with string size.

Tony
 
>which is why, I assume, strongm was asking about the record size

Um ... not exactly.

>the time VB takes to handle strings

Strings in VB are slow compared to all the other variable types. And it has a nasty habit of unexpectedly creating lots and lots of temporary new strings as it goes. And that's really slow, particularly as the strings get bigger.

So I probably wouldn't actually use VB strings here. I suspect I'd probably treat everything as byte arrays.
 
I tried this by generating an 8M line (563MB) test file with a search string inserted into random lines.

Using Jet as a search engine: I created a new database, then imported the text file adding line numbers, then did a wildcard search for the presence of the search target, returning a Recordset of the line numbers.

This took 80 seconds.

Then I "reported" the Recordset to disk and cleaned up (closed and deleted the MDB, etc.) which took another 30 seconds, giving a 70KB (8064 line) results file of line numbers.


I'm not claiming it is speedy (though not as bad as I feared) and it would have an upper limit of a search file about 3 times this size. But this is pretty versatile, and you could easily do more complex things with it.
 
I've got a byte routine that only takes a few seconds to search through an 80000000 line file on some fairly dated hardware (although the testbed assumes only about 25 bytes per line).

It isn't very versatile, though.
 
Yeah, if you don't need a lot of AND and OR conditions or range checks or something, and have no need to transform or summarize the data, a hand-coded solution is best.

If the text file is ANSI (very likely) you can code something up that reads in large blocks as a Byte array and use InStrB() on them.
Code:
Option Explicit

Private Sub Main()
    Dim Bytes(5000) As Byte
    
    Bytes(2500) = Asc("A")
    Bytes(2501) = Asc("B")
    Bytes(2502) = Asc("C")
    MsgBox InStrB(1, Bytes, StrConv("AB", vbFromUnicode)) - 1
End Sub
Don't forget to StrConv()!


It gets messy, though, detecting the line breaks, counting line numbers, and handling lines that break over blocks.

What I like about ADO/Jet solutions is they also work in VBScript (WSH scripts) with almost zero speed penalty. Great for quick and dirty things needing no UI, etc. and where I don't have a compiler installed.
 
Just read the whole (ANSI-assumed, just like dilettante) file in as a single array of bytes. Convert your search string to a binary array (remembering VB strings are Unicode, so we need to convert). We are now no longer working with strings. Hurrah!

And all we need to do is walk through the big array looking for the small array, and increment a line counter every time we encounter the value 10 (so it'll detect EOL whether for Windows/DOS or *nix text files).
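
The description above could be sketched roughly like this. It is not strongm's actual routine, just one way to do it, and it assumes the file is ANSI, fits in memory as one Byte array, and that the search text contains no line break. FindLines, Path and Target are hypothetical names.

Code:
Option Explicit

' Hypothetical helper: returns a Collection of 1-based line numbers
' (each line listed once) whose text contains Target.
Public Function FindLines(ByVal Path As String, ByVal Target As String) As Collection
    Dim Data() As Byte, Pat() As Byte
    Dim f As Integer, i As Long, j As Long
    Dim LineNo As Long, LastHit As Long
    Dim Hits As New Collection

    Set FindLines = Hits
    If Len(Target) = 0 Or FileLen(Path) = 0 Then Exit Function
    Pat = StrConv(Target, vbFromUnicode)     ' Unicode string -> ANSI bytes

    f = FreeFile
    Open Path For Binary Access Read As #f
    ReDim Data(0 To LOF(f) - 1)
    Get #f, , Data                           ' whole file in one read
    Close #f

    LineNo = 1
    For i = 0 To UBound(Data)
        If Data(i) = 10 Then                 ' LF ends a line (CRLF or LF files)
            LineNo = LineNo + 1
        ElseIf Data(i) = Pat(0) And LineNo <> LastHit Then
            If i + UBound(Pat) <= UBound(Data) Then
                For j = 1 To UBound(Pat)     ' compare the rest of the pattern
                    If Data(i + j) <> Pat(j) Then Exit For
                Next j
                If j > UBound(Pat) Then      ' loop ran to completion: a match
                    Hits.Add LineNo
                    LastHit = LineNo
                End If
            End If
        End If
    Next i
End Function

Reading the file in one Get keeps the disk access to a single pass, and the LastHit check just stops a line with several matches being listed more than once. Drop it if you want every occurrence.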
 

All this nice discussion about 100s of different ways of doing it, but not a word from tbtcust... :-(


Have fun.

---- Andy
 
Well, they don't seem to have logged in since they posted the question. Can't be that important ...
 
>Or were you looking for code?

Well, the method sounds relatively straightforward, but if you care to post some I always find your code very helpful.

Sorry about the lack of response until now - I was away on hols for 2 weeks.

Tony
 
Could be an interesting project to do as "proof of concept" for the construction of an inverted index in VB6.

Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 