Tek-Tips is the largest IT community on the Internet today!


help processing massive textfile

Status
Not open for further replies.

brian32

Mar 20, 2005
Hi. I have a text file that I need help processing faster. I'm running XP Pro on a 3GHz processor with 1GB RAM and an 80GB hard drive.

The script I'm using is based on one from the Hey Scripting Guy Archive, with a few modifications (code below).


Basically, the script reads each line from SourceFile.txt. When a line contains the word "apples " followed later by "oranges", it collects every character between the two. So if the line reads "I like apples better than oranges", the script will return "better than ".

After the script has read all the values from SourceFile.txt, it writes those values on individual lines to TargetFile.txt. I use CountFile.txt simply to know how many values have been processed. As each value is obtained, strIncrement is increased by 1 and written to CountFile.txt.

The SourceFile contains over 1,000,000 lines! Yes, that's 1 million. When the script is first run, it processes around 4000 lines per second. However, as time goes on, fewer lines are processed per second. By 100,000 lines, fewer than 50 lines were being processed per second. To give the script maximum resources, I manually set the wscript.exe process in the Task Manager to RealTime priority, with the Affinity set to both CPUs.

The script has been running for over 24 hours with no sign of completing. Any ideas on how to process this faster? Thanks.

Code:
Const ForReading = 1
Const ForWriting = 2

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\SourceFile.txt", ForReading)

'===========
'Contents for CountFile
Set objFSO2 = CreateObject("Scripting.FileSystemObject")
Set objFile2 = objFSO2.OpenTextFile("C:\CountFile.txt", ForWriting)
strIncrement = 0
'===========

Do Until objFile.AtEndOfStream
    'strData = ""
    strSearchString = objFile.ReadLine

    intStart = InStr(strSearchString, "apples ")

    If intStart <> 0 Then
        intStart = intStart + 7   'skip past "apples " (7 characters)
        strText = Mid(strSearchString, intStart, 250)

        For i = 1 to Len(strText)
            If Mid(strText, i, 8) = " oranges" Then   'compare 8 characters, the length of " oranges"
                'places each entry on separate line
                strData = strData & vbCrLf
                strIncrement = strIncrement + 1
                objFile2.WriteLine strIncrement
                Exit For
            Else
                strData = strData & Mid(strText, i, 1)
            End If
        Next
    End If

Loop

objFile.Close
Set objFile = objFSO.OpenTextFile("C:\TargetFile.txt", ForWriting)

objFile.WriteLine strData
objFile.Close
objFile2.Close
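A likely cause of the slowdown, though it is not confirmed anywhere in this thread, is the way strData is built. VBScript strings are immutable, so every `strData = strData & ...` copies the whole string built so far, and the script above does that once per character. The sketch below is a made-up in-memory benchmark (N and the sample value are assumptions, and no files are touched) that compares growing one string against filling a pre-sized array and joining it once.

```vbscript
Option Explicit

' Hypothetical benchmark: N and the sample value are made up for illustration.
Const N = 20000
Dim i, t0, strData
Dim arrParts()

' Approach 1: grow one string, the way strData is grown in the script above.
t0 = Timer
strData = ""
For i = 1 To N
    strData = strData & "better than" & vbCrLf
Next
WScript.Echo "Concatenation: " & FormatNumber(Timer - t0, 2) & " s"

' Approach 2: store each value in a pre-sized array and join once at the end.
t0 = Timer
ReDim arrParts(N - 1)
For i = 1 To N
    arrParts(i - 1) = "better than"
Next
strData = Join(arrParts, vbCrLf) & vbCrLf
WScript.Echo "Array + Join:  " & FormatNumber(Timer - t0, 2) & " s"
```

Run it with `cscript` so the output goes to the console. The timings depend on the machine, so treat them as a rough indication only. Writing each value straight to the target file as it is found avoids the growing string as well.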
 
I tried something like this with 3 million lines.

Code:
Option Explicit
'On Error Resume Next

Dim RegEx, colMatches, objMatch, strLine, objFile, objFSO, objFile2

Const ForReading = 1, ForWriting = 2, ForAppending = 8

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("c:\temp\input.txt")
Set objFile2 = objFSO.OpenTextFile("c:\temp\output.txt", ForWriting, True)
Set RegEx = New RegExp

RegEx.Pattern = "apples\s(.*)\soranges"
RegEx.Global = True
RegEx.IgnoreCase = True
Set colMatches = RegEx.Execute(objFile.ReadAll)

For Each objMatch In colMatches
   objFile2.WriteLine objMatch.SubMatches(0)
Next

objFile.Close
objFile2.Close
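To see what the pattern above captures, here is a small sketch that runs it against an in-memory string instead of a file. The sample text is made up for illustration; only the pattern and the RegExp settings come from the script above.

```vbscript
Option Explicit

Dim RegEx, colMatches, objMatch, strSample

' Sample text made up for illustration; the second line has no match.
strSample = "I like apples better than oranges" & vbCrLf & _
            "no fruit on this line" & vbCrLf & _
            "Apples much more than Oranges"

Set RegEx = New RegExp
RegEx.Pattern = "apples\s(.*)\soranges"
RegEx.Global = True
RegEx.IgnoreCase = True

Set colMatches = RegEx.Execute(strSample)
For Each objMatch In colMatches
    ' SubMatches(0) is the text captured by the parenthesised group.
    WScript.Echo "[" & objMatch.SubMatches(0) & "]"
Next
```

Run with `cscript`, this should print `[better than]` and `[much more than]`. Note that `.*` is greedy, so on a line where " oranges" appears twice the capture runs up to the last occurrence.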

--------------------------------------------------------------------------------
dm4ever
My philosophy: K.I.S.S - Keep It Simple Stupid
 
If you insist on the line-by-line search, the following should run faster than yours, I guess:
Do Until objFile.AtEndOfStream
    strSearchString = objFile.ReadLine
    intStart = InStr(strSearchString, "apples ")
    If intStart <> 0 Then
        'look for " oranges" only after the end of "apples "
        intEnd = InStr(intStart + 7, strSearchString, " oranges")
        If intEnd <> 0 Then
            strText = Mid(strSearchString, intStart + 7, intEnd - intStart - 7)
            strIncrement = strIncrement + 1
            objFile2.WriteLine strIncrement & ") " & strText
        End If
    End If
Loop
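The same InStr-based extraction can be tried without any files. The sketch below wraps it in a hypothetical helper, ExtractBetween (a name introduced here, not part of the loop above), and calls it on two made-up lines.

```vbscript
Option Explicit

' Hypothetical helper: returns the text between "apples " and " oranges",
' or an empty string when either marker is missing.
Function ExtractBetween(strLine)
    Dim intStart, intEnd
    ExtractBetween = ""
    intStart = InStr(strLine, "apples ")
    If intStart = 0 Then Exit Function
    ' Move past "apples " so the search for " oranges" starts after it.
    intStart = intStart + Len("apples ")
    intEnd = InStr(intStart, strLine, " oranges")
    If intEnd = 0 Then Exit Function
    ExtractBetween = Mid(strLine, intStart, intEnd - intStart)
End Function

WScript.Echo "[" & ExtractBetween("I like apples better than oranges") & "]"
WScript.Echo "[" & ExtractBetween("apples only, nothing else") & "]"
```

Run with `cscript`, this should print `[better than]` and then `[]`. Because each line needs only two InStr calls and one Mid call, the cost per line stays the same no matter how many lines have already been processed.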

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 