Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Extract data from file

Status
Not open for further replies.
May 31, 2007
31
AE
Hi,

I have large text files (6mb - 10mb) that contains invoice data. need to open the text file and extract a range of pages and save it to a new file. I plan to read the text file into a array and search the array for the start and end pages that will be set using variables. The script i have so far is as follows:

Dim arrFileLines()
Dim PgStart
Dim PgEnd
PgStart = 1000
PgEnd=1023
i = 0
tmpdata = ""

Set objFile = filesys.OpenTextFile(LS_TEMPFILE, 1)
Do Until objFile.AtEndOfStream
Redim Preserve arrFileLines(i)
arrFileLines(i) = objFile.ReadLine
i = i + 1
Loop

objFile.Close

For l = Lbound(arrFileLines) to UBound(arrFileLines) Step 1

tmpdata = tmpdata & arrFileLines(l)

Next

My question how do I recreate the array with the required pages set from the PgStart /PgEnd variables ?

Also having googled on the subject I understand that I can run into performance issues on large text files. An alternative method for reading the text files would be using the OpenAsTextStream method which is quite fast. However, if I was to use the OpenAsTextStream method how would I extract the relevant pages needed.

Thanks

 
Somthing like this should work:

Code:
Option Explicit
' Declare variables
Dim arrFileLines
Dim PgStart
Dim PgEnd
Dim f1
Dim f2
Dim fso
Dim i

' Initialize variables
PgStart = 1000
PgEnd=1023
Set fso = CreateObject("scripting.FileSystemObject")
Set f1 = fso.openTextFile("C:\Test.txt",1)
Set f2 = fso.openTextFile("C:\Test2.txt",2,True)

' Split data into array
arrFileLines = Split(f1.Readall,vbcrlf)

' Since array is zero based
PgStart = PgStart - 1
PgEnd = PgEnd - 1

' Pull only the range needed
For i = PgStart To PgEnd
  f2.WriteLine arrFileLines(i)
Next

' Clean up memory space
f1.Close
f2.Close
Set fso = Nothing
Set f1 = Nothing
Set f2 = Nothing

' Prompt user of completion
Msgbox "Range Extracted!"

Swi
 
Swi,

Thanks for your help. the extraction works. however, there is a issue in the formatting of the page numbers. for example if i am extracting data from pages 1000 - 1250, there might be currency values in the data $1250.00 that can be picked up as the end page number. so i have had to enclose the page number in brackets - (1000) or 1000). how would i pull the data from the range needed with the start and end page number in this format?

thanks
 
The number used in the array above is the element number (what line# the array is currently at). So, if you know that you end page is at line 1000 then it doesn't matter if the data contains $1000.00 if I am understanding your concern properly. The PgStart and PgEnd variables in this case are just start and end element #'s.

Swi
 
AGANTHONY,

Did you get this working?

Swi
 
Hi Swi,

i have been working on the script, and have written the script as a class, so i can maybe reuse the script in future, i have most of it working. however, i encountered a problem - if i am searching for the last record i get Input past end of file due to the counter being increased by 1. Line 95 :- (EndRecord = StrEnd + 1). any advice ?

Also, do you think i will run into problems with 6-10 mb files ?

'******************************************************
Dim LinesInHeader, RecordDelimiter

LinesInHeader = 2 ' Sets the number of lines before the record number to be extracted
RecordDelimiter = "~" ' Sets the record delimiter within the data file Eg: ~1~

'---------------------------------------------------------------------------------------------


Dim objDataExt
Set objDataExt = New ReadInputFile

With objDataExt
.SetRecordInfo(strInput)
End With


Class ReadInputFile

Private mHeaderLines
Private mDelimiter
Private mStartRecord
Private mEndRecord
Private mInputFile
Private mStartline
Private mEndline
Private mInputFilePath

Private Property Let HeaderLines(strInput)
mHeaderLines = strFile
End Property

Private Property Let Delimiter(strInput)
mDelimiter = strFile
End Property

Private Property Let StartRecord(strInput)
mStartRecord = strInput
mStartRecord = RecordDelimiter & mStartRecord & RecordDelimiter
End Property

Private Property Let EndRecord(strInput)
mEndRecord = strInput
mEndRecord = RecordDelimiter & mEndRecord & RecordDelimiter
End Property

Private Property Let Startline(strInput)
mStartline = strInput
End Property

Private Property Let Endline(strInput)
mEndline = strInput
End Property

Private Property Let InputFilePath(strInput)
mInputFilePath = strInput
End Property

' User Inputs for file to extract and record start and End
'---------------------------------------------------------------------------------------------
Public Function SetRecordInfo(strInput)

' Data file select Dialogue box.
'----------------------------------------------------------------
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objDialog = Createobject("Useraccounts.Commondialog")
objDialog.Filter = "All files|*.txt"
objDialog.Filterindex = 1
intResult = objDialog.Showopen

IF(intResult = 0) Then

Wscript.quit

Else

strInput = objDialog.FileName
mInputFile = strInput
s = objFSO.GetFileName(strInput)
mInputFilePath = s


End If



' User Input box
'----------------------------------------------------------------
StrStart = InputBox(" Enter Starting Record Number","Start Record to Print")
StartRecord = StrStart

StrEnd = InputBox(" Enter End Record Number","Last Record to Print")
EndRecord = StrEnd + 1

mHeaderLines = (LinesInHeader)
mDelimiter = (RecordDelimiter)

'----------------------------------------------------------------
tmp = ""
i = 0

Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile _
(mInputFile, ForReading)

Do While Not objTextFile.AtEndOfStream

strNextLine = objTextFile.ReadLine

If InStr(strNextLine, mStartRecord) Then
strStartline = i - 1
Else

If InStr(strNextLine, mEndRecord) Then
strEndline = i - mHeaderLines
End If
End if

i = i + 1

Loop

mStartline = strStartline
mEndline = strEndline

objTextFile.close

Set objDialog = Nothing


' reopens the text file and reads the data
'*******************************************************************
Set objTextFile = objFSO.OpenTextFile _
(mInputFile, ForReading)


sEndline = False
tmp = ""
i = 1
Do While Not sEndline = True


'If lineno = 1
'***********************************************
If mStartline = 1 Then

If i = mStartline Then

Do Until mStartline = mEndline

tmp = tmp & objTextFile.readline & vbCr

mStartline = mStartline + 1

Loop

sEndline = True

End If
End If


''If lineno > 1
'***********************************************
If mStartline > 1 Then

Do Until i = mStartline
objTextFile.skipline
i = i + 1
Loop

If i = mStartline Then

Do Until mStartline = mEndline

tmp = tmp & objTextFile.readline & vbCr

mStartline = mStartline + 1

Loop

sEndline = True

End If
End If


Loop


Set objFile1 = objFSO.CreateTextFile("C:\new_file.txt", 2)
objFile1.write(tmp & VBFormFeed)
objFile1.close


Set objFSO = Nothing
End Function



End Class



my sample data file is as follows

[bank.dt]

~1~

This is a test line item for splitting files.
This is a test line item for splitting files.

[bank.dt]

~2~

This is a test line item for splitting files.
This is a test line item for splitting files.

[bank.dt]

~3~

This is a test line item for splitting files.
This is a test line item for splitting files.
 
So you are not actually looking for the line#'s that the user enters you are looking for text within the files that the user enters? You will see a pretty major performance hit due to concatenating all of the data into a string variable. Did you take a look at the method I posted above?

Swi
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top