Parse very large text document, VB6, EOF error, parsing error

Status
Not open for further replies.

arniec

Programmer
Jul 22, 2003
49
US
I have a tab-delimited text file pulled from an AS400 system which contains over 350 columns of data. What I need to do is go in and replace certain columns of data with placeholder data (data scrubbing). I have been working on this for several days and keep hitting my head against a wall.

My first problem consists of the files the program must use. They are generated by the AS400 and downloaded directly from that server. Every time I try to read one in using my current code I get an error message “Input past end of file.” Yet if I edit the file in Textpad, press Enter at the bottom and save it, the data will parse, somewhat.

My next problem is that, perhaps because each line in the file is so long, the data does not seem to be processed correctly. I had thought that each line, terminated by vbCrLf, would be treated as a single record, but the array into which I am parsing the data ends up with an inconsistent number of fields in each element, and the data is not in a consistent order, which of course makes it impossible to use.

Here is the code I am running to input and parse the file (it works for smaller files saved in Notepad, but not larger files nor AS400 downloaded files):

Code:
Private Sub cmdProcess_Click()
 '   On Error GoTo Error_Handler
    
    If error_check_input = False Then Exit Sub
    
    Dim delim As String
    Select Case cmbDelim
    Case "<Tab>"
        delim = vbTab
    Case Else
        delim = cmbDelim
    End Select
    
    Dim lines() As String, i As Long
    lines() = Split(ReadTextFileContents(txtInput), vbCrLf)
    
    'to quickly delete all empty lines, load them with a special char
    For i = 1 To UBound(lines)
        If Len(Trim(lines(i))) = 0 Then lines(i) = vbNullChar
    Next
    
    'then use the Filter function to delete these lines
    lines() = Filter(lines(), vbNullChar, False)
    
    'create a string array out of each line of text and store it in
    'a Variant element
    ReDim values(0 To UBound(lines)) As Variant
    For i = 0 To UBound(lines)
        values(i) = Split(lines(i), delim)
    Next i

    ExportDelimitedFile values(), txtOutput
    
Error_Handler:
    If Err Then Err.Raise Err.Number, , Err.Description

End Sub

Function ReadTextFileContents(filename As String) As String
    Dim fnum As Integer, isOpen As Boolean
    On Error GoTo Error_Handler
    ' Get the next free file number
    fnum = FreeFile()
    Open filename For Append As #fnum
    Print #fnum,
    Close #fnum
    Open filename For Input As #fnum
    'If execution flow got here, the file has been opened without error
    isOpen = True
    ' Read the entire contents in one single operation.
    ReadTextFileContents = Input(LOF(fnum), fnum)
    'Intentionally flow into the error handler to close the file
    
    
Error_Handler:
    'Raise the error (if any) but first close the file.
    If isOpen Then Close #fnum
    If Err Then
        If Err.Number <> 62 Then Err.Raise Err.Number, , Err.Description
    End If
End Function

Sub ExportDelimitedFile(values() As Variant, filename As String, _
Optional delimiter As String = vbTab)
    Dim i As Long
    'Rebuild the individual lines of text of the file
    ReDim lines(0 To UBound(values)) As String
    For i = 0 To UBound(values)
        
        lines(i) = Join(values(i), delimiter)
    Next
    'Create CRLFs among records and write them.
    WriteTextFileContents Join(lines, vbCrLf), filename
End Sub

Sub WriteTextFileContents(text As String, filename As String, _
    Optional AppendMode As Boolean)
    
    Dim fnum As Integer, isOpen As Boolean
    On Error GoTo Error_Handler
    'get the next free file number
    fnum = FreeFile()
    If AppendMode Then
        Open filename For Append As #fnum
    Else
        Open filename For Output As #fnum
    End If
    'If execution flow gets here the file was opened correctly
    isOpen = True
    'Print to the file in one single operation.
    Print #fnum, text
    'Intentionally flow into the error handler to close the file
Error_Handler:
    If isOpen Then Close #fnum
    If Err Then Err.Raise Err.Number, , Err.Description
    
    
End Sub

If the above cannot be made to work, is there a method for parsing a file character by character in VB? (I know how to do it in Java and Perl, but not VB.) I was thinking I could read in each character and, if it's a tab (or whatever the delimiter is), tick off a counter. As I read the file in character by character I would write it to the output file character by character, except when I get to the proper numbered column, where I would output the mask data instead.

Any help will be greatly, greatly, greatly appreciated as hitting my head against my keyboard has ceased being fun.
 
Is there a reason you're using the comma when you Print to the file?

Function ReadTextFileContents(filename As String) As String

Dim fnum As Integer, isOpen As Boolean
On Error GoTo Error_Handler
' Get the next free file number
fnum = FreeFile()
Open filename For Append As #fnum
Print #fnum,    ' <-- this comma
Close #fnum

If you're going to read it all in as one string, try opening it For Binary rather than For Input.

Lee
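
For anyone following along, Lee's suggestion could look roughly like the sketch below. It is a minimal illustration, not code from the thread: the function name ReadFileBinary is made up, and it assumes the whole file fits comfortably in memory. Binary mode reads the raw bytes as they are, so null characters and bare vbCr or vbLf line endings come through unchanged.

```vb
' Hypothetical helper (not from the original post): read a whole file
' in one operation using Binary mode instead of Input mode.
Function ReadFileBinary(ByVal path As String) As String
    Dim fnum As Integer
    Dim buffer As String

    fnum = FreeFile
    Open path For Binary Access Read As #fnum
    If LOF(fnum) > 0 Then
        ' Pre-size the string; Get fills it with exactly that many bytes
        buffer = Space$(LOF(fnum))
        Get #fnum, , buffer
    End If
    Close #fnum

    ReadFileBinary = buffer
End Function
```

The result can then be passed to Split in place of the string returned by ReadTextFileContents.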
 
Maybe I'm misunderstanding what you are trying to do, but if you just want to delete empty lines, why not use:

f2 = FreeFile
Open TempFile$ For Output As f2 'or Append if you like
f = FreeFile
Open FileName$ For Input As f Len = 1024
'copy every non-empty line to the temp file
Do While Not EOF(f)
    Line Input #f, a$
    If Len(a$) Then Print #f2, a$
Loop
Close f, f2

Kill FileName$
Name TempFile$ As FileName$

The default for Len = is either 128 or 256 (I can't remember) unless specified; I would try a value that exceeds the maximum line length in the data file.

regards Hugh
 
arniec, in my experience you get an 'Input past end of file' error when you try to read a binary file using text mode (Input mode).

Try opening your file in a hex editor and make sure that it does not contain null characters. Usually, when we open a binary file in Notepad and do a 'Save', all null characters are replaced with white-space characters, making the file acceptable for subsequent text-mode input. If you find null characters in your file, try opening it in binary mode instead of text (Input) mode.

If you have made sure that your file does not contain null characters and is still a valid text file, then look at the character used to separate two consecutive lines. Normally in Windows, lines are separated using the two characters vbCrLf (or vbNewLine). On non-Windows operating systems (like Linux), lines in a text file are separated using a single character, vbLf. Unfortunately, if you parse and split such a file using vbCrLf, you will not get the lines split in the right way.

Moreover, VB's Line Input statement does not recognize a lone vbLf as a new-line character. It only recognizes a new line if it finds a vbCrLf combination or a vbCr character alone.

So check which character is used as the new-line character in the raw text file downloaded from the remote system. If you see vbLf (or vbCr) alone instead of vbCrLf, then you may need to modify the Split function call accordingly.
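
One way to make the Split call tolerant of all three conventions is to normalize the line endings first. The following is only a sketch under that assumption; SplitLines is a hypothetical helper name, not something from the original code, and it expects the file contents to be in a string already.

```vb
' Hypothetical helper: split text into lines whether the file uses
' vbCrLf, a lone vbCr, or a lone vbLf as its line separator.
Function SplitLines(ByVal text As String) As String()
    ' Collapse CR+LF pairs first so a pair is not counted as two breaks
    text = Replace(text, vbCrLf, vbLf)
    ' Any vbCr still present was a lone carriage return
    text = Replace(text, vbCr, vbLf)
    SplitLines = Split(text, vbLf)
End Function
```

A quick check such as InStr(text, vbNullChar) > 0 on the same string would also show whether null characters are present before any parsing is attempted.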
 
Hypetia,

THANK YOU. Your information set me on the right path. The data I was using was actually from several different systems, some of which did indeed have null characters, and others of which did not use vbCrLf but rather vbCr.

I ended up flushing every line of code I posted above. My research had also led me to the FileSystemObject, which enabled me to do a character-by-character parse, so I was able to look at each character individually. If it was my delimiter, I knew I was in a new column. I also looked for vbCrLf, vbCr, or vbLf to know when I had reached the end of a "row". I just took the file in character by character, be that character null, vbCr, etc., and wrote the output character by character.

The code is too specific to my process to post in full (it would help no one) but here is the start so you can see what I was doing:

Code:
'Set up our variables
Dim delimCount As Long              'how many delims have we seen this line
Dim i As Integer                    'our loop counter
Dim nextChar As String              'placeholder string for the next character in the file
Dim tempchar As String

'our File System Object variables
Dim fso As New FileSystemObject
Dim InputFile As File
Dim OutputFile As File
Dim tsInput As TextStream
Dim tsOutput As TextStream

'Set up our input file
Set InputFile = fso.GetFile(txtInput)
Set tsInput = InputFile.OpenAsTextStream(ForReading)

'Create and set up our output file
fso.CreateTextFile txtOutput
Set OutputFile = fso.GetFile(txtOutput)
Set tsOutput = OutputFile.OpenAsTextStream(ForWriting)

'If there is a header row (or rows) then read that row
If StartRow > 0 Then
    For i = 1 To StartRow
        'read in the first line, write it to new file
        While Not tsInput.AtEndOfLine
            nextChar = tsInput.Read(1)
            tsOutput.Write nextChar
        Wend
        'then read and write the Carriage Return and Line Feed characters
        If Not tsInput.AtEndOfStream Then
            nextChar = tsInput.Read(1)
            tsOutput.Write nextChar
        End If
    Next i
End If

nextChar = ""

delimCount = 0
While Not tsInput.AtEndOfStream
    Select Case delimCount

        'DO STUFF BASED ON WHICH COLUMN WE ARE IN

    End Select
    'we are not at any of the above columns, so read the next character.  If it's
    'a delimiter we have moved on to the next column
    If nextChar = delim Then
        delimCount = delimCount + 1
         nextChar = ""
    Else
        nextChar = ""
        While nextChar <> vbCr And nextChar <> vbLf And nextChar <> vbCrLf And nextChar <> delim And Not tsInput.AtEndOfStream
            nextChar = tsInput.Read(1)
            tsOutput.Write nextChar
        Wend
        If nextChar = delim Then
            delimCount = delimCount + 1
            nextChar = ""
        ElseIf (nextChar = vbCr Or nextChar = vbLf Or nextChar = vbCrLf) And Not tsInput.AtEndOfStream Then
            'take care of the LF
         '   nextChar = tsInput.Read(1)
          '  tsOutput.Write nextChar
            delimCount = 0
        End If
    End If
        
    

Wend
tsInput.Close
tsOutput.Close

End Function

But thanks again for your help. It set me on the right path!
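
For readers who want the column-counting idea in a self-contained form, here is a rough sketch. It is not arniec's code: MaskColumn and its parameters are hypothetical, it works on a string held in memory rather than on a TextStream, and it builds the output by concatenation, which would be slow on a very large file. It only illustrates the logic of counting delimiters, resetting the count at vbCr or vbLf, and substituting mask data in one column.

```vb
' Hypothetical illustration: replace the contents of one column
' (maskCol, 0-based) with mask text, scanning character by character.
Function MaskColumn(ByVal text As String, ByVal delim As String, _
                    ByVal maskCol As Long, ByVal mask As String) As String
    Dim i As Long
    Dim ch As String
    Dim col As Long           'delimiters seen so far on this row
    Dim wroteMask As Boolean  'has the mask been written for this field?
    Dim result As String

    For i = 1 To Len(text)
        ch = Mid$(text, i, 1)
        Select Case ch
        Case delim
            'a delimiter means we have moved to the next column
            col = col + 1
            wroteMask = False
            result = result & ch
        Case vbCr, vbLf
            'any line-ending character starts a new row
            col = 0
            wroteMask = False
            result = result & ch
        Case Else
            If col = maskCol Then
                'write the mask once, then skip the rest of the field
                If Not wroteMask Then
                    result = result & mask
                    wroteMask = True
                End If
            Else
                result = result & ch
            End If
        End Select
    Next i

    MaskColumn = result
End Function
```

Note that in this sketch an empty field in the masked column stays empty, because the mask is only written when the field contains at least one character.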
 