Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Concatenating multiple text files

Status
Not open for further replies.

eAlchemist

Technical User
Aug 28, 2003
64
0
0
US

I have a problem that I've been wrestling with... I get a number of large text files (variable number of files ranging in size from 30-50MB) from clients that I need to concatenate together. They're all in the same format, but they all have column headers in them, so I can't simply copy the whole file. I tried using TextStream to copy the file line by line, but it's too slow with this many large files.

Can anyone recommend a method that might be relatively easy to write, but that will be fast?

Thanks,
Chris
 
OK, so how do I strip out the header? I know how to re-write the file line by line, but how do I just delete a line in the file?

Thanks,
Chris
 
A starting point:
strBuffer = FileSystemObject.ReadAll
strBuffer = Mid(strbuffer, 1 + InStr(strBuffer, vbLf))

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
I was feeling old-school this afternoon and played around a bit with DOS batch files and EDLIN with a script file.

I think it could be done very economically that way. Edlin might puke on a 50 meg file though.
 
eAlchemist,
Try working with the files in Binary. Here is a sample routine, I used it to combine 3 79,689 KB files into one 239,065 KB file (less two headers) and it took my machine 9 seconds.
Code:
Sub BinaryTest()
Const o As String = "C:\Output.txt"
Dim b() As Byte
Dim d As Date
Dim l As Long
Dim i As Integer, j As Integer, k As Integer
Dim f(2) As String
f(0) = "C:\Source1.txt"
f(1) = "C:\Source2.txt"
f(2) = "C:\Source3.txt"

d = Now

j = FreeFile
Open o For Binary As #j

i = FreeFile
For k = 0 To 2
  Open f(k) For Binary As #i
  'This loop will find the first line break by looking 2 bytes
  'at a time
  ReDim b(1)
  Do
    l = l + 1
    Get #i, l, b
  Loop Until StrConv(b, vbUnicode) = vbCrLf
  'Found the end of the first line, grab the remainder of the file
  ReDim b(0 To LOF(i) - l)
  Get #i, , b
  Close #i
  'Write to the output file
  Put #j, , b
  l = 0
Next k
Reset
Debug.Print Format$(Now - d, "h:mm:ss")
End Sub

Hope this helps,
CMP

[small]For the best results do what I'm thinking, not what I'm saying.[/small]
(GMT-07:00) Mountain Time (US & Canada)
 
the fastest way to do this is using database (ADO) methods with a query as skip discussed. the sql string could be made to skip the headers like this:
Code:
SELECT * FROM myfile WHERE eachline NOT LIKE (" & _
          CHR(34) & "%part of the header line%" & CHR(34) & ")"
this will exclude the header and get all the other records. After you get all the records in a recordset you can use write method to write them to a newfile. if the name/number of input files are not known, you can use common dialog file select method (like windows explorer) and select any file you want.
you can loop through these multiple files and select records using ADO query as above, then write them to a new file.
 
jfdabiri,
Wouldn't it be easier to skip the header with
[tt]SELECT * FROM SomeFile.txt IN 'C:SomeFolder' 'Text;FMT=Delimited;HDR=YES;IMEX=2;';[/tt]
?

And a note about this method, if you need to 'process' the file, I agree this is the fastest compared to other methods I've seen. But to join several files together why add the overhead of a recordset object to a simple i/o operation?

I would stick with the [tt]ReadAll[/tt] method recommended by PHV or the binary operations from my post (I'm biased towards the binary approach).

CMP


[small]For the best results do what I'm thinking, not what I'm saying.[/small]
(GMT-07:00) Mountain Time (US & Canada)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top