Deleting an arbitrary number of lines from a large text file

Status
Not open for further replies.

Don Child

Programmer
Apr 4, 2003
US
Hi,

I'm trying to figure out how to efficiently remove hundreds of thousands of lines from a large text file. For example, one of the text files has about 3,082,488 lines and is close to a gigabyte in size.

For testing purposes, we want to pare down several of these files, and maybe remove about 75% of the lines from the bottom. Or from the top; it doesn't matter which lines get removed.

This differs from most of the questions on the web about removing lines from text files. Most of those questions involve filtering lines by the presence of a specific string. Here, the lines can be removed regardless of their content.

It doesn't actually have to be done in VBScript. It could be a Windows batch file. But not PowerShell.

There's some sample VBScript code here, from the people who publish my VBScript IDE:


I haven't tried this yet, but it seems to open the entire file in memory. I'm not certain how much that will slow down the system, or use up whatever virtual memory we have available.

Does this script look OK? Or is there a better VBScript or Windows shell method of shortening a text file?

Code:
Option Explicit

Const FOR_READING = 1
Const FOR_WRITING = 2


' Copy the first n lines of a text file to a new file (the remaining lines are dropped)
Dim FileName_Input_s
Dim FileName_Output_s
Dim Line_s

FileName_Input_s = "c:\temp\BigAssFile.txt"
FileName_Output_s = "c:\temp\SmallerAssFile.txt"

Dim Nth_Line_n
Dim Number_of_Lines_to_Delete_n

Dim objFS_inp
Dim objTS_inp
Dim objFS_out
Dim objTS_out


Set objFS_inp = CreateObject("Scripting.FileSystemObject")
Set objTS_inp = objFS_inp.OpenTextFile(FileName_Input_s, FOR_READING)
Set objFS_out = CreateObject("Scripting.FileSystemObject")
Set objTS_out = objFS_out.CreateTextFile(FileName_Output_s, True)


Number_of_Lines_to_Delete_n = 10000

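' Copy one line at a time; stop early if the input has fewer lines than requested
For Nth_Line_n = 1 To Number_of_Lines_to_Delete_n
    If objTS_inp.AtEndOfStream Then Exit For
    Line_s = objTS_inp.ReadLine
    objTS_out.WriteLine Line_s
Next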

objTS_inp.Close
objTS_out.Close
 
>Number_of_Lines_to_Delete_n

Except that you are not deleting them: in your code, this is the number of lines you are retaining. That is fine, given that it achieves your objective, but it might be better to make sure the variable name reflects what it actually is and does.
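
If the name were taken literally, the script would have to skip the first n lines and copy the rest. A minimal sketch of that reading follows, assuming the same file paths as the posted script. The name Number_of_Lines_to_Skip_n is illustrative and not from the original code; SkipLine and AtEndOfStream are standard TextStream members. It has not been run against a file of the size discussed here.

Code:
Option Explicit

Const FOR_READING = 1

Dim objFS, objTS_inp, objTS_out
Dim Number_of_Lines_to_Skip_n   ' illustrative name: lines dropped from the top
Dim Nth_Line_n

Number_of_Lines_to_Skip_n = 10000

Set objFS = CreateObject("Scripting.FileSystemObject")
Set objTS_inp = objFS.OpenTextFile("c:\temp\BigAssFile.txt", FOR_READING)
Set objTS_out = objFS.CreateTextFile("c:\temp\SmallerAssFile.txt", True)

' Discard the first n lines without copying them
For Nth_Line_n = 1 To Number_of_Lines_to_Skip_n
    If objTS_inp.AtEndOfStream Then Exit For
    objTS_inp.SkipLine
Next

' Copy whatever is left, one line at a time
Do Until objTS_inp.AtEndOfStream
    objTS_out.WriteLine objTS_inp.ReadLine
Loop

objTS_inp.Close
objTS_out.Close

Either way, only one line is held in memory at a time, because the stream is read with ReadLine or SkipLine rather than ReadAll.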
 
That's correct, Strong. We're not really touching the original file at first; I should have said that more clearly.

We're in effect removing the lines by copying them to a new file, then deleting the original, and renaming the output to the original.
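
The delete-and-rename step could be sketched roughly as below, assuming the two file paths from the posted script and that the shortened copy has already been written and closed. The FileExists guard is an added precaution, not something from the original code.

Code:
Option Explicit

Dim objFS
Dim FileName_Input_s
Dim FileName_Output_s

FileName_Input_s = "c:\temp\BigAssFile.txt"
FileName_Output_s = "c:\temp\SmallerAssFile.txt"

Set objFS = CreateObject("Scripting.FileSystemObject")

' Only swap the files if the shortened copy actually exists
If objFS.FileExists(FileName_Output_s) Then
    ' True = delete even if the file is marked read-only
    objFS.DeleteFile FileName_Input_s, True
    ' Give the shortened copy the original file's name
    objFS.MoveFile FileName_Output_s, FileName_Input_s
End If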

We only needed to do this one time, for about a dozen large files that we're using for testing.
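
For a batch of files where roughly 75% of the lines should go, one possible approach is two passes per file: count the lines, then copy the top quarter. This is only a sketch under stated assumptions: the file list, the KEEP_FRACTION constant, the TrimFile name and the ".small" suffix are all made up for illustration, and the originals are left untouched.

Code:
Option Explicit

Const FOR_READING = 1
Const KEEP_FRACTION = 0.25   ' assumption: keep roughly the top 25% of each file

Dim objFS
Dim FileName_s

Set objFS = CreateObject("Scripting.FileSystemObject")

' Hypothetical list of test files; replace with the real paths
For Each FileName_s In Array("c:\temp\BigAssFile.txt")
    TrimFile FileName_s
Next

Sub TrimFile(FileName_Input_s)
    Dim objTS_inp, objTS_out
    Dim Total_n, Keep_n, Nth_Line_n

    ' Pass 1: count the lines without holding the file in memory
    Total_n = 0
    Set objTS_inp = objFS.OpenTextFile(FileName_Input_s, FOR_READING)
    Do Until objTS_inp.AtEndOfStream
        objTS_inp.SkipLine
        Total_n = Total_n + 1
    Loop
    objTS_inp.Close

    Keep_n = Int(Total_n * KEEP_FRACTION)

    ' Pass 2: copy the first Keep_n lines to a new file next to the original
    Set objTS_inp = objFS.OpenTextFile(FileName_Input_s, FOR_READING)
    Set objTS_out = objFS.CreateTextFile(FileName_Input_s & ".small", True)
    For Nth_Line_n = 1 To Keep_n
        objTS_out.WriteLine objTS_inp.ReadLine
    Next
    objTS_inp.Close
    objTS_out.Close
End Sub

Reading a gigabyte-sized file twice costs extra time, so when the line count is already known, a fixed number of lines to keep avoids the first pass.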
 