Delete Lines from Text File that DO NOT contain...

Status
Not open for further replies.

Nu2Java

Hello, I have a fairly large text file that I need to clean up by removing lines that are not needed. I have played around with a regex script, but I cannot figure out how to get to the part of the line that I need to look at.

Here is the current format. I need to delete every line where the text after the last comma does not start with one of the letters R, T, C, U, F, V, H.
Code:
01/29/14,07:56:27,01,0005165           Yes
01/29/14,07:56:35,01,RE99-88920F
01/29/14,07:56:46,03,RE99-12012S
01/29/14,09:05:35,01,0005047      670023
01/29/14,09:05:41,01,TR286-01
01/29/14,09:05:47,03,TR286-01
01/29/14,09:05:54,01,RN35-01
01/29/14,09:06:08,03,RN35-01
01/29/14,09:06:14,01,CR250-08E
01/29/14,09:06:25,03,CR250-08E
01/29/14,09:06:28,01,CY12-18
01/29/14,09:06:42,03,CY12-18
01/29/14,09:06:52,01,287474      PO#

If this is something that regex should do, I would like to learn more about it. Thanks for any help!
 
Oh, I'm certain it is, but I'm not sure regex alone can return the line number (nor could I tell you how to do it). However, combine regex with VBS and you've got it made:

1. Loop through the text file one line at a time.
2. Increment a counter.
3. split() the line into pieces by comma.
4. Test the 4th piece with regex.

Code:
set objFSO = CreateObject("Scripting.FileSystemObject")
'open the log for reading (1 = ForReading)
set objFile = objFSO.OpenTextFile("c:\development\vbs\test.log", 1, true, 0)

set objRegEx = new RegExp
'matches if the text contains any of these letters
objRegEx.pattern = "R|T|C|U|F|V|H"

intLineNum = 0
do while not objFile.AtEndOfStream
	intLineNum = intLineNum + 1
	strLine = objFile.ReadLine
	'msgbox strLine
	arrPieces = split(strLine, ",")
	'arrPieces(3) is the 4th comma-separated piece
	if objRegEx.test(arrPieces(3)) then
		strFound = strFound & "Found on line " & intLineNum & vbNewLine
	end if
loop

objFile.Close
wscript.echo strFound

-Geates

 
Thanks Geates... This is great! How do I deal with empty lines? I get an error on this line:
Code:
if objRegEx.test(arrPieces(3)) then

It appears to be when I have empty lines in the file.
 
Check the length of the line. There is no need to split the line and run the regex if the line is empty.

Code:
do while not objFile.AtEndOfStream
	intLineNum = intLineNum + 1
	strLine = objFile.ReadLine
	if (len(strLine)) then
		arrPieces = split(strLine, ",")
		if objRegEx.test(arrPieces(3)) then
			strFound = strFound & "Found on line " & intLineNum & vbNewLine
		end if
	else
		'the line is blank
	end if
loop

-Geates

 
Excellent.. Thank You, Geates! This works great. As always, I appreciate your time and help.
 
Do you want to "get to the location of the line that I need to look at" or simply "delete the lines after the last comma that do not start with letters R, T, C, U, F, V, H"? If the latter, then the following code does the trick:

Code:
Dim InputFile, OutputFile

With CreateObject("Scripting.FileSystemObject")
    Set InputFile = .OpenTextFile("c:\downloads\input.txt", 1)
    Set OutputFile = .OpenTextFile("c:\downloads\output.txt", 2, True)
End With

With CreateObject("VBscript.regexp")
    .Pattern = "(.+,){3}[RTCUFVH].*\r\n"
    .Global = True
    OutputFile.Write .Replace(InputFile.ReadAll, "")
End With



 
Strongm.. thanks for the reply. To answer your question, yes, I do only want to delete those lines and write to a new file. I did try your code, but it does leave other lines that are not within the regex pattern.

Geates.. I am also getting leftover lines that do not match the pattern. Below is the code I am using right now. Maybe I am leaving something out or have something in the wrong place. So far, I am seeing leftover lines that have "J" or "B" as the starting character after the 3rd comma. I am using an HTA window to browse for the file to format.

Code:
Const ForReading = 1
Const ForWriting = 2
Const CreateIfNeeded = True

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile(file.value, 1, True, 0)

strFile = File.Value & "--Formatted.txt"

Set objRegEx = New RegExp
objRegEx.pattern = "R|T|C|U|F|V|H"

Do While Not objFile.AtEndOfStream
	intLineNum = intLineNum + 1
	strLine = objFile.ReadLine
	If (len(trim(strLine))) Then
		'msgbox strLine
		arrPieces = split(strLine, ",")
		If objRegEx.test(arrPieces(3)) Then
			strFound = strFound & strLine & vbNewLine
		End if
	Else
		'Do nothing
	End If
Loop

objFile.Close
Set objFileB = objFSO.OpenTextFile(strFile,ForWriting,True)
objFileB.Write strFound
objFileB.Close

MsgBox "Formatting Complete..." & vbCr & vbCr & "File Location: " & strFile, vbInformation
Window.Close()
End Sub
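
One possible cause of the leftover lines, assuming they contain one of the listed letters somewhere after the first character: the pattern "R|T|C|U|F|V|H" is not anchored, so Test returns True when any of those letters appears anywhere in the piece. The sketch below compares it with an anchored character class. The values "JR100-02" and "BT45-10" are made up for illustration and are not taken from the real file.

```vbscript
Option Explicit

Dim reLoose, reAnchored, value

Set reLoose = New RegExp
reLoose.Pattern = "R|T|C|U|F|V|H"      ' true if any of these letters occurs anywhere

Set reAnchored = New RegExp
reAnchored.Pattern = "^[RTCUFVH]"      ' true only if the text starts with one of them

' "JR100-02" and "BT45-10" are hypothetical sample values
For Each value In Array("RE99-88920F", "JR100-02", "BT45-10", "0005165")
    WScript.Echo value & "  loose=" & reLoose.Test(value) & "  anchored=" & reAnchored.Test(value)
Next
```

With these inputs the loose pattern matches the first three values, while the anchored pattern matches only "RE99-88920F". Both are case-sensitive unless IgnoreCase is set to True.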
 
Whoops, cut'n'paste error. This is the correct version:

Code:
Dim InputFile, OutputFile

With CreateObject("Scripting.FileSystemObject")
    Set InputFile = .OpenTextFile("c:\downloads\input.txt", 1)
    Set OutputFile = .OpenTextFile("c:\downloads\output.txt", 2, True)
End With

With CreateObject("VBscript.regexp")
    .Pattern = "(.+,){3}[^RTCUFVH].*\r\n"
    .Global = True
    OutputFile.Write .Replace(InputFile.ReadAll, "")
End With
 
Thanks strongm... this does work. How can I deal with removing blank lines in this code?
 
Code:
Dim InputFile, OutputFile

With CreateObject("Scripting.FileSystemObject")
    Set InputFile = .OpenTextFile("f:\file downloads\input.txt", 1)
    Set OutputFile = .OpenTextFile("f:\file downloads\output.txt", 2, True)
End With

With CreateObject("VBscript.regexp")
    .Pattern = "^(((.+,){3}[^RTCUFVH].*)|(^\s*))$" '(\r\n)"
    .Global = True
    .MultiLine = True
    OutputFile.Write .Replace(InputFile.ReadAll, "")
End With
 
Thanks strongm... this works nicely. I did have to uncomment the last portion as it was leaving blanks where it removed lines.
Code:
'(\r\n)"
 
The .Multiline should have dealt with that. Are you sure you copied all the code, and not just the pattern?
 
strongm ... yes, I just tested again. It removes the lines that were already blank, but it leaves an empty line wherever it removed a line.
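
A likely explanation for the empty lines, offered as a sketch rather than a diagnosis of the actual file: with MultiLine set, `$` only asserts the end of a line and does not consume the line break, so replacing the match with "" leaves the break behind. The sample text below is made up and uses bare line feeds to keep the output easy to read.

```vbscript
Option Explicit

Dim re, text

' Hypothetical three-line sample, one line feed after each line
text = "keep1" & vbLf & "drop" & vbLf & "keep2" & vbLf

Set re = New RegExp
re.Global = True
re.MultiLine = True

' $ is zero-width: "drop" is removed but its line break stays behind
re.Pattern = "^drop$"
WScript.Echo Replace(re.Replace(text, ""), vbLf, "|")      ' keep1||keep2|

' Matching the line break as part of the pattern removes the whole line
re.Pattern = "^drop\r?\n"
WScript.Echo Replace(re.Replace(text, ""), vbLf, "|")      ' keep1|keep2|
```

The second pattern needs a line break after the line, so a final line with no trailing break would not be removed by it.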
 
What is your end goal? It sounds like you want ONLY lines that begin with a letter after the 3rd comma. Using your original input file as an example:
Code:
01/29/14,07:56:27,01,0005165 Yes
01/29/14,07:56:35,01,RE99-88920F
01/29/14,07:56:46,03,RE99-12012S
01/29/14,09:05:35,01,0005047 670023
01/29/14,09:05:41,01,TR286-01
01/29/14,09:05:47,03,TR286-01
01/29/14,09:05:54,01,RN35-01
01/29/14,09:06:08,03,BN35-01
01/29/14,09:06:14,01,CR250-08E
01/29/14,09:06:25,03,CR250-08E
01/29/14,09:06:28,01,CY12-18
01/29/14,09:06:42,03,JY12-18
01/29/14,09:06:52,01,287474 PO#
becomes
Code:
01/29/14,07:56:35,01,RE99-88920F
01/29/14,07:56:46,03,RE99-12012S
01/29/14,09:05:41,01,TR286-01
01/29/14,09:05:47,03,TR286-01
01/29/14,09:05:54,01,RN35-01
01/29/14,09:06:08,03,BN35-01
01/29/14,09:06:14,01,CR250-08E
01/29/14,09:06:25,03,CR250-08E
01/29/14,09:06:28,01,CY12-18
01/29/14,09:06:42,03,JY12-18
Correct or incorrect?

-Geates

 
Geates, yes: I want to keep only the lines that have specific letters after the 3rd comma, which I know are valid part numbers. There are also approximately 3 numbers that are legal and will need to stay. These entries come from barcode scans, and certain barcodes get scanned by accident, which makes a mess of the text file. Your example is correct.
 
OK, so based on the code you posted last, this will write ONLY those lines that DO NOT begin with certain letters:

Code:
set objFSO = CreateObject("Scripting.FileSystemObject")
strInFile = file.value
strOutFile = strInFile & "--Formatted.txt"
Set objInFile = objFSO.OpenTextFile(strInFile, 1, True, 0)
Set objOutFile = objFSO.OpenTextFile(strOutFile, 2, True, 0)

Set objRegEx = New RegExp
objRegEx.pattern = "R|T|C|U|F|V|H"

Do While Not objInFile.AtEndOfStream
	intLineNum = intLineNum + 1
	strLine = objInFile.ReadLine
	If (len(trim(strLine))) Then
		arrPieces = split(strLine, ",")
		If not objRegEx.test(arrPieces(3)) Then
			objOutFile.WriteLine strLine
		End if
	End If
Loop

objInFile.Close
objOutFile.Close

MsgBox "Formatting Complete..." & vbCr & vbCr & "File Location: " & strOutFile, vbInformation
Window.Close()

-Geates

 
Thanks Geates ... I need to do the reverse. I want to write to the new file only the lines that match the pattern.
 
I should have started with this: I tried the code, and it writes lines other than those that match the pattern.
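
A minimal sketch of the "reverse" filter, assuming the goal from the first post: keep only the lines whose piece after the 3rd comma starts with R, T, C, U, F, V or H, and drop blank or short lines. The function name KeepPartLines is hypothetical, and the sample lines are copied from the data posted above.

```vbscript
Option Explicit

' Returns only the lines of strText whose 4th comma-separated piece starts
' with one of the letters R, T, C, U, F, V, H. Blank lines and lines with
' fewer than 4 pieces are dropped. (KeepPartLines is a made-up name.)
Function KeepPartLines(strText)
    Dim re, strLine, arrPieces, strResult
    Set re = New RegExp
    re.Pattern = "^[RTCUFVH]"          ' anchored: first character only
    strResult = ""
    ' Normalise CRLF to LF so both line-break styles split the same way
    For Each strLine In Split(Replace(strText, vbCrLf, vbLf), vbLf)
        arrPieces = Split(strLine, ",")
        If UBound(arrPieces) >= 3 Then
            If re.Test(arrPieces(3)) Then strResult = strResult & strLine & vbCrLf
        End If
    Next
    KeepPartLines = strResult
End Function

Dim strSample
strSample = "01/29/14,07:56:27,01,0005165           Yes" & vbCrLf & _
            "01/29/14,07:56:35,01,RE99-88920F" & vbCrLf & _
            "" & vbCrLf & _
            "01/29/14,09:06:52,01,287474      PO#" & vbCrLf

' Only the RE99-88920F line survives
WScript.Echo KeepPartLines(strSample)
```

To apply it to a file, the text read with ReadAll could be passed to the function and the result written with Write, as in the earlier posts. The few legal numeric part numbers mentioned above are not handled here. They could be added to the pattern as alternatives once the exact values are known.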
 