Delete Lines from Text File that DO NOT contain... 2

Nu2Java

Technical User
Jun 5, 2012
166
US
Hello, I have a fairly large text file that I need to clean up by removing lines that are not needed. I have played around with a regex script, but I cannot figure out how to get to the part of the line that I need to look at.

Here is the current format. I need to delete every line whose text after the last comma does not start with one of the letters R, T, C, U, F, V, H.
Code:
01/29/14,07:56:27,01,0005165           Yes
01/29/14,07:56:35,01,RE99-88920F
01/29/14,07:56:46,03,RE99-12012S
01/29/14,09:05:35,01,0005047      670023
01/29/14,09:05:41,01,TR286-01
01/29/14,09:05:47,03,TR286-01
01/29/14,09:05:54,01,RN35-01
01/29/14,09:06:08,03,RN35-01
01/29/14,09:06:14,01,CR250-08E
01/29/14,09:06:25,03,CR250-08E
01/29/14,09:06:28,01,CY12-18
01/29/14,09:06:42,03,CY12-18
01/29/14,09:06:52,01,287474      PO#

If this is something that regex should do, I would like to learn more about it. Thanks for any help!
 
I tried that, but it still writes other lines that start with characters not in the pattern.
 
>objRegEx.pattern = "^[R|T|C|U|F|V|H]"

I've already provided the correct pattern for this ... :)

objRegEx.pattern = "[^RTCUFVH]"
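
For anyone following along, here is a small sketch of why the two patterns behave so differently. It only uses the RegExp Test method on a few strings taken from the sample data (the "|oops" string is made up for illustration), and the last pattern is my own way of pinning the class to the character after the third comma, not something quoted above.

```vbscript
Option Explicit
Dim re
Set re = CreateObject("VBScript.RegExp")

' Inside [...] the "|" is a literal character, not "or".
re.Pattern = "^[R|T|C|U|F|V|H]"
WScript.Echo CStr(re.Test("RE99-88920F"))  ' Test returns True: starts with R
WScript.Echo CStr(re.Test("|oops"))        ' Test returns True: "|" is in the class
WScript.Echo CStr(re.Test("0005165"))      ' Test returns False

' [^...] means "one character NOT in this list". Unanchored, it can match
' anywhere in the string, so nearly every line passes.
re.Pattern = "[^RTCUFVH]"
WScript.Echo CStr(re.Test("RE99-88920F"))  ' Test returns True: "E" is not in the list

' Tying the class to the character right after the third comma.
re.Pattern = "^(.+,){3}[^RTCUFVH]"
WScript.Echo CStr(re.Test("01/29/14,07:56:27,01,0005165           Yes"))  ' Test returns True
WScript.Echo CStr(re.Test("01/29/14,07:56:35,01,RE99-88920F"))            ' Test returns False
```

Run it with cscript. The point is that the negated class on its own says nothing about where in the line it applies, so it has to be positioned by the rest of the pattern.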

And here's a final variation on my code that should now work properly:

Code:
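Dim InputFile, OutputFile

' Open the source for reading (1) and the target for writing (2), creating it if needed
With CreateObject("Scripting.FileSystemObject")
    Set InputFile = .OpenTextFile("f:\file downloads\input.txt", 1)
    Set OutputFile = .OpenTextFile("f:\file downloads\output.txt", 2, True)
End With

With CreateObject("VBScript.RegExp")
    ' First alternative: a whole line whose fourth field does not start with R, T, C, U, F, V or H
    ' Second alternative: a doubled line break; group 4 puts a single break back
    .Pattern = "(.+,){3}[^RTCUFVH].*?(\r\n|$)|(\r\n)(\r\n)"
    .Global = True
    OutputFile.Write .Replace(InputFile.ReadAll, "$4")
End With

InputFile.Close
OutputFile.Close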
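
To see what the pattern does without touching any files, here is a minimal sketch that runs the same Replace on a string built from four of the sample lines. The variable names are mine, and it assumes the data uses CR/LF line endings, as the pattern itself does.

```vbscript
Option Explicit
Dim sample, re

' Four lines from the original post, joined with CR/LF
sample = "01/29/14,07:56:27,01,0005165           Yes" & vbCrLf & _
         "01/29/14,07:56:35,01,RE99-88920F" & vbCrLf & _
         "01/29/14,09:05:35,01,0005047      670023" & vbCrLf & _
         "01/29/14,09:05:41,01,TR286-01"

Set re = CreateObject("VBScript.RegExp")
re.Pattern = "(.+,){3}[^RTCUFVH].*?(\r\n|$)|(\r\n)(\r\n)"
re.Global = True

' Unwanted lines match the first alternative, where group 4 is empty,
' so replacing with "$4" removes them together with their line break.
WScript.Echo re.Replace(sample, "$4")
' The RE99-88920F and TR286-01 lines should be the only ones left.
```

Group 4 only captures something when the second alternative matches a doubled line break, which is why the same replacement string both deletes lines and collapses empty ones.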
 
Thank You Geates & Strongm, you guys are the best! The code works great now and does exactly what I need. This has been a great learning experience for me.
 
Geates and strongm... I've been following this post with interest. Stars for you both for knowledge and perseverance.
 
For a good regex reference, I suggest Mastering Regular Expressions by Jeffrey Friedl. It explains not only the syntax of regex but also how to write efficient expressions based on how the underlying "engine" works.
 
jges

Thanks for the suggestion on the regex book. I am looking into that now, as regex can help me with a lot of tasks.
 
If I want to expand on this code, how would I modify this pattern so that, instead of writing lines that start with just "A" or "B", it only writes lines that start with "AR" or "BR"?

Code:
.Pattern = "(.+,){3}[^ABCDEFGHIJLMPRSTUV459].*?(\r\n|$)|(\r\n)(\r\n)"
 
Thanks Geates... Is there another edit needed for the write to the new file? I need to reverse what it writes; right now it writes everything EXCEPT the AR|BR lines.

Code:
OutputFile.Write .Replace(InputFile.ReadAll, "$4")
 
It shouldn't do. I think you may have a coincidental match; that expression will discard anything that starts with AA, AB, BA, AR, RA, BR, RB, RR, and BB (rather than just AR or BR). In addition, the introduction of an additional bracketed term should mean the return of empty lines ...
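
As a rough sketch of one way to keep only the lines whose last field starts with exactly AR or BR: a negative lookahead, (?!AR|BR), can stand in for the character class. This is an assumption on my part rather than anything posted above, and the sample lines are adapted from the original data with made-up prefixes. Because a lookahead is not a capturing group, the numbering stays the same and "$4" still refers to the second line break.

```vbscript
Option Explicit
Dim sample, re

' Hypothetical test data: only the AR and BR lines should survive
sample = "01/29/14,09:05:41,01,AR286-01" & vbCrLf & _
         "01/29/14,09:05:47,03,AB286-01" & vbCrLf & _
         "01/29/14,09:05:54,01,BR35-01" & vbCrLf & _
         "01/29/14,09:06:08,03,RA35-01"

Set re = CreateObject("VBScript.RegExp")
' After the third comma, match (and so delete) the line unless the
' next two characters are AR or BR.
re.Pattern = "(.+,){3}(?!AR|BR).*?(\r\n|$)|(\r\n)(\r\n)"
re.Global = True

WScript.Echo re.Replace(sample, "$4")
' The AR286-01 and BR35-01 lines should be the only ones left.
```

To combine this with the longer list of single letters, the other letters could presumably go into the lookahead as a class, for example (?!AR|BR|[CDEF]), but I have not tried that against the real file.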

 
I just tried the code this morning and that is what happened. I have been searching for other examples, but without any luck. The most recent code works GREAT, but this will be a nice addition since I have found on occasion items that start with A also have a second character that I would like to ignore and not have written to a new file.
 