
Read/skip lines in file until pattern found, then print remainder

Status
Not open for further replies.

Ternion

MIS
Aug 13, 2003
16
US
Hi everyone,
I am trying to cut away the *first* set of header information from a forwarded email file, then remove the ">" symbols from the beginning of each line in the remainder of the file to get back the original email the person forwarded to me. This is all to feed into a Bayesian SPAM filter's learning process. My users will forward any SPAM that makes it through our SPAM filter to me, and I want to save the email, minus my users' headers, to a file. I thought of just grepping out all lines beginning with ">", but I've found that some lines of forwarded HTML email do not begin with ">". Is there a way to search for the first line beginning with ">", then print that line and all remaining lines using awk?

thank you for any help,
Scott Wallace
 
nawk '/^>/ , 0' myFile.txt

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
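A small sketch of what the range pattern does, run against a made-up message (the sample text and the function name `from_first_quote` are hypothetical). The range starts at the first line matching `^>`; because its end condition `0` is never true, every later line is printed as well, including HTML lines that lost their ">". Plain `awk` is used here on the assumption that it is POSIX-compatible; `nawk` accepts the same program.

```shell
# Print from the first line starting with ">" through end of input.
# The range end condition 0 is never true, so the range never closes.
from_first_quote() {
    awk '/^>/ , 0' "$@"
}

# Hypothetical forwarded message: two header lines, then a quoted body
# in which one HTML line has no leading ">".
printf '%s\n' \
    'From: user@example.com' \
    'Subject: Fwd: spam' \
    '> Buy now!' \
    '<td>no quote marker</td>' \
    '> Unsubscribe here' | from_first_quote
```

The two header lines are dropped and the `<td>` line is kept. If no line starts with ">", nothing is printed.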
 
Hi vlad,
Unfortunately, some lines in the forwarded section don't start with ">", so it would be nice to read lines one at a time and discard them until reaching the first line beginning with ">", then print that line and all remaining lines, including any more that do not start with ">".
Thank you for the suggestion, though.

thanks,
Scott
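The read-and-discard loop described in this post can also be written directly with a flag; it selects the same lines as the range pattern `/^>/ , 0`. This is a sketch only: the flag `found` and the function name `skip_until_quote` are made up for the example.

```shell
# Lines are skipped while found is unset (false). The first line that
# starts with ">" sets found = 1, and the bare pattern "found" then
# prints that line and every line after it, with or without ">".
skip_until_quote() {
    awk '/^>/ { found = 1 } found' "$@"
}

printf '%s\n' 'From: user@example.com' '> quoted' 'not quoted' | skip_until_quote
```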
 
or with sed:
sed -n '/^>/,$p' myFile.txt

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
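If you prefer sed, the leading ">" can be removed inside the same range. A minimal sketch, assuming a sed that accepts `;` before the closing `}` (GNU and BSD sed both do); the function name `strip_forward_sed` is hypothetical.

```shell
# -n turns off sed's automatic printing. For every line from the first
# ^> line through the last line ($), delete one leading ">" and print,
# so lines without a ">" inside the range are still printed unchanged.
strip_forward_sed() {
    sed -n '/^>/,${s/^>//;p;}' "$@"
}

printf '%s\n' 'Subject: Fwd' '> hello' '<br>' | strip_forward_sed
```

Using `s/^>//p` instead of a separate `p` would print only the lines where a ">" was actually removed, which would lose exactly the unquoted HTML lines described in the question.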
 
To remove the ">" from the printed lines, add a sub() and a print to the awk program:

nawk '/^>/,0{sub(/^>/,"");print}' myFile.txt

(Vlad, nice method of selecting to the end of the file)

CaKiwi
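A sketch of this program as a reusable shell function (the function names are hypothetical). sub() removes only one leading ">" per line, so a doubly quoted line such as ">> text" keeps one ">". The second variant is an assumption about how most mail clients quote ("> text"): it also drops a single space after the ">".

```shell
# CaKiwi's program: inside the range, strip one leading ">" and print.
strip_forward() {
    awk '/^>/ , 0 { sub(/^>/, ""); print }' "$@"
}

# Variant: "> ?" also removes one optional space after the ">".
strip_forward_space() {
    awk '/^>/ , 0 { sub(/^> ?/, ""); print }' "$@"
}

printf '%s\n' 'Subject: Fwd' '> Buy now!' '<td>x</td>' | strip_forward_space
```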
 
My mistake, I'm sorry. I didn't realize that your suggestion did, indeed, print everything from the first "^>" line, as I needed. My apologies, and thank you very much for your help!

Thank you as well, CaKiwi, for the addition that removes the leading ">" characters.
Everyone here is so helpful; I really appreciate it!

thanks,
Scott
 
One last thing on this:
Is there a way to add a DOS carriage return ("^M") to the end of each line? When I open the cleaned-up file with Notepad, it's all run together, and I'm afraid the spam filter program may not deal with that correctly.

thanks,
Scott
 
Change the print to

print $0 "\r"

If that doesn't work, try:

print $0 "\015"

CaKiwi
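Putting the range, the sub() and the "\r" together gives a single command. In this sketch (the function name `strip_forward_dos` is hypothetical), awk's print still appends its normal "\n" after the "\r", so each output line ends in the DOS CR LF pair; `od -c` makes the `\r \n` visible.

```shell
# Range from the first ^> line to end of input, strip one leading ">",
# and end each printed line with "\r" followed by print's own "\n".
strip_forward_dos() {
    awk '/^>/ , 0 { sub(/^>/, ""); print $0 "\r" }' "$@"
}

# Show the raw bytes, including the \r \n at the end of the line.
printf 'Subject: x\n> hi\n' | strip_forward_dos | od -c
```

"\015" is the octal escape for the same carriage-return character, which is why it works as a fallback.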
 
Thanks CaKiwi! The "\r" works great.

thanks again, Scott

 