Hi everyone,
I am trying to cut away the *first* set of headers from a forwarded email file, then remove the ">" symbols from the beginning of each line in the remainder of the file, so that I get back the original email the person forwarded to me. This is all to feed a Bayesian SPAM filter's learning process: my users forward any SPAM that gets past our filter to me, and I want to save each message, minus my users' headers, to a file. I thought of just grepping for all lines beginning with ">", but I've found that some lines of forwarded HTML email do not begin with ">". Is there a way to find the first line beginning with ">", then print that line and all remaining lines using awk?
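A minimal sketch of the awk approach being asked about, wrapped in a POSIX shell function. The function name `strip_forward` is hypothetical. It assumes the forwarded message is quoted with a leading ">" (optionally followed by one space), and it strips only one level of quoting, so a nested ">>" line becomes ">". Lines after the first quoted line that lack a ">" (such as wrapped HTML) are passed through unchanged.

```shell
#!/bin/sh
# strip_forward (hypothetical name): print everything from the first line
# that begins with ">" to the end of the input, removing one leading ">"
# (plus an optional following space) from each printed line.
# Reads the files given as arguments, or stdin if none are given.
strip_forward() {
  awk '
    # Turn printing on at the first line that starts with ">".
    !found && /^>/ { found = 1 }
    found {
      sub(/^> ?/, "")   # drop one level of quoting, if present
      print
    }
  ' "$@"
}
```

Usage would look like `strip_forward forwarded.eml > original.eml`. One caveat: if the user typed a line beginning with ">" above the forwarded part, printing would start there instead, so the trigger pattern may need tightening for a given mail client's forward format.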
thank you for any help,
Scott Wallace