Hi!
I use a tool called mailfilter on Mac OS X to filter spam mail whilst it is still on the POP3 server.
However my service provider also supports a "reject list" so that mail from selected domains is rejected before you see it in your POP3 mailbox (useful, as I use WebMail, and would like to see an uncluttered mailbox).
When mailfilter runs, it writes a log file that includes the sending address of each deleted message. I want to extract that address, strip off everything up to and including the @ symbol, and keep just the domain name, which then goes into a text file that I can review and selectively add to the Reject List.
Here's an example of such a record;
mailfilter: Deleted "Hazel Numbers" <nqactl89a@prplacements.biz>: RE:Stop maintenance fees, Tue, 25 Nov 03 01:15:04 GMT. [Applied filter: '^Content-Type:\ multipart/alternative;' to 'Content-Type: multipart/alternative; boundary="0F._3B..B16FC4CB1A_7D2B8"']
The bit I want to extract is the "prplacements.biz" domain.
Here's what I have at present, which on the above string, does the trick (a bit kacky, I know);
_______________________________________________________
#!/bin/bash
echo "Starting mailfilter..."
mailfilter
# After mailfilter runs, extract the domain from the addresses we don't want.
LOG=/Users/<my account>/Library/Logs/mailfilter.log
cat $LOG | grep Deleted | awk '{ print $5 }' | grep \@ | cut -d\@ -f2 | sed -e 's/>://g' | sort -u >>/Users/<my account>/Desktop/domains_to_be_removed.txt
# Wipe the log file for the next iteration
cat /dev/null>$LOG
___________________________________________________________
Couldn't be simpler (I thought). But the command depends on the position of the email address, which is picked out by the "print $5" awk step. Although the routine works on the example above, it doesn't catch all of the addresses: sometimes there is no sender name before the address, so instead of a string in between the quotes (e.g. "Hazel Numbers") there is just "", as in the example below;
mailfilter: Deleted "" <lynda.odonnell@erie.net>: Get what you always wanted miilqt ivzyhys, Sun, 17 Mar 02 17:38:11 GMT. [Applied filter: '^Content-Type:\ multipart/alternative;' to 'Content-Type: multipart/alternative; boundary="7A...8_8AC8"']
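To make the field shift concrete, here is a small check using shortened copies of the two records above. The helper name fifth_field is made up for this illustration; it just wraps the same awk step as the script.

```shell
#!/bin/bash
# fifth_field is a hypothetical helper: it prints the fifth
# whitespace-separated field of each line read from stdin,
# exactly as the "print $5" step in the script does.
fifth_field() {
    awk '{ print $5 }'
}

# Shortened copies of the two sample log records.
printf '%s\n' \
    'mailfilter: Deleted "Hazel Numbers" <nqactl89a@prplacements.biz>: RE:Stop maintenance fees' \
    'mailfilter: Deleted "" <lynda.odonnell@erie.net>: Get what you always wanted' |
    fifth_field
# First record:  <nqactl89a@prplacements.biz>:
# Second record: Get
```

With a two-word name the address lands in field 5, but with an empty name it lands in field 4, so field 5 is the first word of the subject and the later grep for @ silently drops the record.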
So the query is; anyone know of a foolproof means to extract the domain from the email address in the strings, without the positional dependence I have at present, even if it means abandoning what I have?
Thanks in anticipation
recl
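One possible position-independent approach, offered as a sketch rather than a tested fix: instead of counting fields, anchor on the angle brackets around the address and capture whatever sits between the @ and the closing >. The function name extract_domains is made up here, and the pattern assumes the log format matches the two samples above (in particular, that the sender name itself never contains a < character).

```shell
#!/bin/bash
# extract_domains (hypothetical name): read mailfilter log lines on
# stdin and print each distinct sender domain once.
extract_domains() {
    # Keep only the "Deleted" records, as the original script does.
    grep 'Deleted' |
    # Skip to the first "<", skip the local part up to "@", capture
    # everything up to the closing ">", and print only lines that match.
    sed -n 's/^[^<]*<[^@>]*@\([^>]*\)>.*$/\1/p' |
    sort -u
}

# Example run on shortened copies of the two sample records.
printf '%s\n' \
    'mailfilter: Deleted "Hazel Numbers" <nqactl89a@prplacements.biz>: RE:Stop maintenance fees' \
    'mailfilter: Deleted "" <lynda.odonnell@erie.net>: Get what you always wanted' |
    extract_domains
# Prints:
# erie.net
# prplacements.biz
```

In the script this would stand in for the long pipeline, reading from $LOG and appending to the same domains file as before. Because the match is tied to the < @ > characters rather than to a field number, the length of the name in quotes no longer matters.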