Hi all,
For some reason our organisation decided to issue a 16-item questionnaire to our members via email rather than a secure form. Now I have the task of collating the results, which is proving to be a pain!
Basically, we sent a couple of paragraphs of text followed by the 16 Questions and then a request for answers (a copy can be found here >
As you can imagine, now that these are coming back, a number of people have quoted the original mail and put their answers at the end, some have put theirs at the start, and others have not quoted the original at all.
Some have given simple answers such as
Q1 c
While others have said
Q1 Yes
or
Q1 Yes however it depends on the level of support.
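To make those answer formats concrete, here is a minimal sketch in Python of a pattern that would pick up lines like the ones above. The name extract_answers and the exact pattern are assumptions for illustration only. It would also match any quoted question line that starts with "Q1 ...", so it only makes sense once the quoted original has been stripped out.

```python
import re

# Hypothetical pattern: "Q", a 1-2 digit question number, optional
# punctuation such as ":" or ")", then the rest of the line as the answer.
ANSWER_RE = re.compile(
    r"^\s*Q\s*(\d{1,2})(?!\d)\s*[.:)\-]?\s*(.*\S)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

def extract_answers(body):
    """Return {question_number: answer_text} for lines that look like answers."""
    answers = {}
    for match in ANSWER_RE.finditer(body):
        number = int(match.group(1))
        if 1 <= number <= 16:  # the questionnaire has 16 questions
            # If a question number appears twice, the later line wins.
            answers[number] = match.group(2)
    return answers

sample = "Q1 c\nQ2 Yes however it depends on the level of support.\n"
print(extract_answers(sample))
```

Free-text answers such as "Yes however it depends..." come back whole, so they would still need reading by hand.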
Basically, the replies have come in many, many different ways, so I decided to read all of the emails into a database. That part was straightforward; however, I now have over a thousand entries in a database consisting of the following fields:
From
Subject
Body
The Body field contains the entire text of the email.
Well, that's the background!
What I'm trying to do is make collating the results a bit easier, so to start I want to remove as much junk from the emails as possible, leaving (ideally) only the answers.
The trouble is there are so many different problems with the emails. So, to start with, does anyone have any regular expressions that are helpful for removing the assorted garbage that emails contain?
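As a rough illustration of the kind of cleanup meant here, the sketch below (Python) drops quoted lines and reply boilerplate from a Body value. The conventions it looks for are assumptions, not something checked against the actual replies: quoted lines starting with ">", an "On ... wrote:" attribution line, and an Outlook-style "-----Original Message-----" separator. The function name clean_body is made up for the example.

```python
import re

# Assumed conventions (not verified against the real emails):
# lines quoted with ">", "On ... wrote:" attributions, and the
# "-----Original Message-----" separator line.
QUOTED_LINE = re.compile(r"^[ \t]*>.*$", re.MULTILINE)
ATTRIBUTION = re.compile(r"^On .{0,200} wrote:[ \t]*$", re.MULTILINE)
SEPARATOR = re.compile(
    r"^-{2,}[ \t]*Original Message[ \t]*-{2,}[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)
BLANK_RUNS = re.compile(r"\n{3,}")

def clean_body(body):
    """Strip quoted text and reply boilerplate, keeping everything else."""
    text = body.replace("\r\n", "\n")       # normalise line endings
    text = QUOTED_LINE.sub("", text)        # drop ">"-quoted lines
    text = ATTRIBUTION.sub("", text)        # drop "On ... wrote:" lines
    text = SEPARATOR.sub("", text)          # drop the separator line only
    text = BLANK_RUNS.sub("\n\n", text)     # collapse the gaps left behind
    return text.strip()
```

Only whole lines are removed and nothing after a separator is cut off, because some people put their answers below the quoted original. Replies that quote the original without any ">" marker would still slip through.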
I'm a newbie to regular expressions and wrote a few to try to extract the answers, but they only work up to a point.
I know I'm probably looking for a needle in a haystack, but anything is worth a go!
Cheers
Peter