Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

clearing rubbish out of database emails

Status
Not open for further replies.

pease

Programmer
Apr 2, 2009
29
GB
Hi all,

For some reason our organisation decided to issue a 16 item Questionnaire to our members via an email rather than secure form. Now I have the task of collating the results which is proving to be a pain!

Basically, we sent a couple of paragraphs of text followed by the 16 Questions and then a request for answers (a copy can be found here >
As you can imagine, when these come back a number of people have quoted the original mail putting their answers at the end, some have put theirs at the start and others have not quoted the original at all.

Some have given simple answers such as
Q1 c

While others have said
Q1 Yes

or

Q1 Yes however it depends on the level of support.


Basically there are many many different ways the replies have come in so I decided to read all of the emails into a database which was straightforward however I now have over a thousand entries in a database consisting of the following fields

From
Subject
Body

The Body containing the entire text from the email.


Well, thats the background!

What Im trying to do is make the collation of the results a bit easier so to start I want to remove as much junk from the emails as possible leaving (ideally) only the answers bit.

Trouble is theres so many different problems with the emails, so to start with does anyone have any regular expressions which are helpful in removing the assorted garbage that emails have in?

Im a newbie to regular expressions and did a few to try and extrapolate the Answers but they only work so far.

I know Im probably looking for a needle in a haystack but anything is worth a go!

Cheers

Peter
 
Although not reg expression based (as I don't know how to do them !) I'd start loading the emails into a DB as one row per line, with a key of email_name + identity_column (where email_name might be the from field. I assume you need to tie the response to the sender ?.
Then I'd delete all the text that started with the stuff you don't want e.g. starts with Q1), or Dear etc. You might just have to use your imagination. Looking at your example it looks like just keep the lines starting Q1) etc, though I suspect that's just a simple one and people might have embedded answers rather than do them at the end.
You'll get rid of a load of the rubbish by deleting from the DB but might stil have a bit of work to do, as there are only 1000 you might be better off editing manualy.

Finally I'd like you to onow as a Sunderland supporter )born and bred) how difficult it was to offer any advice.
 
so you're after a database table that has one column for email, 16 columns for question response and one row for each response?

and you currently have a table with the body of each email in a single column?

if i have got you right, that should be easy enough. a dozen lines of code perhaps.
 
I'd take it higher. This is not the job of a programmer. As a student, I have worked in company that handled these forms (on paper, and they were scanned electronically). This is not a straightforward job. If you handle this yourself, it will be you who decides what is accepted as an answer and what is not. If you are asked to build this as a web form in PHP, these decisions are taken by the one who requests that form. You can then advice, but it is not your decision.

What do you do with the comments? Ignore them? For any questionnaire, they are the most important and they do almost never fit in. The comments can make a "yes" to a "no" and the other way around.

Get back to your organisation and show them the difficulties and the decisions you will have to make if you do this yourself.


+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)
 
here's my starter for ten. no testing done on this

Code:
<?php
// open up a recordset for the existing table

$result = mysql_query("select emailAddress, emailBody from emailTable");
$error = array();
while ($row = mysql_result($result)){
	//extract answer block
	$pattern = '/ANSWERS(.*?)Thank You/msx';
	preg_match($pattern, $row['emailBody'], $match);
	$answers &= $match[1];
	
	//convert all line breaks to simple line breaks
	$answers = str_replace("\r\n", "\n", $answers);
	
	//search for breaks
	$_answers = explode("\nQ", $answers);
	//remove the white space
	$_answers = array_map('trim', $_answers);
	
	foreach ($_answers as $a){
		//first two characters should be question number
		$question = substr($a, 1, 1);
		if (is_numeric($question)){
			$data[$question] = trim(substr($a,3));
		} else {
			$errors[] = array('cannot identify the question number', $a);
		}
	}
	//cleanse the data array
	for ($i=1; $i<=16; $i++){
		if (!isset($data[$i]))
		$data[$i] = '';
	}
	$sql = 'insert into newEmailTable (id, emailAddress, q1, q2, q3, q4, q5, q6, q7, q8,q9, q10, q11, q12, q13, q14, q15, q16)
			values
			(NULL,
			%s, %s, %s, 
			%s, %s, %s, 
			%s, %s, %s, 
			%s, %s, %s, 
			%s, %s, %s, 
			%s, %s)';
	$_data = array_merge(array($row['emailAddress']), $data);
	$_data = array_map('dbReady', $_data);
	$query = vpsrintf($sql, $_data);
	$result = @mysql_query($query);
	if (!$result){
		$errors[] = array('database error', 'error was: '. mysql_error() . "<br/>Query was: ". $query);
	}
}
if (count($errors) > 0){
	echo "<pre>" . print_r($errors, true) . "</pre>";
}
?>
 
and the dbready function

Code:
/**
 * helper function to enquote and return escaped data for mysql usage
 * 
 * @param object $data
 * @return 
 */
function dbReady($data){
	//trim the data
	$data = trim($data);
	if (function_exists('mysql_real_escape_string')){
		$result = @mysql_real_escape_string($data);
		if (!$result){
			$result = mysql_escape_string($data);
		}
	} else {
		$result = mysql_escape_string($data);
	}
	return "'". $result. "'";	
}
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top