Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

CSV to SQL

Status
Not open for further replies.

ArtWerk

Programmer
Jul 31, 2006
66
US
I found somewhere a few lines of code that will do a basic convert from a CSV file into an SQL statement.

I'm having a bit of a problem with it when a particular field has commas inside it. It seems to recognize the commas somewhere in this method, but I don't quite understand how it's doing it. Here's an example of what's happening:

Original value:
Code:
"Spanish, French, Romanian"
New value:
Code:
'\"Spanish', ' French', ' Romanian\"'

here's the code:
Code:
$file_contents_line = @file($csv_file['path'])
	or die ("CSV file not found.");

foreach ($file_contents_line as $key => $val)
{
		// Skip empty lines
	if (empty($val)) continue;
	
	$inserts .= (($key >= 1) ? ", " : "")."('";

	$values = explode($csv_file['sepa'], $val);
	
	for ($j = 0; $j < count($values); $j++)
	{
		$inserts .= addslashes(utf8_decode($values[$j])).(($j != (count($values) - 1)) ? "', '" : "");
	}

	$inserts .= "')";
}

i'm guessing my issue is somewhere on this line:
Code:
$inserts .= addslashes(utf8_decode($values[$j])).(($j != (count($values) - 1)) ? "', '" : "");

What would be nice is if it replaced those commas with semi-colons and got rid of the double quotes, like this:

Original value:
Code:
"Spanish, French, Romanian"
New value:
Code:
'Spanish; French; Romanian;'

or maybe included those commas and just got rid of the double quotes, i don't know, but this line is causing errors in my SQL statement.

Thanks.
 
ok, after much work, i slipped this in the middle:
Code:
	$inserts .= (($key >= 1) ? ", " : "")."('";

	//begin new code
	$bad = "";
	$good = "";
	$val = trim($val);

	if(strstr($val,",\"") || strstr($val,"\",")){
		if(strstr($val,",\"")){
			$starter = strpos($val,",\"");
		} else {
			$starter = "0";
		}
		
		if(strstr($val,"\",")){
			$ender = strpos($val,"\",");
		} else {
			$ender = strlen($val)-3;
		}
		
		$bad = substr($val, $starter, ($ender - $starter));
	
		$bad = ltrim($bad,",");
		if($bad != ""){
			$bad = $bad."\"";
		}
		$bad = trim($bad);
		
		$good = str_replace(",",";",$bad);
		$good = str_replace("\"","",$good);
		
		$val = str_replace($bad,$good,$val);
	}
	//end new code
	
	$values = explode($csv_file['sepa'], $val);
This all seems to work. If someone doesn't mind "proof-reading" this to prevent future problems and possible screw ups in other circumstances. I'm not sure if you could be of much help without looking at the data, but any comments are helpful. Thank you.
 
i think you may be overcomplicating things.

could you provide a data snapshot of a couple of lines of output from your csv and a db schema of the table you want to push into? i've posted a solution to this a few times and it should be easy to show you how to adapt it to your scenario.

 
well, the data is probably not consistent. The only consistent part of the data is that it should be XLS files converted to CSV from Microsoft Excel. Which brings us to the database, it's also generated based on this spreadsheet.

Basically, what i'm doing is creating a table (temp) and the fields are based off the header row in the uploaded CSV. So if the CSV has 8 columns, my table has 8 columns. I create all the fields with varchar(255) NOT NULL to pretty much allow for anything to be in the fields.

So far, i've run into an issue of a random comma in front of a value. I took care of that. I also used Google spreadsheets to convert XLS to CSV and examined the structure and factored that in so it works. This is the latest issue i've had with fields with multiple values in it (like the Spanish, French, German). Now that's working.

So basically, i've been tackling issues as they've come up.

But like i said, i want this script to basically take any CSV file you throw at it and convert it to a MySQL table.

BTW, i'm adding a auto_increment field to the front of each row as well as the column header, just in case i need to manipulate this data while it's in the holding table.

after i finish with the temp table, it's dropped like a bad habit. everything else is pretty smooth sailing. This seems to be the hardest part of my project.
 
once i get this working perfectly (or at least as close as possible), I'd like slap an XLS to CSV converter in front of all this and allow XLS files to be uploaded. That would be ideal. then there wouldn't be the need of them to export their stuff into CSV from Excel.
 
But like i said, i want this script to basically take any CSV file you throw at it and convert it to a MySQL table.

this is very simple. but it sounds like your csv files are not being properly created.

and why don't use just use the xml file natively? why convert to csv and then to a database? you could just use the xml in your application or, if you must, write the xml data directly into your database as a temporary table.
 
I don't have any xml data. just xls and csv.
 
oops. sorry - misread.

so then how is excel garbling the csv conversion?

and does every excel file had a header line that gives the column headings for the temp table?
 
yes. that is one requirement we have that the first row contain the column headers. Because once they upload their file, we get rid of all the junk they don't need by telling them to match their column heads with the ones we need and have in another table.

I don't know what all is occuring when people convert XLS to CSV, but we get the file as a CSV. I'd like to be able to accept XLS files, and if we had our own converter, it would eliminate lots of these little problems. It's all to of course make the user happy and make things easier for them.

Have you ever seen/heard about/written anything that will simply take an XLS and turn it into a CSV? The one's i've found don't really work. I wish Google would put out an API that does what their spreadsheet converter does. That would rock!
 
not xls to csv as a converter.

but if you're running on a windows box then there is no reason why you can't install excel on it and then manipulate the file using COM.

also you could use ODBC to attach to the excel spreadsheet.

or (last chance) if you are able to send me an example of a crummily formatted csv file (of the type you typically receive) i'd be happy to spend 10 minutes or so working out how my standard csv handling code could be adapted to cope with it. upload the file at
 
take a look. I uploaded it in the exact format as they do (spaces in the filename and all)...
 
thanks. very little change was needed to my code in the end, so i used the spare time to provide a tiny bit of user feedback.

the table that is created is permanent. to make it temporary you should add TEMPORARY to the create table definition. but bear in mind that a temporary table disappears when the connection goes (essentially when the browser is refreshed). As I understood that you were then going to present the user with a data mapping screen for a more permanent storage solution in one of your ready made tables, a temporary table may not be what you are after.

However, you would not really need to do this (the temp table). you could, instead, store the csv file and rip the headers from it. present the headers for the data mapping exercise and then write the data as appropriate. Just a thought.

I chose not to use a commit and rollback method of data upload as a user might be ok with 10 or so records out of a couple of thoudand not being validly uploaded.

Also, i chose to make all fields varchar(255) with the exception of the autoincrementing ID field. This will handle all normal fieldtypes just fine (that are uploadable sensibly via a csv) but will not deal with large excel cells with data larger than 255 chars. for this the code changes the data schema on the fly to make the relevant column into a longtext. it's not the most efficient or optimal use of db field types but for a temporary table should result in most data being importable.

here is the code. feel free to ask questions as i have not commented it very thorough.

Code:
<?php
session_name("CSVImport");
session_start();

//create some vars
$log =array();
$baddata = array();

if (!isset($_POST['submit'])):
	displayform();
	exit();
elseif ($_FILES['fileupload']['error'] === 0):
	$filename = $_FILES['fileupload']['tmp_name'];
	if (!file_exists($filename)):
		displayform();
		exit();
	else:
		echo "Please wait while we parse the file<br/>";
		parsefile($filename);
		echo "all done";
		if (count($log) > 0):
			echo "<br/>There were errors in the import process<br/>The errors are as follows and the bad-data has been emailed to the address given in the upload process";
			echo "<pre>";
			print_r($log);
			echo "</pre>";
			send($_POST['emailaddress'], $baddata);
		else:
			echo "<br/>There were no errors in the import process";
		endif;
		echo "<br/>Upload another file?";
		displayform();
	endif;
else:
	displayform();
	exit();
endif;



function parsefile($filename) {
	$dbhost = 		"localhost";
	$dbuname = 		"root";
	$dbpwd = 		"root";
	$dbname =		"test";
	
	//connect to the database
	
	mysql_pconnect($dbhost, $dbuname, $dbpwd) or
		die(mysql_error());
	mysql_select_db($dbname) or
		die (mysql_error());
	
	global $log, $baddata;
	$longtextfields = array();
	//grab the csv file
	$fh = fopen($filename, "rt") or 
		die ("<br/>cannot open $filename");
	$firstrow = TRUE;	
	set_time_limit(0);
	while( ($row = fgetcsv($fh)) !== FALSE):
		if($firstrow):
			//grab the field names
			foreach ($row as $data):
				$keys[] = clean($data);
			endforeach;
			$keys = $row;
			$tableName = "table_".md5(uniqid("", true));
			
			//we may need the table name in a session var
			$_SESSION['tableNames'][] = $tableName;
			
			//create the temp table
			createTable($keys, $tableName);
			
			//turn off the firstrow check
			$firstrow = false;
		else:
			$sql = "
					Insert 
						into $tableName
					set
						`CSVUPLOADID` = NULL,";	
			foreach ($keys as $a=>$b):
				$sql .= "
						`$b` = '".clean($row[$a])."',";
				if (strlen($row[$a]) > 255):
					if (!in_array($b, $longtextfields)):
						longtextify($b, $tableName);
						$longtextfields[] = $b;
					endif;
				endif;
			endforeach;
			//commit the row
			$result = @mysql_query(rtrim($sql,", "));
			$cnt++;
			if ($result === false):
				$baddata[] = $row;
				$log[] = array("error"=>mysql_error(), "sql"=>$sql, "rownum"=>$cnt);
			endif;
		endif;
		
	endwhile;
	fclose($fh);
}
function clean($val){
	if (@mysql_real_escape_string("test") !== false):
		$f = "mysql_escape_string";
	else:
		$f = "mysql_real_escape_string";
	endif;
	return $f(trim($val));
}
function longtextify($col, $tableName){
	$sql ="
		ALTER 
			TABLE `$tableName` 
		MODIFY
			`$col` LONGTEXT NULL
		";
	@mysql_query($sql);
}
function createTable($keys, $tableName){
	$sql = "
				CREATE 
					TABLE 
					IF NOT EXISTS 
				`$tableName`
				(
					`CSVUPLOADID` int(11) NOT NULL auto_increment,
					PRIMARY KEY (`CSVUPLOADID`),
				";
	foreach ($keys as $key):
		$sql .= "
					`$key` varchar(255) NULL,";
	endforeach;
	//get rid of trailing slash and add the engine
	$sql = substr($sql, 0, -1) . ") ENGINE=InnoDB";			
	@mysql_query($sql) or 
		die("unable to create temporary table<br/>Mysql error was: ".mysql_error()."<br/>SQL command was: $sql");
	return true;
}
function displayform() {
echo <<<FORM

<form action="{$_SERVER['PHP_SELF']}" method="post" enctype="multipart/form-data">
<fieldset style="width:30%;">
<legend>Upload csv file</legend>
<input type="file" name="fileupload" /><br/>
Email address: <input type="text" name="emailaddress"/><br/>
<input type="submit" name="submit" value="Upload" />
</fieldset>
</form>

FORM;
}
function send($email, $baddata){
	//put baddata into string
	$textfile = implode("\n", $baddata);	
	
	//encode the text file
	$textfile = chunk_split(base64_encode($textfile),76,"\r\n");
		
	$to = str_replace(array(";", "\n", "\r\n"), "", $email);	//some minor cleansing
	$from = "admin@example.com";
	$subject = "Returned data";
	
	
	//set up the message text
	$defaultmessage = "Your mail client is too old to read html. Get a newer one";
	
	$plaintextmessage = "Your recent attempt to import CSV data had some failures.  The data that we were unable to process automatically is attached to this mail in its original form.";
	
	$htmlmessage = "<div style=\"border: thin solid red; width: 30%; margin: 0 auto; color: green; margin-top: 40px; font-family:'Trebuchet MS', Verdana, Arial, Sans-Serif;\">$plaintextmessage</div>";
	
	//set up the boundaries
	$boundary = md5(uniqid("",true));
	$b1 = "b1---$boundary";
	$b2 = "b2---$boundary";
	
	//separator
	$sep = "\n";
	//set up the mail header
	
	$headers  = "From: $from$sep";
	$headers .= "To: $to$sep";
	$headers .= "Return-Path: $from$sep";
	$headers .= "MIME-Version: 1.0$sep";
	$headers .= "Content-Type: multipart/mixed; boundary=\"$b1\"$sep";
	$headers .= "$sep";    
	
	//now set up the message
	$message = "--$b1$sep";
	$message .= "Content-Type: multipart/alternative; boundary=\"$b2\"$sep";
	$message .= "$sep";    
	
	//default message
	$message .= $defaultmessage.$sep;
	$message .= "$sep";    
	
	//plaintext message
	
	$message .= "--$b2$sep";
	$message .= "Content-Type: text/plain; charset=\"iso-8859-1\"$sep";
	$message .= "Content-Transfer-Encoding: 8bit$sep";
	$message .= "$sep";
	$message .= $plaintextmessage;
	$message .= "$sep";
	
	//html message
	$message .= "--$b2$sep";
	$message .= "Content-Type: text/html; charset=\"iso-8859-1\"$sep";
	$message .= "Content-Transfer-Encoding: 8bit$sep";
	$message .= "$sep";
	$message .= $htmlmessage;
	$message .= "$sep";
	$message .= "--$b2--";
	$message .= "$sep";
	
	
	
	//create the attachment part
	$message .= "--$b1$sep";
	$message .= "Content-Type: application/octet-stream$sep";
	$message .= "Content-Transfer-Encoding: base64$sep";
	$message .= "Content-Disposition: attachment; filename=\"bad_data.csv" . $sep;
	$message .= $sep;    
	$message .= $textfile;
	$message .= $sep;
	//finish the mail
	
	$message .= "--$b1--$sep";
	
	@mail ($to, $subject, $message, $headers);
}
?>
 
I haven't had a chance to implement it yet. I should be able to test it tonight.
 
Sorry, JP. I've been so busy. I haven't been able to touch this in days. I should definitely be able to try this tonight and I'll let you know. Thanks again.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top