Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing Odd CSV File

Status
Not open for further replies.

PCHomepage

Programmer
Feb 24, 2009
609
US
Generally, parsing and inserting a CSV file into MySQL (I'm using object oriented mysqli) is fairly straightforward but I have some new files with a very odd format that need parsing and I have no clue how to do it because it's in sections, each require inserting into (or updating an existing) entry. The first line contains the file name which can be ignored and is always prefixed with the word "product name" or something like that. Sorry, I don't have the file here so I'm paraphrasing.

Then there is a series of names and values as a pair of columns like:

Name1, value1
Name2, value2
Name3, value3
etc.

which continues for a way in similar fashion followed by a space (empty line), then another similar one followed by another space. These two sections always seem to have two columns with the title in the left column and the data values next to it in the right column. As I understand it, these will always have the same number of columns so I can create a table that has the column names of the left column values, then it needs to insert the right column values appropriately. If the particular file has already been uploaded, this same data would need to be replaced, not inserted, and the file name is stored in another table.

Then the rest of the data is in regular header, row and column but I need the data after the header row, not the header itself although the first header value can be used to determine where the values begin. Although the number of values in each section may vary, each one seems to begin with something specific that should help discern where one section ends and the next begins if only I knew how to do it! Any ideas?
 
you are only supposed to pass a section to jFlip. note the example above that passes only section 1 and not the whole data array.
 
Thank you! I realized my error as soon as I posted. Sorry to seem so ignorant but I've been away from programming for years and have forgotten most of what I knew.

That said, I'm back to this issue now after having several wrenches thrown into the works. It turns out that the CSV files will come in several different formats including one that is totally different (first column in the first two rows will not be inserted anywhere). There may be other variants too so it would be nice to be able to submit some kind of a template array into the function so that it knows what to do in order to make it as versatile as possible!

I suspect I'm making this more complicated than it is so any additional help would be greatly appreciated. I understand the concepts though not the details but maybe something like this that matches the database table:

Code:
$InsertTable = array('tablename', 'col1', 'col2', 'col3', 'col4');

and a data set up array something like that would indicate what to look for in the CSV to match it to the database:

Code:
//Columns to skip would have 0
//Hasheaders would tell it that there are headers and to look there for the values but to skip them for the inserting
$InsertParams = array('lookfor', 'endwith', 'hasheaders', 'col1', 0, 0', 'col4');

In other words, something that would tell it to take the first two columns but insert them into the first and fourth columns of the database. This is just a stab in the dark as I'm not an expert on arrays although I know databases reasonably well. I just don't get how to put it all together!

Maybe another array to indicate how many sections, on what they begin, which to flip or not and what to actually use:

Code:
// Pairs: Section name, flip, has headers
$ParseSections = array('A Param',1,1, 'Name1',1,1, 'Name2',0,1);

So here is what I have:

1) Form for uploading file(s) (working)
2) Logs the file(s) into a file_upload table (working)
3) During upload it also needs to parse out the sections into several different tables using the auto-incremented ID(s) from step one to keep track of them. (NOT working)

Although the form is there for uploading the file, I'm not sure how to apply the function to it while it's uploading. So far I have:

Code:
<form class="UploadForm" method="POST" enctype="multipart/form-data" action="parse_csv.php">
Upload CSV files only:<br>
<input type="file" name="UserfileProbe[]" multiple="multiple" /><br>
<input type="submit" value="Upload All" />
</form>

Code:
$Filename = $ServerRoot.$UploadFolder."/Probe/XB2X/XB2X_L7563409_10_7E_p3p4_nTG_Lint.csv";
// Full file path required: use $ServerRoot.$UploadFolder to obtain

function parseProbeCSV($Filename) {
	ini_set('auto_detect_line_endings', TRUE);
	$fh = fopen($Filename, 'r'); //open the file for reading
	$s = 0; //set section marker

	while (FALSE != ($row = fgetcsv($fh))): //iterate the file
		$row = array_map('trim', $row); //get rid of trailing spaces

		// test for which section we are in and change
		// migrate section and advance the pointer
		if ($s == 0 && $row[0] == "A") $s = 1;

		if ($s == 1 && $row[0] == 'A Param'):
			$s = 2;
			$row = array_map('trim', fgetcsv($fh));
		endif;

		if ($s == 2 && $row[0] == 'Bin'):
			$s == 3;
			$header = true; //set the header to capture the field names in a special variable
		endif; 

		if ($header):
			$data['section'][$s]['header'] = $row; //capture the field names
			$header = false;
		else:
			if ($s == 3):
				$data['section'][$s]['data'][] = $row; //capture data in section 3
			else:
				$data['section'][$s][] = $row; //capture data in sections 1 and 2
			endif;
		endif;
	endwhile;
	fclose($fh); 

	//echo"<pre>";
	//print_r (jflip($data['section'][1]));
	$flipped = jflip($data['section'][1]);
	//echo "</pre>";
	
				$Query = "INSERT INTO dcs_testarraydump (TestData1, TestData2, TestData3, TestData4, TestData5, TestData6, TestData7) 
						  VALUES ";						
			
				$qPart = array_fill(0, count(jflip($data['section'][1])), "(?, ?, ?, ?, ?, ?, ?)");
				$Query .= implode(",",$qPart);
	echo $Query;
				$stmt = $mysqli->prepare($Query);
				$i = 1;
				foreach ($data as $item) {
					$stmt->bindParam($i++, $item['TestData1']);
					$stmt->bindParam($i++, $item['TestData2']);
					$stmt->bindParam($i++, $item['TestData3']);
					$stmt->bindParam($i++, $item['TestData4']);
					$stmt->bindParam($i++, $item['TestData5']);
					$stmt->bindParam($i++, $item['TestData6']);
					$stmt->bindParam($i++, $item['TestData7']);
				}
				
				$stmt->execute();
				
				/* // No errors but no Insert either
				$sql = "INSERT INTO dcs_testarraydump (TestData1, TestData2) 
				VALUES ('?', '?');";	

				for ($i=0; $i<sizeof($data); $i++) {
					$query = $mysqli->prepare($sql);
					$query->execute($data[$i]);
				}
				*/
}

and

Code:
// Subfunction to flip array
function jflip($array) {
	$output = array();
	$arrayKeys = array();
	$arrayValues = array();

	foreach($array as $line=>$data):
		foreach ($data as $column=>$item):
			if(empty($item)) break;
			if ($column == 0):
				$arrayKeys[$line] = trim($item);
			else:
				$arrayValues[$column - 1][$line] = $item;
			endif;
		endforeach;
	endforeach;

	foreach ($arrayValues as $key=>$data):
		foreach($data as $iKey=>$item):
			$output[$key][$arrayKeys[$iKey]] = $item; 
		endforeach;
	endforeach;
return $output;
}
 
I have not looked at the code yet but here are a few home truths that might help

1. csv files are a specified format. whoever is giving you these files needs to be told to go and get his/her house in order.

2. if they are not, in fact, csv files then it is pointless trying to jury rig a csv parser to handle them.

3. do not aggregate lots of different files in different formats into a single file and try to parse that. Far better is to build parsers for each file that you will be frequently ingesting. these only take a few minutes each. Define the input standard and define the output standard and then build a decent class to handle it for you.

In defining the output standard have an eye to the use to which you will put the data. If you are using it for graphing, then consider what your graphing app needs.

For your graphing scripts, I would seriously consider not using those wrapper functions but just addressing the jpGraph engine directly. much more sensible until you know it is all working. then if you wanted to build encapsulated handlers, do so in a class.
 
Thanks. Unfortunately the CSV files are generated automatically by testing equipment so there is nothing to be done about their format. I need to find a way to work around it.
 
as you describe them, they are not csv files.

but anyway, the solution is to create file specific parsers. provide the input specification for each file and the desired output specification and we can help.
 
Thank you. Is there somewhere secure where I can send a couple samples? In the meantime, I'll talk with the engineers to see if there is anything on their end that can be done but somehow I doubt it. I don't think they are able to generate three separate files from the same testing process.

Also, I don't put this stuff into a function until it's working but I posted it that way to show what I needed. I've never worked directly with classes, though, so I'm not sure how. Anyway, thanks again.
 
I spoke with the engineers this morning and they are adamant that they need a single CSV file so I have no cloice. Rather than trying to do it all at once, though, it seems to me that it will need different functions to grab the pieces, flip them or not, and insert them while at the same time knowing which set of data belongs with which physical file. I'll go through what you already sent and see what I can come up with.

A quick related question, to use your jflip() function inside another function, does a copy of it need to be in the function?

Thank you.
 
Code:
to use your jflip() function inside another function

nope. so long as the function itself is in the global scope it is globally accessible.

Code:
Is there somewhere secure where I can send a couple samples?

I'm not difficult to find if you search for me. Check out the 'about' page of my site at rathercurious.com (this site rightly prohibits publishing email addresses). I am a technology lawyer by trade and will happily treat anything you send me as confidential.
 
I had actually found you there but there is a message that the domain name expired a few data ago.
 
Okay, I think I have an idea that might work. I'll create a relatively simply function into which a few variables will be passed so that it can fetch out only a single section of the .CSV at a time, probably as a simple array.

Code:
parseCSV($Filename, $DataStart, $DataEnd, $Cols, $Flip);

// First section - this data is vertical so $DataStart and $DataEnd are ROW names
$Filename = 'Name of file.csv';
$DataStart = 'A';
$DataEnd = 'xHP';
$Cols = 8;
$Flip = true;

// Second section - this data is vertical so $DataStart and $DataEnd are ROW names
$Filename = 'Name of file.csv';
$DataStart = 'A param';
$DataEnd = 'Eb param';
$Cols = 18;
$Flip = true;

// Third section - this data is horizontal so $DataStart and $DataEnd are COLUMN names
$Filename = 'Name of file.csv';
$DataStart = 'Bin';
$DataEnd = 'total';
$Cols = 8;
$Flip = false;

If the data contains more rows/cols (depending upon orientation) than specified, those in between need to be padded with 0 value (rather than being Null)

On this existing code snippet, ['section'][0] gives a meaningless title from the top row of the file; ['section'][1] gives everything else. In the CSVs, there is also one case where it needs to be able to start on the second column so the $DataStart will need to know not to necessarily look at the first column. Also, I don't need the field names, just the data.

Below is a roughly modified portion of the code you wrote as a starting point but I am not sure how to feed it the data end point.

What do you think, will it work and can you help?

Code:
function parseCSV($Filename, $DataStart, $DataEnd, $Cols, $Flip) {
	ini_set('auto_detect_line_endings', TRUE);
	$fh = fopen($Filename, 'r'); //open the file for reading
	$s = 0; //set section marker

	while (FALSE != ($row = fgetcsv($fh))): //iterate the file
		$row = array_map('trim', $row); //get rid of trailing spaces
                 if ($Flip = true):
                    $row = array_map('jflip', $row);
                 endif;
		// test for which section we are in and change
		// migrate section and advance the pointer
		if ($s == 0 && $row[0] == $DataStart) $s = 1;
		     $row = array_map('trim', fgetcsv($fh));
		endif;

		$header = true;
		if ($header):
			$data['section'][$s]['header'] = $row; //capture the field names
			$header = false;
		endif;
	endwhile;
	fclose($fh); 

	echo"<pre>";
	//print_r (jflip($data['section'][1]));
	//print_r(jflip($data['section'][1]));
	print_r($data['section'][1]);
	echo "</pre>";
}
 
I had actually found you there but there is a message that the domain name expired a few data ago.
Rathercurious.net. Not .com. My bad.
 
Thank you, I'll have to look it up tomorrow when I have the files available. In the meantime, I've played around a bit with my idea above and it seems that it will work if only I can figure out how to tell it where to look for the end of the section.

Something like this which, of course, isn't working yet:

Code:
parseCSV($Filename, 'A', 'xHP', 8, $Flip=true);

Code:
function parseCSV($Filename, $DataStart, $DataEnd, $Cols, $Flip) {
	ini_set('auto_detect_line_endings', TRUE);
	$fh = fopen($Filename, 'r'); //open the file for reading
	$st = 0; //set start section marker
	$en = 0; //set end section marker

	while (FALSE != ($row = fgetcsv($fh))): // iterate the file
		$row = array_map('trim', $row); // get rid of trailing spaces
		if ($Flip == true):
			$row = array_map('jflip', array($row)); // flip vertical data to horizontal
		endif;

		// test for which section we are in and change
		// migrate section and advance the pointer
		if ($st == 0 && $row[0] == "$DataStart") $st = 0;
		if ($en == 0 && $row[1] == "$DataEnd") $en = 1;

			$header = true; // set the header to capture the field names in a special variable

		if ($header):
			$data[$st][$en]['header'] = $row; // capture the field names
			$header = false;
		endif;
	endwhile;
	fclose($fh); 

	echo"<pre>";
	print_r($data[0]);
	echo "</pre>";
}
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top