Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing Odd CSV File

Status
Not open for further replies.

PCHomepage

Programmer
Feb 24, 2009
609
US
Generally, parsing and inserting a CSV file into MySQL (I'm using object oriented mysqli) is fairly straightforward but I have some new files with a very odd format that need parsing and I have no clue how to do it because it's in sections, each require inserting into (or updating an existing) entry. The first line contains the file name which can be ignored and is always prefixed with the word "product name" or something like that. Sorry, I don't have the file here so I'm paraphrasing.

Then there is a series of names and values as a pair of columns like:

Name1, value1
Name2, value2
Name3, value3
etc.

which continues for a way in similar fashion followed by a space (empty line), then another similar one followed by another space. These two sections always seem to have two columns with the title in the left column and the data values next to it in the right column. As I understand it, these will always have the same number of columns so I can create a table that has the column names of the left column values, then it needs to insert the right column values appropriately. If the particular file has already been uploaded, this same data would need to be replaced, not inserted, and the file name is stored in another table.

Then the rest of the data is in regular header, row and column but I need the data after the header row, not the header itself although the first header value can be used to determine where the values begin. Although the number of values in each section may vary, each one seems to begin with something specific that should help discern where one section ends and the next begins if only I knew how to do it! Any ideas?
 
The data would look something like this although each section is much longer (particularly the last one) and each section goes into its own table. I do need to check if the file name already exists and, if it does, to run an update but if it doesn't, to insert. The file names themselves are in another table and then referenced by ID everywhere else. Thank you.

Code:
File name,A999999_4_9E_p3p4_nTG_Test,,,,,,
,,,,,,,
A,0.796447376,,,,,,
Ba,0.00234966,,,,,,
Bb,0,,,,,,
,,,,,,,
A param,123575.0956,,,,,,
C0 param,7.629658594,,,,,,
u0 param,83.01084779,,,,,,
,,,,,,,
Data,,,,,,,
Testsrc,Testp,,,,,,
Name1,Name2,Name3,Name4,Name5,Name6,Name7,Total
1,1,1,0,0,2,0,2
2,0,0,0,1,0,1,1
3,0,0,0,0,0,0,0
4,0,0,1,1,0,2,2
5,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0
8,1,0,0,0,1,0,1
9,0,0,0,0,0,0,0
 
the sections are delineated by the blank lines?
 
Generally, yes, but I would rather look for some keyword at the beginning and ending of the first two sections to be sure.

Using the sample above as a basic guide and ignoring the first row entirely, it would start with the next row that begins with "A" and ends with the row that begins with "Bb" and has only one value each.

Section 2 begins with "A param" and ends with "u0 param" and has only one value each.

Section 3 begins with "Name1" and runs to the end of the file and has eight values.

So far I have only a single sample for testing purposes so I suppose I can write into the documentation any specific that it needs to work properly.

All values, by the way, are unsigned integers so there is no need for anything special in that regard.
 
Code:
$fh = fopen('path/to/some/file.csv', 'r'); //open the file for reading
$s = 1; //set section marker
while (FALSE != ($row = fgetcsv($fh)): //iterate the file
 $row = array_map('trim', $row); //get rid of trailing spaces
 /*
 * test for which section we are in and change
 */
 /*
 * migrate section and advance the pointer
 */
 if ($s == 1 && $row[0] == 'A Param'):
   $s = 2;
   $row = array_map('trim', fgetcsv($fh));
 endif;
 if ($s == 2 && $row[0] == 'Name1'):
   $s == 3;
   $header = true; //set the header to capture the field names in a special variable
 endif;  
 if ($header):
   $data['section'][$s]['header'] = $row; //capture the field names
   $header = false;
 else:
   if ($s == 3):
     $data['section'][$s]['data'][] = $row; //capture data in section 3
   else:
     $data['section'][$s][] = $row; //capture data in sections 1 and 2
   endif;
  endif;
endwhile;
fclose($fh);
[/close]

however if you are in charge of both ends of this transfer then consider:

1. writing the data directly to a remote database.  you can connect to remote databases using php or even make it secure with ssh first.
2. transfer the data in a meaningful structure using something like json.
 
Thank you! I'll give it a try. I am also not totally adverse to loading the raw data whole into a temp table, then extracting it out again using SQL with which I am more familiar but your code is closer to what I am trying to do.

As for the data, it comes from an automated process with the only output option as CSV so there is not much choice in that regard but it is used only internally so security isn't a prime issue.
 
Your code makes good sense to me and I see nothing wrong but all I've been able to achieve is a server error. Also, the first section to grab is actually (usually) the third row which contains "A"; the second begins with "A param"; the third with "Name1".

The first two sections are actually a single row of data each but they are turned on their sides so somehow need to be flipped to get the values but not the captions.

I'll ultimately be sending the file through an upload form with up to three at a time so once this is working, I'll probably convert it to a function.

Code:
$fh = fopen('path/to/some/file.csv', 'r'); //open the file for reading
$s = 1; //set section marker

while (FALSE != ($row = fgetcsv($fh)): //iterate the file
$row = array_map('trim', $row); //get rid of trailing spaces

// test for which section we are in and change
// migrate section and advance the pointer
if ($s == 1 && $row[0] == 'A Param'): 
	$s = 2;
	$row = array_map('trim', fgetcsv($fh));
endif;

if ($s == 2 && $row[0] == 'Name1'):
	$s == 3;
	$header = true; //set the header to capture the field names in a special variable
endif; 

if ($header):
	$data['section'][$s]['header'] = $row; //capture the field names
	$header = false;
else:
	if ($s == 3):
		$data['section'][$s]['data'][] = $row; //capture data in section 3
	else:
$data['section'][$s][] = $row; //capture data in sections 1 and 2
	endif;
endif;

endwhile;

fclose($fh);
 
there is a close bracket missing (perils of typing straight into the edit box)

Code:
while (FALSE != ($row = fgetcsv($fh))[red])[/red]: //iterate the file
 
Much better, thank you! Well, it's not crashing anyway but it is giving no values either - just a blank screen but to me this is a form of success after so many server errors with every other attempt!

Anyway, I fixed the error and tried to rearrange the sections to suit the data but I'm not quite clear on what section 3 is searching:

Code:
$fh = fopen("$filename", 'r'); //open the file for reading
$s = 1; //set section marker

while (FALSE != ($row = fgetcsv($fh))): //iterate the file
$row = array_map('trim', $row); //get rid of trailing spaces

// test for which section we are in and change
// migrate section and advance the pointer
if ($s == 1 && $row[0] == 'A'): 
	$s = 2;
	$row = array_map('trim', fgetcsv($fh));
endif;

if ($s == 2 && $row[0] == 'A param'):
	$s == 3;
	$header = true; //set the header to capture the field names in a special variable
endif; 

if ($header):
	$data['section'][$s]['header'] = $row; //capture the field names
	$header = false;
else:
	if ($s == 3):
		$data['section'][$s]['data'][] = $row; //capture data in section 3
	else:
$data['section'][$s][] = $row; //capture data in sections 1 and 2
	endif;
endif;

endwhile;

fclose($fh);
 
i think you need to go back to the A Param and Name1 separators. the assumption is that you are in section 1. it is only if you are in section 1 and hit A Param , that you move to section 2. etc.

 
Okay, I'll do that although there are a couple lines at the top of the file that are unneeded so it seemed necessary for it to know where to look. Thank you.
 
I see
Code:
$fh = fopen('path/to/some/file.csv', 'r'); //open the file for reading
$s = 0; //set section marker
while (FALSE != ($row = fgetcsv($fh))): //iterate the file
$row = array_map('trim', $row); //get rid of trailing spaces
/*
* test for which section we are in and change
*/
/*
* migrate section and advance the pointer
*/
if($s ==0 && $row[0]=="A") $s = 1;
if ($s == 1 && $row[0] == 'A Param'):
$s = 2;
$row = array_map('trim', fgetcsv($fh));
endif;
if ($s == 2 && $row[0] == 'Name1'):
$s == 3;
$header = true; //set the header to capture the field names in a special variable
endif; 
if ($header):
$data['section'][$s]['header'] = $row; //capture the field names
$header = false;
else:
if ($s == 3):
$data['section'][$s]['data'][] = $row; //capture data in section 3
else:
$data['section'][$s][] = $row; //capture data in sections 1 and 2
endif;
endif;
endwhile;
fclose($fh);
 
Thank you again! It seems to be working very well with no errors. I do still have some questions, though, if you're not adverse to it.

Sections one and two have only a header and a single row of data each but they are flipped vertically in the CSV file. I tried adding an array_flip() as shown in red below but it seemed to do so oddly. Maybe it's correct but it didn't appear to be in the print_r() screen dump.

Code:
if ($header):
		$data['section'][$s]['header'] = $row; //capture the field names
		$header = false;
	else:
		if ($s == 3):
			$data['section'][$s]['data'][] = $row; //capture data in section 3
		else:
			$data['section'][$s][] = [COLOR=red]array_flip($row)[/color]; //capture data in sections 1 and 2
		endif;
	endif;

I am presuming that it is here in this conditional where I need to add my mysqli inserts and/or updates. Is that correct? I am not sure how to reference sections one and two individually since they each get their own database table nor do any of the sections need the headers inserted.

Also, the array is showing the couple extra lines of nonsense at the top of the file and those between the sections. Is it possible to start AND stop it based on specific keywords? The last section, fortunately, runs all the way to the end and needs no specific end and, in fact, it may vary in length so an end point isn't desired.

Slightly off topic, but is there a way to add an error if fopen() fails to read the file?
 
so you want to flip sections 1 and 2.
use a function like this (you could probably make this more efficient, perhaps using only one cascading loop rather than two). pass the relevant section as an argument.

e.g
Code:
print_r(jflip($data['section'][1]);
Code:
function jflip($array){
	$output = array();
	$arrayKeys = array();
	$arrayValues = array();
	foreach($array as $line=>$data):
		foreach ($data as $column=>$item):
			if(empty($item)) break;

			if ($column == 0):
				$arrayKeys[$line] = trim($item);
			else:
				$arrayValues[$column - 1][$line] = $item;
			endif;
		endforeach;
	endforeach;
	foreach ($arrayValues as $key=>$data):
		foreach($data as $iKey=>$item):
				$output[$key][$arrayKeys[$iKey]] = $item; 
		endforeach;
	endforeach;
	return $output;
}
 
Thank you for this! I've only just gotten back to it but unfortunately there seems to be something amiss since, when I try to call it, all I get is a blank screen. I don't see anything obviously wrong but I'm still looking.

Once it's working, would I be able to run this function inside the main function so that it can write the values to their respective tables all together?
 
I spoke too soon as I forgot that I was trying to run it static in the browser for testing rather than as a function so with the function line remarked out, it does indeed work! It seems to be outputting the captions too which it needs to skip. Can that be done? In other words, I need three separate data streams of values only to feed to three different tables so they have to somehow be individual.
 
print_r the output and then show us the equivalent in the format that you want it.

 
There are over 1000 lines so I'll try some sections. The first bit in red, for example, shows the unneeded lines at the beginning and the remainder seems to be caption/value for the next two sections and is also showing the empty rows between sections.

Code:
Array
(
    [section] => Array
        (
            [0] => Array
                (
                    [COLOR=red][0] => Array
                        (
                            [0] => Device
                            [1] => Description of the device
                            [2] => 
                            [3] => 
                            [4] => 
                            [5] => 
                            [6] => 
                            [7] => 
                        )

                    [1] => Array
                        (
                            [0] => 
                            [1] => 
                            [2] => 
                            [3] => 
                            [4] => 
                            [5] => 
                            [6] => 
                            [7] => 
                        )[/color]

                )

            [1] => Array
                (
                    [0] => Array
                        (
                            [0] => A
                            [1] => 0.796447376
                            [2] => 
                            [3] => 
                            [4] => 
                            [5] => 
                            [6] => 
                            [7] => 
                        )

                    [1] => Array
                        (
                            [0] => Ba
                            [1] => 0.00234966
                            [2] => 
                            [3] => 
                            [4] => 
                            [5] => 
                            [6] => 
                            [7] => 
                        )

                    [2] => Array
                        (
                            [0] => Bb
                            [1] => 0
                            [2] => 
                            [3] => 
                            [4] => 
                            [5] => 
                            [6] => 
                            [7] => 
                        )

                    [3] => Array
                        (
                            [0] => Bc
                            [1] => 0
                            [2] => 
                            [3] => 
                            [4] => 
                            [5] => 
                            [6] => 
                            [7] => 
                        )

                    [4] => Array
                        (
                            [0] => D
                            [1] => 0.159623587
                            [2] => 
                            [3] => 
                            [4] => 
                            [5] => 
                            [6] => 
                            [7] =>

I don't know if this is quite right bit since the first two sets of data needed have only a single row each, I was expecting something more like this:

Code:
Array
(
    [section] => Array
        (
            [0] => Array

                (
                   [1] => 0.796447376
                   [2] => 0.00234966
                   [3] => 0.159623587

Basically I need them into a format where the data only can be inserted into three separate tables
 
Reviewing the functionality more, I see that jflip() seems to actually be flipping every section into when appears to be a long single-column array. However, it's actually only the first two sections of data that need to be flipped while the third does not, ignoring the extraneous lines and captions already mentioned, into three separate and distinct arrays for inserting into three separate database tables. Perhaps I am misunderstanding the on-screen array output but that's how it seems to me although I'm tempted to try creating a test table and dump it in to see what it does!
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top