Accessing large number of files from a directory

Status
Not open for further replies.

tewari68

Programmer
Jan 25, 2005
87
US
Hi,
I have a PHP function that parses a given XML file and makes a database entry.
My problem is that I have to read files from a directory containing more than 100k files, all XML, parse them one by one, and enter the records in the database.
When I run the PHP script, it times out after a certain time.
I tried increasing the timeout values in php.ini and also increased the memory limit to 128MB, but I am still unable to parse all the files in the directory.
I think using nice and find might do the trick; however, I am not an advanced Unix user, so I don't know how to use nice and find together to solve the problem.
I would appreciate it if anyone could provide a solution.

Thanks,
tewari
 
I don't see an issue with using PHP, personally.

Why not post your code so we can see whether it can be tweaked to do what you need?
 
Hi,
Here is my code
Code:
<?

$dir = "/usr/local/xmlfiles";
// open directory and parse file list
if (is_dir($dir))
{
	if ($dh = opendir($dir))
	{
		// iterate over file list
		while (($filename = readdir($dh)) !== false)
		{
			$hotelinfo_key = array();
			$hotelinfo_value = array();
			$images_val = array();
			$description_header = array();
			$description_value = array();
			echo "Processing file - ".$filename."<BR>";
			if (($filename != ".") && ($filename != ".."))
			{
				$load_file = $dir."/".$filename;
				$xmlobj = simplexml_load_file($load_file);
				$count++;
				foreach($xmlobj->children() as $child) {
					if($child->getName() != "Images" & $child->getName() != "Descriptions")
					{
						$hotelinfo_key[] .= mysql_escape_string($child->getName());
						$hotelinfo_value[] .= "'".mysql_escape_string($child)."'";
					}
					foreach($child->children() as $subchild) {
						if($subchild->getName() == "Image")
						{
							$images_val[] .= $subchild;
						}
						foreach($subchild->children() as $k=>$v){
							if($k == "Name")
								$description_header[] .= mysql_escape_string($v);
							if($k == "Value")
								$description_value[] .= mysql_escape_string($v);
						}
					}
				}
				$qry_hotelinfo =  "Insert  into HotelInfo (".implode(",", $hotelinfo_key).") values (".implode(",", $hotelinfo_value).")";
				$results = mysql_query($qry_hotelinfo) or die($qry_hotelinfo.mysql_error());
				for($i=0;$i<count($images_val);$i++)
				{
					$qry_hotelimages = "Insert into HotelImages (HotelID, HotelImage) values (".$hotelinfo_value[0].", '".$images_val[$i]."')";
					$R = mysql_query($qry_hotelimages) or die($qry_hotelimages.mysql_error());
				}
				for($i=0;$i<count($description_header);$i++)
				{
					$qry_hoteldesc = "Insert into HotelDescription (HotelID, DescriptionName, DescriptionValue) values (".$hotelinfo_value[0].", '".$description_header[$i]."', '".$description_value[$i]."')";
					$Results = mysql_query($qry_hoteldesc) or die($qry_hoteldesc.mysql_error());
				}
			}
		}
		closedir($dh);
	}
}
?>
Appreciate any help
Thanks
 
If you have root access (it appears you do) and the ability to create cronjobs on the server, you could set the script to run on a periodic basis (cronjob). I'd set the job to run in a directory that's not part of the web root. Also, I'm assuming that after the files are processed, they can be deleted or moved - they should be. That way, the script won't have to deal with already processed files, thus limiting the processing time during the next wake-up of the script.

You could limit the number of files processed during each wakeup to limit the load on the server.

i.e.
a- set_time_limit(0); // at beginning of script - careful - code could hang if the code's not resilient
1- Get files
1a- If files retrieved == 0 exit
2- Loop over files
2a- Process file
3- Delete or move processed file (can't do this if running from web)
3a- Bump File Count
4- Is File count limit reached?
4a- If yes exit
5- Loop
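
A minimal sketch of the steps above, assuming PHP is run from the command line (for example by cron) so that the script is allowed to move files. The names process_batch, $doneDir and $processFile are hypothetical and do not come from the original posts; $processFile stands in for the existing parse-and-insert code.

```php
<?php
// Hypothetical helper: process at most $limit files from $dir, moving each
// one to $doneDir afterwards so the next run never sees it again.
// Returns the number of files handled in this run.
function process_batch($dir, $doneDir, $limit, $processFile)
{
    $dh = opendir($dir);
    if ($dh === false) {
        return 0; // nothing to do if the directory cannot be opened
    }

    $processed = 0;
    // Stop as soon as the per-run limit is reached or the directory is exhausted.
    while ($processed < $limit && ($filename = readdir($dh)) !== false) {
        $path = $dir . "/" . $filename;
        if (!is_file($path)) {
            continue; // skips ".", ".." and any subdirectories
        }
        $processFile($path);                        // process the file
        rename($path, $doneDir . "/" . $filename);  // move it out of the way
        $processed++;                               // bump the file count
    }
    closedir($dh);

    return $processed;
}
```

A cron-driven script could call set_time_limit(0) first and then something like process_batch("/usr/local/xmlfiles", $doneDir, 1000, $callback), where the target directory and the limit of 1000 are placeholders to adjust.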

If you're running this from the web, you'll need to add some additional logic to determine which files were processed previously and to skip those. You'll also need to set up some method, separate from the script, to delete or move the processed files.
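
One possible way to skip previously processed files when they cannot be moved is to keep a plain-text log of finished file names. This is only a sketch: the log file and the function names load_processed, mark_processed and next_unprocessed are assumptions, not something from the original posts.

```php
<?php
// Hypothetical helpers: remember processed file names in a plain-text log
// (one name per line) so that a later run can skip them.

// Read the log into an array keyed by file name for fast isset() lookups.
function load_processed($logFile)
{
    if (!is_file($logFile)) {
        return array(); // no log yet, so nothing has been processed
    }
    $names = file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return array_flip($names);
}

// Append one finished file name to the log.
function mark_processed($logFile, $filename)
{
    file_put_contents($logFile, $filename . "\n", FILE_APPEND | LOCK_EX);
}

// Return up to $limit file names from $dir that are not in the log yet.
function next_unprocessed($dir, $logFile, $limit)
{
    $done = load_processed($logFile);
    $todo = array();
    $dh = opendir($dir);
    if ($dh === false) {
        return $todo;
    }
    while (count($todo) < $limit && ($filename = readdir($dh)) !== false) {
        if (is_file($dir . "/" . $filename) && !isset($done[$filename])) {
            $todo[] = $filename;
        }
    }
    closedir($dh);
    return $todo;
}
```

Each web request would then handle the names returned by next_unprocessed() and call mark_processed() after every successful file, so an interrupted run resumes where it stopped.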


 
I've not tweaked this code much: the main change is a set_time_limit() call inside the loop, which restarts the timeout for each file. I'm assuming that your iterative SQL is required and that it contains no errors.

Code:
<?php

$dir = "/usr/local/xmlfiles";
$count = 0; // number of files parsed so far

// open directory and parse file list
if (is_dir($dir)){
    $dh = opendir($dir) or die ('cannot open directory');

    // iterate over file list
    while (($filename = readdir($dh)) !== false){
        $hotelinfo_key = array();
        $hotelinfo_value = array();
        $images_val = array();
        $description_header = array();
        $description_value = array();
        echo "Processing file - ".$filename."<BR>";
        if (($filename != ".") && ($filename != "..")){
            set_time_limit(20); // restart the timeout counter: each file gets up to 20 seconds
            $load_file = $dir."/".$filename;
            $xmlobj = simplexml_load_file($load_file);
            if ($xmlobj === false){
                continue; // skip files that cannot be parsed as XML
            }
            $count++;
            foreach($xmlobj->children() as $child) {
                // top-level elements other than Images/Descriptions become HotelInfo columns
                if($child->getName() != "Images" && $child->getName() != "Descriptions"){
                    $hotelinfo_key[] = mysql_escape_string($child->getName());
                    $hotelinfo_value[] = "'".mysql_escape_string($child)."'";
                }
                foreach($child->children() as $subchild) {
                    if($subchild->getName() == "Image"){
                        // escaped like the other values, since it goes into a query below
                        $images_val[] = mysql_escape_string($subchild);
                    }
                    // this loop must stay inside the $subchild loop, as in the original code
                    foreach($subchild->children() as $k=>$v){
                        if($k == "Name"){
                            $description_header[] = mysql_escape_string($v);
                        } elseif($k == "Value"){
                            $description_value[] = mysql_escape_string($v);
                        }
                    }
                }
            } // end of the foreach
            $qry_hotelinfo = "Insert
                            into HotelInfo
                            (".implode(",", $hotelinfo_key).")
                            values
                            (".implode(",", $hotelinfo_value).")";

            $results = mysql_query($qry_hotelinfo) or die($qry_hotelinfo.mysql_error());
            for($i=0;$i<count($images_val);$i++){
                $qry_hotelimages = "Insert
                                    into HotelImages
                                    (HotelID, HotelImage)
                                    values
                                    (".$hotelinfo_value[0].", '".$images_val[$i]."')";
                $R = mysql_query($qry_hotelimages) or die($qry_hotelimages . "<br/>" . mysql_error());
            }
            for($i=0;$i<count($description_header);$i++){
                $qry_hoteldesc = "Insert
                                    into HotelDescription
                                    (HotelID, DescriptionName, DescriptionValue)
                                    values
                                    (".$hotelinfo_value[0].", '".$description_header[$i]."', '".$description_value[$i]."')";
                $Results = mysql_query($qry_hoteldesc) or die($qry_hoteldesc.mysql_error());
            }
        } // close the if (filename) block
    } // close the while loop
    closedir($dh);
}
?>
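
The traversal used in this thread's code can also be separated from the database calls, which makes it easier to check on a single file. The sketch below assumes an XML layout inferred from the element names in the code (an Images element holding Image children, and a Descriptions element whose children each hold a Name and a Value). The thread never shows an actual file, so the sample document, the root element Hotel, the Description wrapper, and the function name extract_hotel are all hypothetical.

```php
<?php
// Hypothetical helper: pull the three groups of values out of one parsed
// document without touching the database.
function extract_hotel(SimpleXMLElement $xml)
{
    $info = array();         // column name => value for HotelInfo
    $images = array();       // list of image values
    $descriptions = array(); // list of array(name, value) pairs

    foreach ($xml->children() as $child) {
        $name = $child->getName();
        if ($name != "Images" && $name != "Descriptions") {
            $info[$name] = (string) $child;
        }
        foreach ($child->children() as $subchild) {
            if ($subchild->getName() == "Image") {
                $images[] = (string) $subchild;
            }
            // Collect the Name/Value pair nested under this $subchild, if any.
            $header = null;
            $value = null;
            foreach ($subchild->children() as $k => $v) {
                if ($k == "Name") {
                    $header = (string) $v;
                } elseif ($k == "Value") {
                    $value = (string) $v;
                }
            }
            if ($header !== null) {
                $descriptions[] = array($header, $value);
            }
        }
    }

    return array("info" => $info, "images" => $images, "descriptions" => $descriptions);
}

// Assumed sample document, for illustration only.
$sample = '<Hotel>'
    . '<HotelID>42</HotelID><Name>Example Inn</Name>'
    . '<Images><Image>a.jpg</Image><Image>b.jpg</Image></Images>'
    . '<Descriptions><Description><Name>Pool</Name><Value>Heated</Value></Description></Descriptions>'
    . '</Hotel>';

$xml = simplexml_load_string($sample);
// simplexml_load_string() returns false for malformed input, so check before use.
$result = ($xml === false) ? null : extract_hotel($xml);
```

Keeping the parsing in its own function means a malformed file can be detected and skipped before any insert is attempted, and the values can still be escaped and written with the existing queries.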
 