How to search for a string in a directory?

Status
Not open for further replies.

spicymango

Programmer
May 25, 2008
119
CA
Hi,

I need to write a program that goes through all the files in my directory and its subdirectories, searches for a string, and echoes each line where it occurs. Has anyone written a program that does something like this? Is it possible to do that in PHP? I know you can do it quite easily in Unix.
 
Hi,

I once had to perform a similar task. Here's my example:

Code:
<?php
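// Recursively search every file under $path for lines matching $search_pattern.
function recur($path,$search_pattern) {
	$handle = opendir($path);
	if ($handle === false) { return; } // skip directories that cannot be opened
	// Compare against false explicitly so a file named "0" does not end the loop
	while(false !== ($file = readdir($handle))) {
		if ($file == '.' || $file == '..') { continue; }
		if(is_dir("$path/$file")) {
			recur("$path/$file",$search_pattern);
		}
		else {
			// Read the whole file, then test it line by line
			$var=file_get_contents("$path/$file");
			$var=explode("\n", $var);
			foreach ($var as $line=>$data) {
				if (preg_match($search_pattern,$data)) {
					echo "Match Found > Line $line > $path/$file<br>";
				}
			} 
		}
	}
	closedir($handle);
}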

$path = getcwd();
$search_pattern = '/^searchpattern/';
recur($path,$search_pattern);
?>

Chris
 
Just a note: I didn't need a recursive function when I performed this task, so the recursive part of the example above is mostly untested and may not work as expected, although it can be quickly fixed.
 
Just one final note, now that I have tested it: it seems to work as expected. I changed:

echo "Match Found > Line $line > $path/$file<br>";

to:

echo "Match Found > Line " . ($line+1) . " > $path/$file<br>";
in order to print the actual line number ($line is a zero-based index, so line 1 would otherwise be reported as line 0).

Also, you will most likely want to remove the caret ^ from $search_pattern ( $search_pattern = '/searchpattern/'; ), since it anchors the match to the start of the line.

Chris
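
The two adjustments above can be seen in isolation in a small sketch. The helper name matchingLines and the sample text are made up for illustration and are not part of the posted script; the sketch only assumes the same explode() and preg_match() approach.

```php
<?php
// Hypothetical helper: returns matching lines keyed by their 1-based line number.
function matchingLines(string $text, string $pattern): array
{
    $found = [];
    foreach (explode("\n", $text) as $index => $data) {
        if (preg_match($pattern, $data) === 1) {
            $found[$index + 1] = $data; // explode() keys start at 0
        }
    }
    return $found;
}

// Made-up sample text: the search term starts line 3 and sits mid-line on line 2.
$text = "first line\nsecond searchpattern here\nsearchpattern at the start";

// With the caret, only a line that begins with the term can match.
$anchored = matchingLines($text, '/^searchpattern/');

// Without the caret, the term may appear anywhere in the line.
$unanchored = matchingLines($text, '/searchpattern/');
```

With the caret only the third line is returned; without it the second line is found as well.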

 
Could you not just use grep with a system call?
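
A rough sketch of that idea, assuming a grep binary that supports -r is on the PATH and that exec() is not disabled on the server. The helper names buildGrepCommand and grepDirectory are invented for this example.

```php
<?php
// Hypothetical helper: builds a recursive (-r), line-numbered (-n),
// fixed-string (-F) grep command for the given directory.
function buildGrepCommand(string $dir, string $needle): string
{
    // escapeshellarg() quotes each value so it reaches grep as a single argument.
    return 'grep -rnF -e ' . escapeshellarg($needle) . ' -- ' . escapeshellarg($dir);
}

// Hypothetical helper: runs the command and returns grep's output lines.
function grepDirectory(string $dir, string $needle): array
{
    $lines = [];
    exec(buildGrepCommand($dir, $needle), $lines, $status);
    // grep exits with 0 when it found matches and 1 when it found none.
    return $status === 0 ? $lines : [];
}
```

Calling grepDirectory('.', 'pwd') would then return one "file:line:text" entry per match. Quoting the arguments matters if the search string ever comes from user input.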
 
I tried it and it works well, but only in the current folder: if there are subdirectories, it does not go into them.
 
It should work fine. Have you changed $path = getcwd(); to e.g. $path = 'path/to/first/directory'; ? And have you removed the ^ in $search_pattern? If you've made any notable changes, post your code here.

Chris
 
I tried again and it works great.

But if the directory is big, it is a very slow process: it times out after 30 seconds.

Unix grep is much faster, but I don't have a Unix box, so I was hoping to achieve it using PHP.
 
You might find this code a bit quicker. It won't time out either, but it might well get upset if you run out of available memory.

Code:
<?php
//this example will search recursively downwards from the current directory for the case insensitive string 'pwd' 

new folderGrep('.', '/pwd/i');

class folderGrep{
	
	private $debug = false; //change to true for verbose output
	private $ignoreHidden = true; //change if you want to include files prefixed with '.'
	
	public function __construct($dir, $pattern){
		$this->pattern = $pattern;
		$this->ignore = array('.' ,'..');
		$this->memLimit = $this->returnBytes(ini_get('memory_limit'));
		$this->dir = $this->file = $this->lineNum = $this->line = array();
		$this->start = microtime(true);
		$this->scanDir(realpath($dir));
		$this->displayResults();
	}
	
	private function scanDir($dir){
		set_time_limit(60);
		$dir = trim($dir);
		if (substr($dir, -1) !== '/'){
			$dir .= '/';
		}
		
		if (is_dir($dir)){
			if (is_readable ($dir) ){
				$this->log ($dir, 'opening directory');
				$dh = opendir($dir);
				$this->log($dir, 'directory open');
				$this->log($dir, 'about to scan directory');
				while (FALSE !== ($file = readdir($dh))){
					if (!in_array($file, $this->ignore)){
						if (substr($file,0,1) == '.' && $this->ignoreHidden){
							continue;
						}
						if (is_dir($dir . $file)){
							$this->log($dir.$file, 'recurse as directory');
							$this->scanDir($dir.$file);
						} else {
							if (is_readable($dir.$file)){
								$this->log($dir.$file, 'File is readable');
								if ($this->tooBig($dir.$file)){
									$this->log($dir.$file, 'File is too big to be read in one gulp');
									$fh = fopen($dir.$file, 'rb'); //the 'b' and 't' flags are alternatives, so only 'b' is given
									$this->log($dir.$file, 'opening file for line by line reading');
									$cnt = 0;
									//fgets() returns false at end of file, which ends the loop cleanly
									while (($line = fgets($fh)) !== false){
										if (preg_match($this->pattern, $line)){
											$this->log($dir.$file, 'Pattern match found');
											$this->output($dir,$file, $cnt, $line);
										} else {
											$this->log($dir.$file, 'Pattern match NOT found');
										}
										$cnt++;
									}
									fclose($fh);
									unset($line);
								} else {
									$lines = file($dir.$file);
									$results = preg_grep($this->pattern, $lines);
									if (is_array($results)){
										foreach ($results as $key=>$result){
											$this->output($dir, $file, $key, $result);
										}
									}
									unset ($lines);
									unset($results);
								}
							}
						}
					} 
				}
				closedir($dh);
			} else {
				$this->log($dir , "cannot read directory", 'error');
			}
		} else {
			$this->log($dir, "Not a Directory", 'error');
		}
		
	}
	private function log ($item, $message, $type=null){
		if ($this->debug){
			echo "<p><pre>$item \t $message</pre></p>";
		} else {
			if ($type == 'error'){
				echo "<p><pre>$item \t $message</pre></p>";
			}
		}
	}
	private function output($dir, $file, $lineNum, $line){
		//split into multiple arrays for easy sorting
		$this->dir[] = $dir;
		$this->file[] = $file;
		$this->lineNum[] = $lineNum + 1; //callers pass a zero-based index
		$this->line[] = $line;
	}
	
	private function tooBig($file){
		if (function_exists('memory_get_usage')){
			$curMemory = memory_get_usage(true);
			if ($curMemory + filesize($file) > $this->memLimit * 0.8){
				return true;
			}	else {
				return false;
			}
		} else {
			if (filesize($file) > ($this->memLimit * 0.5)){
				return true;
			} else {
				return false;
			}
		}
	}
	private function returnBytes($val){
	    $val = trim($val);
	    $last = strtolower(substr($val, -1));
	    $val = (int) $val; //drop the unit suffix before doing arithmetic
	    switch($last) {
	        // The 'G' modifier is available since PHP 5.1.0
	        // Each case deliberately falls through to the next
	        case 'g':
	            $val *= 1024;
	        case 'm':
	            $val *= 1024;
	        case 'k':
	            $val *= 1024;
	    }
	
	    return $val;
	}
	private function displayResults(){
		//first order the arrays
		if (count($this->dir) ===0 ){
			echo "No results found";
		} else {
			array_multisort($this->dir, SORT_ASC, SORT_STRING, $this->file, SORT_ASC, SORT_STRING, $this->lineNum, SORT_ASC, SORT_NUMERIC, $this->line);
			$this->stop = microtime(true);
			echo "This operation took " . ($this->stop - $this->start) . " seconds to complete <hr/>";
			//now iterate
			echo '<table width="800px" border="1">';
			echo <<<HTML
		<tr>
			<th scope="col" width="200px">Directory</th>
			<th scope="col" width="100px">File</th>
			<th scope="col" width="50px">Line Num</th>
			<th scope="col" >Line</th>
		</tr>
HTML;
			$cDir = '';
			$cFile = '';
			foreach ($this->dir as $k=>$d){
				if ($cDir == $d){
					$_d = '&nbsp;';
				} else {
					$_d = $cDir = $d;
				} 
				if ($cFile == $this->file[$k]){
					$_f = '&nbsp;';
				} else {
					$_f = $cFile = $this->file[$k];
				}
				$line = nl2br(htmlentities(wordwrap($this->line[$k], 250))); //escape first so the <br /> tags survive
				echo <<<HTML
		<tr>
			<td>$_d</td>
			<td>$_f</td>
			<td>{$this->lineNum[$k]}</td>
			<td>$line</td>
		</tr>
HTML;
			}
			echo "</table>";
			if (!empty($this->log)) var_dump($this->log);
		}
	}
}
?>
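
For comparison, the same walk can be sketched more compactly with PHP's SPL iterators. This is not jpadie's class, just a minimal variant under stated assumptions: it always reads line by line (so memory stays flat at some cost in speed), and the function name grepTree and the shape of the returned arrays are invented here.

```php
<?php
// Hypothetical helper: returns one entry per matching line under $root.
function grepTree(string $root, string $pattern): array
{
    $matches = [];
    // SKIP_DOTS leaves out the '.' and '..' entries of every directory.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $info) {
        if (!$info->isFile() || !$info->isReadable()) {
            continue;
        }
        $fh = fopen($info->getPathname(), 'rb');
        if ($fh === false) {
            continue;
        }
        $lineNum = 0;
        // fgets() returns false at end of file, which ends the loop.
        while (($line = fgets($fh)) !== false) {
            $lineNum++; // 1-based line numbers
            if (preg_match($pattern, $line) === 1) {
                $matches[] = [
                    'file' => $info->getPathname(),
                    'line' => $lineNum,
                    'text' => rtrim($line, "\r\n"),
                ];
            }
        }
        fclose($fh);
    }
    return $matches;
}
```

A call such as grepTree('.', '/pwd/i') returns the matches as plain arrays, leaving sorting and HTML output to the caller. An unreadable subdirectory makes the iterator throw, so a real script would need to handle that case.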
 
jpadie, your program is a beauty.

I will try to understand the logic and may ask you a few questions later!
 
I want to loop through a text file and search the directory for each string it contains, so I added some code to yours. I do not delete the previous object before creating a new one.

Code:
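$f=fopen("welcome.txt","r") or exit("Unable to open file!");

while (($x=fgets($f)) !== false) 
{ 
$x=trim($x); // fgets() keeps the trailing newline
if ($x === '') { continue; } // an empty pattern would match every line
// preg_quote() treats the string literally and escapes the '/' delimiter
new folderGrep('.', '/'.preg_quote($x, '/').'/i');
}
fclose($f);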

But this way I will keep creating objects: if there are 1000 strings, I will create 1000 objects. I am not an OOP person, so I was wondering whether so many objects will cause a memory issue.

FULL CODE
Code:
<?php
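//this example searches recursively downwards from the current directory for each case insensitive string listed in welcome.txt

$f=fopen("welcome.txt","r") or exit("Unable to open file!");

while (($x=fgets($f)) !== false) 
{ 
$x=trim($x); // fgets() keeps the trailing newline
if ($x === '') { continue; } // an empty pattern would match every line
// preg_quote() treats the string literally and escapes the '/' delimiter
new folderGrep('.', '/'.preg_quote($x, '/').'/i');
}
fclose($f);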

class folderGrep{
    
   private $debug = false; //change to true for verbose output
    private $ignoreHidden = true; //change if you want to include files prefixed with '.'
    
    function __destruct() {
        echo "Person5 Object Released\n";
    }

    
    
    public function __construct($dir, $pattern){
        $this->pattern = $pattern;
        $this->ignore = array('.' ,'..');
        $this->memLimit = $this->returnBytes(ini_get('memory_limit'));
        $this->dir = $this->file = $this->lineNum = $this->line = array();
        $this->start = microtime(true);
        $this->scanDir(realpath($dir));
        $this->displayResults();
    }
    
    private function scanDir($dir){
        set_time_limit(60);
        $dir = trim($dir);
        if (substr($dir, -1) !== '/'){
            $dir .= '/';
        }
        
        if (is_dir($dir)){
            if (is_readable ($dir) ){
                $this->log ($dir, 'opening directory');
                $dh = opendir($dir);
                $this->log($dir, 'directory open');
                $this->log($dir, 'about to scan directory');
                while (FALSE !== ($file = readdir($dh))){
                    if (!in_array($file, $this->ignore)){
                        if (substr($file,0,1) == '.' && $this->ignoreHidden){
                            continue;
                        }
                        if (is_dir($dir . $file)){
                            $this->log($dir.$file, 'recurse as directory');
                            $this->scanDir($dir.$file);
                        } else {
                            if (is_readable($dir.$file)){
                                $this->log($dir.$file, 'File is readable');
                                if ($this->tooBig($dir.$file)){
                                    $this->log($dir.$file, 'File is too big to be read in one gulp');
                                    $fh = fopen($dir.$file, 'rb'); //the 'b' and 't' flags are alternatives, so only 'b' is given
                                    $this->log($dir.$file, 'opening file for line by line reading');
                                    $cnt = 0;
                                    //fgets() returns false at end of file, which ends the loop cleanly
                                    while (($line = fgets($fh)) !== false){
                                        if (preg_match($this->pattern, $line)){
                                            $this->log($dir.$file, 'Pattern match found');
                                            $this->output($dir,$file, $cnt, $line);
                                        } else {
                                            $this->log($dir.$file, 'Pattern match NOT found');
                                        }
                                        $cnt++;
                                    }
                                    fclose($fh);
                                    unset($line);
                                } else {
                                    $lines = file($dir.$file);
                                    $results = preg_grep($this->pattern, $lines);
                                    if (is_array($results)){
                                        foreach ($results as $key=>$result){
                                            $this->output($dir, $file, $key, $result);
                                        }
                                    }
                                    unset ($lines);
                                    unset($results);
                                }
                            }
                        }
                    } 
                }
                closedir($dh);
            } else {
                $this->log($dir , "cannot read directory", 'error');
            }
        } else {
            $this->log($dir, "Not a Directory", 'error');
        }
        
    }
    private function log ($item, $message, $type=null){
        if ($this->debug){
            echo "<p><pre>$item \t $message</pre></p>";
        } else {
            if ($type == 'error'){
                echo "<p><pre>$item \t $message</pre></p>";
            }
        }
    }
    private function output($dir, $file, $lineNum, $line){
        //split into multiple arrays for easy sorting
        $this->dir[] = $dir;
        $this->file[] = $file;
        $this->lineNum[] = $lineNum + 1; //callers pass a zero-based index
        $this->line[] = $line;
    }
    
    private function tooBig($file){
        if (function_exists('memory_get_usage')){
            $curMemory = memory_get_usage(true);
            if ($curMemory + filesize($file) > $this->memLimit * 0.8){
                return true;
            }    else {
                return false;
            }
        } else {
            if (filesize($file) > ($this->memLimit * 0.5)){
                return true;
            } else {
                return false;
            }
        }
    }
    private function returnBytes($val){
        $val = trim($val);
        $last = strtolower(substr($val, -1));
        $val = (int) $val; //drop the unit suffix before doing arithmetic
        switch($last) {
            // The 'G' modifier is available since PHP 5.1.0
            // Each case deliberately falls through to the next
            case 'g':
                $val *= 1024;
            case 'm':
                $val *= 1024;
            case 'k':
                $val *= 1024;
        }
    
        return $val;
    }
    private function displayResults(){
        //first order the arrays
        if (count($this->dir) ===0 ){
            echo "No results found";
        } else {
            array_multisort($this->dir, SORT_ASC, SORT_STRING, $this->file, SORT_ASC, SORT_STRING, $this->lineNum, SORT_ASC, SORT_NUMERIC, $this->line);
            $this->stop = microtime(true);
            echo "This operation took " . ($this->stop - $this->start) . " seconds to complete <hr/>";
            //now iterate
            echo '<table width="800px" border="1">';
            echo <<<HTML
        <tr>
            <th scope="col" width="200px">Directory</th>
            <th scope="col" width="100px">File</th>
            <th scope="col" width="50px">Line Num</th>
            <th scope="col" >Line</th>
        </tr>
HTML;
            $cDir = '';
            $cFile = '';
            foreach ($this->dir as $k=>$d){
                if ($cDir == $d){
                    $_d = '&nbsp;';
                } else {
                    $_d = $cDir = $d;
                } 
                if ($cFile == $this->file[$k]){
                    $_f = '&nbsp;';
                } else {
                    $_f = $cFile = $this->file[$k];
                }
                $line = nl2br(htmlentities(wordwrap($this->line[$k], 250))); //escape first so the <br /> tags survive
                echo <<<HTML
        <tr>
            <td>$_d</td>
            <td>$_f</td>
            <td>{$this->lineNum[$k]}</td>
            <td>$line</td>
        </tr>
HTML;
            }
            echo "</table>";
            if (!empty($this->log)) var_dump($this->log);
        }
    }
}
?>

 
I don't think it will get intrinsically unwieldy. But why not instead pass an array of patterns into the script and adjust the loops to iterate on each pattern?
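
One possible variation on that suggestion, sketched here rather than taken from the thread: instead of looping over the patterns inside the class, the literal strings can be merged into a single alternation so the directory tree is scanned only once. The helper name buildPattern is made up, and the sketch assumes the strings are meant literally, not as regular expressions.

```php
<?php
// Hypothetical helper: turns literal search strings into one case-insensitive regex.
function buildPattern(array $strings): string
{
    $quoted = [];
    foreach ($strings as $s) {
        $s = trim($s);
        if ($s !== '') {
            // preg_quote() escapes regex metacharacters and the '/' delimiter.
            $quoted[] = preg_quote($s, '/');
        }
    }
    if ($quoted === []) {
        throw new InvalidArgumentException('No search strings given');
    }
    return '/(?:' . implode('|', $quoted) . ')/i';
}

// Usage sketch (welcome.txt is the file name used in the post above):
// $strings = file('welcome.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// new folderGrep('.', buildPattern($strings));
```

The trade-off is that the results no longer say which of the strings matched. If that matters, the loop-per-pattern approach is the simpler route.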
 
Yes, an array might be a better choice. Just an OOP question: when we create objects, do they get deleted on their own when the program is done executing, or do we have to add code to delete them?

 
They are like any variable: they disappear once their scope evaporates.
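
A small sketch can make this visible. The Tracker class below is invented for the demonstration and stands in for folderGrep; it records when its constructor and destructor run. As in the loop posted earlier, each object is created without being assigned to a variable.

```php
<?php
// Hypothetical demo class: logs its own creation and destruction.
class Tracker
{
    public static $log = [];
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
        self::$log[] = "created $name";
    }

    public function __destruct()
    {
        self::$log[] = "destroyed {$this->name}";
    }
}

foreach (['first', 'second'] as $name) {
    // Nothing holds a reference to the new object, so PHP frees it
    // as soon as this statement has finished.
    new Tracker($name);
}

// Tracker::$log now reads:
// created first, destroyed first, created second, destroyed second
```

Each object is destroyed before the next one is created, so a loop over 1000 strings never holds more than one such object at a time.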
 