Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!


Regular Expressions

Status
Not open for further replies.

PCHomepage

Programmer
Feb 24, 2009
609
US
I'm trying to parse the apache.log file and have a function that works, but I know little about regular expressions and could not get the regex to parse the line directly. I reviewed and tried just about every posting I could find here and on other sites, but nothing worked. Could someone look at $pattern below and tell me how to parse each section without the ugly workarounds? The code is for the local development copy of the log, which is truncated (it lacks the last two columns), but I would like to use the same code to parse the live log in NCSA combined format as well. Any help is appreciated.

Code:
function ParseLocalToScreen($path) {
	global $output;
	// Parses the local Windows Apache development Log Format lines:
	// REF RAW LOG: 127.0.0.1 - - [27/Apr/2014:15:00:24 -0700] "GET / HTTP/1.1" 200 3051
	// REF PARSED OUTPUT: 127.0.0.1, -, -, 2014-04-27 15:00:24, GET, /, HTTP/1.1, 200, 3051
	$pattern = '/^(\S+) '; // Remote Host (a single space follows; "\s " would demand two)
	$pattern .= '([^\s]+) '; // Log Name
	$pattern .= '([^\s]+) '; // User
	$pattern .= '\[(\d+)\/(\w+)\/(\d+):(\d{1,2}:\d{1,2}:\d{1,2} '; // Datetime WORKS SO-SO
	$pattern .= '?[\+\-]?\d*)\] "(.*)/'; // Remainder
	
	if (is_readable($path)) :
		// $php_errormsg was deprecated in PHP 7.2 and removed in PHP 8, so report the path instead
		$fh = fopen($path,'r') or die("Cannot open $path");
		while (!feof($fh)) :
			$s = fgets($fh);
			if (preg_match($pattern,$s,$matches)) :
				list($whole_match, $remote_host, $logname, $user, $day, $month, $year, $time, $remainder) = $matches;
				// Converts short month to numeric; day and year are passed too so strtotime() cannot roll over on the 29th to 31st
				$month = date('m', strtotime("$day $month $year"));
				$time = trim(substr($time,0,-6)); // Removes the " -0700" timezone offset
				$replacements = array(' ', '"');
				$remainder = str_replace($replacements, ', ', $remainder); // Replaces each space and quote with ", "
				// REGEX NOT WORKING: remove extra field, build datetime and other output for MySQL
				// (kept inside the if so a non-matching line or the final EOF read does not repeat the previous line)
				$output .= str_replace(", , ", ", ", "$remote_host, $logname, $user, $year-$month-$day $time, $remainder<br>\n");
			endif;
		endwhile;
		fclose($fh);
		echo $output;
	else : 
		echo "Cannot access log file!";
	endif;
}
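Each pair of plain parentheses in the pattern above is a capturing group with its own slot in $matches, so the datetime part `\[(\d+)\/(\w+)\/(\d+):(...)\]` hands list() the day, month, year and time as separate values. A minimal sketch of the difference, assuming the REF RAW LOG line as input: wrapping the whole bracketed timestamp in one group returns it in one piece, and `(?: )` groups part of a pattern without adding a slot.

```php
<?php
$line = '127.0.0.1 - - [27/Apr/2014:15:00:24 -0700] "GET / HTTP/1.1" 200 3051';

// Four capturing groups: day, month, year and time each get their own slot.
preg_match('/\[(\d+)\/(\w+)\/(\d+):([^\]]+)\]/', $line, $split);

// One capturing group around everything between the brackets.
preg_match('/\[([^\]]+)\]/', $line, $whole);

// Same structure spelled out field by field, but the optional offset uses
// (?: ) so it groups without capturing; only the outer group is returned.
preg_match('/\[(\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2}(?: [+-]\d{4})?)\]/', $line, $strict);

echo count($split), "\n";  // 5: whole match + 4 groups
echo count($whole), "\n";  // 2: whole match + 1 group
echo $strict[1], "\n";     // 27/Apr/2014:15:00:24 -0700
```

With a single group, `list($whole_match, $date_time) = $whole;` yields the timestamp ready for formatting.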
 
Hi

Sorry, no time for a detailed analysis now, just a quick question: did you know that strtotime() is able to parse that date format?
Code:
Interactive mode enabled

php > var_dump(date('c', strtotime('27/Apr/2014:15:00:24 -0700')));
string(25) "2014-04-28T01:00:24+03:00"
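The output above shows a side effect worth knowing when the value is headed for MySQL: strtotime() returns a Unix timestamp, and date() renders it in PHP's default timezone, so 15:00:24 -0700 came back as 01:00:24 the next day at +03:00. A minimal sketch, assuming the goal is to store the wall-clock time exactly as logged, comparing that with DateTime::createFromFormat(), which keeps the offset from the string. Setting the default timezone to UTC is an arbitrary choice made only so the output is reproducible.

```php
<?php
// Pin the default timezone so the strtotime()/date() result is reproducible.
date_default_timezone_set('UTC');

$stamp = '27/Apr/2014:15:00:24 -0700';

// strtotime() understands the Apache/NCSA timestamp, but date() then prints
// the instant in the *default* timezone, not in the log's own offset.
$viaStrtotime = date('Y-m-d H:i:s', strtotime($stamp));

// createFromFormat() keeps the -0700 offset from the string, so format()
// reproduces the wall-clock time written in the log.
$viaDateTime = DateTime::createFromFormat('d/M/Y:H:i:s O', $stamp)->format('Y-m-d H:i:s');

echo $viaStrtotime, "\n"; // 2014-04-27 22:00:24
echo $viaDateTime, "\n";  // 2014-04-27 15:00:24
```

If the server's own timezone is what MySQL should store, the strtotime() route is fine; the two only differ when the zones differ.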


Feherke.
feherke.ga
 
Yes, thank you, and no real rush. I did know that, but I expected the regex to return the date as a single variable via list(); instead it comes back in bits and pieces, which is the main issue here. When I first wrote the function, it used a pattern something like this:

PHP:
$pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';

and to assign variables, it was using:

PHP:
list($whole_match, $remote_host, $logname, $user, $date_time, $method, $request, $protocol, $status, $bytes, $referer, $user_agent) = $matches;

. . . where $date_time provided what was needed for formatting. Somehow I broke it, but I finally realized that each part of the date and time is now captured into its own variable, so to get it working I simply put them back together in the needed order. That works, but it's inelegant and ugly!
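The earlier combined-format pattern requires the quoted referer and user-agent fields, so it cannot match the truncated local lines. A minimal sketch of one pattern for both formats, assuming lines shaped like the REF RAW LOG example with or without the two trailing quoted fields: the timestamp is captured whole, named groups replace positional list() assignment, and the trailing pair sits in an optional non-capturing group. The function name parseApacheLine() and the "-" default for absent fields are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper: parses a common- or combined-format Apache log line
// into an associative array, or returns null if the line does not match.
function parseApacheLine(string $line): ?array
{
    $pattern = '/^(?<remote_host>\S+) (?<logname>\S+) (?<user>\S+) '
             . '\[(?<date_time>[^\]]+)\] '
             . '"(?<method>\S+) (?<request>\S+) (?<protocol>\S+)" '
             . '(?<status>\d{3}) (?<bytes>\d+|-)'
             . '(?: "(?<referer>[^"]*)" "(?<user_agent>[^"]*)")?\s*$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    $fields = ['remote_host', 'logname', 'user', 'date_time', 'method',
               'request', 'protocol', 'status', 'bytes', 'referer', 'user_agent'];
    $row = [];
    foreach ($fields as $name) {
        // On local lines the optional referer/user-agent groups do not take
        // part in the match, so they are missing (or empty); store "-".
        $row[$name] = (isset($m[$name]) && $m[$name] !== '') ? $m[$name] : '-';
    }

    // Convert "27/Apr/2014:15:00:24 -0700" to a MySQL DATETIME string,
    // keeping the wall-clock time written in the log.
    $dt = DateTime::createFromFormat('d/M/Y:H:i:s O', $row['date_time']);
    if ($dt !== false) {
        $row['date_time'] = $dt->format('Y-m-d H:i:s');
    }
    return $row;
}

$row = parseApacheLine('127.0.0.1 - - [27/Apr/2014:15:00:24 -0700] "GET / HTTP/1.1" 200 3051');
echo implode(', ', $row), "\n";
// 127.0.0.1, -, -, 2014-04-27 15:00:24, GET, /, HTTP/1.1, 200, 3051, -, -
```

The trailing `\s*$` also absorbs the "\r\n" that fgets() leaves on Windows-written logs.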
 