Perl and Apache Log search?

robertsdgm · May 25, 2004

Hello all,
It has been quite a long time since I have used Perl.
I have a huge Apache access log, and from it I need to print out all the unique IP addresses and their access times.
How do I go about doing this if a typical line of the log looks like the ones below?
Thanks
Dan
cbdw88209.utp.test.dan.com - - [20/May/2004:11:47:58 -0400] "GET /srs71/doc/js/tree-menu/menu-images
/menu_folder_closed.gif HTTP/1.1" 200 135
cbdw88209.utp.test.dan.com - - [20/May/2004:11:47:58 -0400] "GET /srs71/doc/js/tree-menu/menu-images
/menu_folder_closed.gif HTTP/1.1" 200 135
cbdw88209.utp.test.dan.com - - [20/May/2004:11:48:41 -0400] "GET /srs71/doc/srsbook_srsusr/srsusr25_
1.html HTTP/1.1" 200 4793
cbdw88209.utp.test.dan.com - - [20/May/2004:11:48:41 -0400] "GET /srs71/doc/srsbook_srsusr/images/i_
icon.jpg HTTP/1.1" 200 514
cbdw88209.utp.test.dan.com - - [20/May/2004:11:48:41 -0400] "GET /srs71/doc/srsbook_srsusr/images/li
st_values_up.jpg HTTP/1.1" 200 1991

PaulTEG · May 25, 2004

Code:

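my %hash_of_keys;
open(my $fh, '<', 'access_log') or die "Cannot open access_log: $!";
while (my $line = <$fh>) {
  # host/IP is everything before " - - "; the rest starts with the timestamp
  my ($ip_or_host, $the_rest) = split(/ - - /, $line, 2);
  # the timestamp is everything before the quoted request
  my ($access_time) = split(/ "GET/, $the_rest);
  # record only the first access time seen for each host
  if (!exists $hash_of_keys{$ip_or_host}) {
    $hash_of_keys{$ip_or_host} = $access_time;
  }
}
close($fh);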

then iterate over hash to print out the keys and values

HTH
--Paul

Not tested, and it's late ...
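For the "iterate over hash" step, a minimal sketch might look like the following. The sample entries (including the 192.0.2.10 address) and the helper name report_lines are made up for illustration; in practice the hash would be the one filled by the loop above.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the hash built while reading the
# log: host or IP => first access time seen.
my %hash_of_keys = (
    'cbdw88209.utp.test.dan.com' => '[20/May/2004:11:47:58 -0400]',
    '192.0.2.10'                 => '[20/May/2004:11:48:41 -0400]',
);

# Build one "host time" line per unique host, sorted by host for stable output.
sub report_lines {
    my ($seen) = @_;
    return map { "$_ $seen->{$_}" } sort keys %$seen;
}

print "$_\n" for report_lines(\%hash_of_keys);
```

Sorting the keys is optional; it just keeps the output order predictable, since Perl does not store hash keys in insertion order.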

rharsh · May 26, 2004

I don't know if Paul's code will work exactly as planned. Each entry from the example log file spans multiple (2) lines. It appears that after every 100 characters a newline character is inserted in the log. The second line of each entry is, as far as this project is concerned, garbage. I didn't test Paul's code, but I believe it will try to process the garbage just as it processes the address/access time info (I could be out in left field though; if I am, just ignore me).

Not that it would be hard to fix, but since the easiest way to test whether a line is valid is with a regex, why not let it do all the work?

Code:

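my %results;
while (<DATA>) {
  # $1 = host or IP (hyphens allowed), $2 = bracketed timestamp;
  # wrapped continuation lines do not match and are skipped
  if (/^([\w.-]+) - - (\[.+?\])/) {
    unless ($results{$1}) { $results{$1} = $2; }
  }
}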

Then, just as Paul suggested, iterate through the hash to get the address/access time info.

One other thing to consider would be replacing the non-greedy operator in the regex if you wanted to make this run a bit faster.
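One way to drop the non-greedy operator is a negated character class, which matches everything up to the first closing bracket without backtracking. This is only a sketch: parse_line and @sample_lines are names made up here, and whether it is measurably faster on a given log is something to benchmark rather than assume.

```perl
use strict;
use warnings;

# Same two captures as the regex above, but the timestamp is matched with
# [^\]]+ (anything except a closing bracket) instead of the non-greedy .+?
sub parse_line {
    my ($line) = @_;
    return $line =~ /^([\w.-]+) - - (\[[^\]]+\])/ ? ($1, $2) : ();
}

# Sample input taken from the log excerpt in the question, wrapped lines included.
my @sample_lines = (
    'cbdw88209.utp.test.dan.com - - [20/May/2004:11:47:58 -0400] "GET /srs71/doc/js/tree-menu/menu-images',
    '/menu_folder_closed.gif HTTP/1.1" 200 135',
    'cbdw88209.utp.test.dan.com - - [20/May/2004:11:48:41 -0400] "GET /srs71/doc/srsbook_srsusr/srsusr25_',
    '1.html HTTP/1.1" 200 4793',
);

my %results;
for my $line (@sample_lines) {
    my ($host, $time) = parse_line($line) or next;   # skip wrapped lines
    $results{$host} = $time unless exists $results{$host};
}

print "$_ $results{$_}\n" for sort keys %results;
```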

PaulTEG · May 26, 2004

AFAIK,

Log files should only occupy one line per record, at least any I've seen. For big files, I'd avoid the regex because of the overhead of firing up the regex engine for each line in a large log file, though it's most likely implied in the split anyway.

just my 0.02c
--Paul
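If the goal is to avoid the regex engine altogether, one option is plain string functions. The sketch below uses index and substr; parse_line_plain is a made-up name, and it assumes the two fields after the host are always "- -" as in the sample log. Whether it actually beats the regex versions would need benchmarking on the real file.

```perl
use strict;
use warnings;

# Extract (host, timestamp) with index/substr only, no regex.
# Returns an empty list for lines lacking the " - - [" marker or a closing "]".
sub parse_line_plain {
    my ($line) = @_;
    my $sep = index($line, ' - - [');
    return () if $sep < 1;                  # marker missing, or no host before it
    my $open  = $sep + 5;                   # position of the "["
    my $close = index($line, ']', $open);
    return () if $close < 0;
    return (substr($line, 0, $sep), substr($line, $open, $close - $open + 1));
}

my ($host, $time) = parse_line_plain(
    'cbdw88209.utp.test.dan.com - - [20/May/2004:11:47:58 -0400] "GET /srs71/doc/srsbook_srsusr/srsusr25_'
);
print "$host $time\n";
```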
