Parsing Log Files

Status
Not open for further replies.

williamu

Programmer
Apr 8, 2002
494
GB
Hi All,

I'm sure this has been done before, but I can't find many references to it. I'd like a script that scans the Apache access log and creates a file listing each unique IP address found, plus its first and last access dates and times and the line number, separated by a : or something like that so I can split it later using Perl.

192.168.5.54:30/06/2003:12-00-21:30/06/2003:12-01-41:1
...

I'd like to do this myself, but I wouldn't know where to start with the expressions and syntax. Could someone please give me a few pointers?

Thanks.


William
Software Engineer
ICQ No. 56047340
 
> 192.168.5.54:30/06/2003:12-00-21:30/06/2003:12-01-41:1
Ok, I guess this is what you want

For those who don't know, what's the format of an Apache log file?
 
Hi,

This is a (single) line from the log file as it stands. I can't find a man entry for it but I hope this is enough to be going on with.

192.168.5.32 - - [01/Jul/2003:13:25:00 +0100] "POST /cgi-bin/editdata.cgi HTTP/1.1" 200 934 "http://192.168.0.32/cgi-bin/admin.cgi" "-"

Thanks.

William
Software Engineer
ICQ No. 56047340
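
For anyone unfamiliar with how awk sees a line like this: with the default field separator it splits on whitespace, so the IP address lands in $1 and the bracketed timestamp is spread over $4 and $5. The following is only a minimal sketch to illustrate that; show_fields is a made-up helper name, and the sample line is the one quoted in this thread.

```shell
#!/bin/sh
# Sample access-log line as quoted in this thread.
line='192.168.5.32 - - [01/Jul/2003:13:25:00 +0100] "POST /cgi-bin/editdata.cgi HTTP/1.1" 200 934 "http://192.168.0.32/cgi-bin/admin.cgi" "-"'

# show_fields is a hypothetical helper, named only for this illustration.
show_fields() {
  awk '{
    print "ip=" $1                               # first whitespace-separated field
    print "stamp=" substr($4, 2)                 # drop the leading [
    print "zone=" substr($5, 1, length($5) - 1)  # drop the trailing ]
  }'
}

printf '%s\n' "$line" | show_fields
```

Note that this relies on the first five fields containing no embedded spaces, which holds for the sample line but is an assumption about the rest of the log.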
 
Give this a shot
Code:
#!/bin/awk -f

# get
# 192.168.5.54:30/06/2003:12-00-21:30/06/2003:12-01-41:1
#
# from a series of
# 192.168.5.32 - - [01/Jul/2003:13:25:00 +0100] "POST /cgi-bin/editdata.cgi HTTP/1.1" 200 934 "http://192.168.0.32/cgi-bin/admin.cgi" "-"

# $1 is IP address
# $4 is the date/time
BEGIN {
  count = 0
}

{
  if ( !($1 in seen_ip) ) {
    seen_ip[$1] = $1
    first_line[$1] = FNR
    first_time[$1] = format_date_time(substr($4,2))
    last_time[$1] = first_time[$1]    # in case this is the only one
    num_ips[count++] = $1
  } else {
    last_time[$1] = format_date_time(substr($4,2))
  }
}

END {
  OFS=":"
  for ( i = 0 ; i < count ; i++ ) {
    ip = num_ips[i]
    print ip,first_time[ip],last_time[ip],first_line[ip]
  }
}

function format_date_time ( dt ) {
  gsub(":","-",dt)  # change all : into -
  sub("-",":",dt)   # turn the first one back into :
  return dt
}

Invoke as
Code:
awk -f prog.awk logfile
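
One thing to be aware of: the script above keeps the month as it appears in the log, so it prints 01/Jul/2003:13-25-00 rather than the all-numeric 30/06/2003 style in the original request. If the numeric month matters, a variant of format_date_time along these lines might do it. This is only a sketch: the to_numeric_date wrapper is a made-up name for trying the function on its own, and it assumes English three-letter month abbreviations in the log.

```shell
#!/bin/sh
# to_numeric_date is a hypothetical wrapper used only to exercise the function.
to_numeric_date() {
  awk '
    # dt looks like 01/Jul/2003:13:25:00
    # parts, months and mm are extra parameters acting as local variables.
    function format_date_time(dt,    parts, months, mm) {
      split(dt, parts, "[/:]")   # day, month name, year, hh, mm, ss
      months = "JanFebMarAprMayJunJulAugSepOctNovDec"
      # Position of the name in the string gives the month number;
      # an unrecognised name ends up printed as 00.
      mm = (index(months, parts[2]) + 2) / 3
      return sprintf("%s/%02d/%s:%s-%s-%s", parts[1], mm, parts[3], parts[4], parts[5], parts[6])
    }
    { print format_date_time(substr($4, 2)) }
  '
}

printf '%s\n' '192.168.5.32 - - [01/Jul/2003:13:25:00 +0100] "POST /cgi-bin/editdata.cgi HTTP/1.1" 200 934 "-" "-"' | to_numeric_date
```

To use it in prog.awk, replace the existing format_date_time function with the one inside the awk program here; the rest of the script stays the same.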
 
Hi Salem,

I wasn't expecting you to write this for me, but since you have, I tried it and it works. Thanks very much, and have a star for all your effort.

It's really appreciated.

Thanks.


William
Software Engineer
ICQ No. 56047340
 