getting info from web stats

starlite79 · Feb 4, 2009

Hi there.
I would like to have awk read a file with web stats and print out those lines that had 100 or more hits in a given month.
Here is my code:

Code:

#!/usr/bin/awk -f

# This awk program reads web statistics with five columns and returns
# information on which html and gif files received 100 or more hits
# in a particular month.

BEGIN { 
   FS = "[ \t]+" # make any number of tabs,
         # and spaces the field separator
}
      sub(/\|/, "")
      $4 >= 100 { print $0
      }

The problem is, I think, that the field separator is not defined properly. When I look at the file, it has 5 columns, but they are separated by different amounts of spaces. Right now the only thing that is working correctly is that the "pipe" metacharacter is being replaced by a blank.

At the command line, I type awk -f program oldfile >> newfile

Attached is a piece of the old file I am trying to work with. The columns are %Requests, %Bytes, Bytes Sent, Requests, Archive Section. Can someone offer some assistance?
0.13 0.02 853220 361 | /data.html
0.12 0.02 675071 340 | /info.html
0.02 0.01 460273 45 | /xy.gif
0.01 0.01 268199 20 | /t.gif

Annihilannic · Feb 4, 2009

Welcome back starlite79.

You shouldn't actually have to define the separator, because the default one will serve your purpose fine.
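As a rough illustration of the default separator, here is a small sketch using the first sample line from the question. The leading blanks and the tab are assumptions, added only to mimic irregular column spacing; the variable names are made up for the example.

```shell
# First sample line from the thread; the leading blanks and the tab are assumed
# here to mimic columns separated by different amounts of whitespace.
line=$(printf '  0.13   0.02\t853220  361 | /data.html')

# Default FS: leading blanks are ignored and any run of blanks/tabs separates fields.
default_fs=$(printf '%s\n' "$line" | awk '{ print NF, $4 }')

# Explicit regex FS: leading blanks now produce an empty first field,
# so every column shifts right by one.
regex_fs=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ \t]+" } { print NF, $4 }')

echo "default FS: $default_fs"   # 6 fields, $4 is the Requests column
echo "regex FS:   $regex_fs"     # 7 fields, $4 is the Bytes Sent column
```

If the real file has no leading whitespace, both settings split the line the same way, which is one more reason to leave FS alone.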

Your $4 >= 100 is never actually evaluating to true (for reasons I think we discussed in our previous thread about redefining FS). The line is being printed because awk treats your sub() as a logical expression, which evaluates to true when it performs a successful substitution, and the default action for a true expression is to print the line. To process every line, you should instead enclose your code in { }, with no expression in front of it.
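The difference between sub() used as a pattern and sub() used inside an action can be seen in a small sketch like this one. The two input lines and the variable names are invented for the illustration.

```shell
# Two made-up input lines: only the first contains a pipe.
input=$(printf 'a | b\nc d\n')

# A bare sub() is a pattern with no action: it returns 1 when a substitution
# happens, so awk prints the modified line; lines without a pipe are skipped.
as_pattern=$(printf '%s\n' "$input" | awk 'sub(/\|/, "")')

# Inside braces, sub() is just a statement that runs on every line;
# nothing is printed unless the action asks for it.
as_action=$(printf '%s\n' "$input" | awk '{ sub(/\|/, ""); print }')

echo "as pattern:"
echo "$as_pattern"
echo "as action:"
echo "$as_action"
```

The first form prints only the line that contained a pipe; the second prints both lines, with the pipe removed where there was one.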

Also, you may as well only do the sub() when you find a line you are interested in.

In short, this should do the trick:

Code:

awk '$4 >= 100 { sub(/\|/,""); print }' oldfile > newfile

Annihilannic.

starlite79 · Feb 6, 2009

Thanks!
I came up with the following workaround for the | metacharacter, but your one-liner is cleaner.

Code:

#!/usr/bin/awk -f

# This awk program reads web statistics with six columns and returns
# information on which html and gif files received 100 or more hits
# in a particular month.

{
    if ($4 >= 100) {
        print "Number of hits is", $4, "for", $6
    }
}

I have a follow-up question. I ran the code on four months of web stats, then concatenated the output and sorted it by filename (*.html or *.gif). I then manually worked out the average hits for each unique filename. Could awk have read this file and told me the average hits per unique filename? Not every filename appeared in all four months, so a count of some sort would have been needed.

Here is an example of the sorted concatenated file:

data.html 322
data.html 380
data.html 402
data.html 450
picture.gif 105
picture.gif 300

Annihilannic · Feb 8, 2009

Yes, I'm pretty sure you could have done all of that in awk.

Annihilannic.
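One possible way to do the averaging, sketched here rather than taken from the thread, is to keep a running sum and a count per filename in awk's associative arrays and divide at the end. The array names sum and count, the one-decimal output format, and the final sort are choices made for this example; the input is the sample shown in the question.

```shell
# Sample lines copied from the sorted, concatenated file shown in the thread.
input='data.html 322
data.html 380
data.html 402
data.html 450
picture.gif 105
picture.gif 300'

averages=$(printf '%s\n' "$input" | awk '
    { sum[$1] += $2; count[$1]++ }     # total hits and number of months per name
    END {
        for (name in sum)              # iteration order is unspecified in awk
            printf "%s %.1f\n", name, sum[name] / count[name]
    }' | sort)

echo "$averages"
```

Because the count is kept per filename, a file that appears in only two of the four months is averaged over two values, not four. The input does not need to be sorted beforehand for this approach to work.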
