
grep / sed question


adev111 (Programmer), Jul 8, 2004
Hi,
I currently change my text files manually; they contain data in the following format.

1085961616.474 172 190.104.253.84 TCP_MISS/200 2146 GET [07fiG10lKmU4M3R:IQ.TCH] - DIRECT/63.236.66.14 text/html

1085961622.602 60 68.22.217.209 TCP_HIT/200 9476 GET - NONE/- image/gif

1085961627.502 159 190.104.253.84 TCP_REFRESH_HIT/304 339 GET - DIRECT/192.43.217.199 application/x-javascript


1085961648.792 6 12.168.85.240 TCP_MISS/503 1585 GET - NONE/- text/html

I change these files so that they contain the URLs only, i.e. as below.


Now I filter my URLs in the following steps (see the sketch after this list):
1) I look for everything other than _hit/200 and _miss/200 and remove it from the file.
2) Then I remove all the lines that contain a question mark ("?") from the result above.
3) Then, from the result of step 2, I copy the URL only.
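
Roughly, I want steps 1 and 2 to behave like this pipeline (assuming a grep that supports the -E and -F options, and the same log.txt file name as in my attempt further down):
Code:
# step 1: keep only the _HIT/200 and _MISS/200 lines
# step 2: drop any line containing a literal "?"
grep -E '_(HIT|MISS)/200' log.txt | grep -Fv '?'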

Currently I only have 10 records in the text file, but down the line I'll be receiving these files in the megabytes, so I am trying to make my life easier beforehand by creating a shell script.

Now here is my algorithm: find the lines that do not contain the expression _hit/200 or _miss/200 and delete them; then, for the lines that are left, truncate everything before http: and everything after .js, .jsp, .html or .gif. That should leave me with only the URLs.
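
For the truncation part I am picturing something like this sed substitution (assuming GNU sed for the \| alternation, and that every remaining line has exactly one URL that starts with http: and ends in one of those extensions):
Code:
# keep only the URL: drop everything before "http:" and everything after the extension
sed -n 's/.*\(http:[^ ]*\.\(jsp\|js\|html\|gif\)\).*/\1/p' log.txt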

I am not so good at script syntax but here's what I've come up with so far.

Code:
grep "(_hit|MISS)/200)*.!\?*"log.txt -- 
Would someone tell me what I am doing wrong?

Thanks in advance
 
Here's what I came up with so far.

awk '($4~/_(MISS|HIT)\/200/){print $7}' NLANR-Logs.txt | grep -v "?"


Any insights?

 
The grep -v part may not work reliably, because "?" can have special meaning for grep (it is a metacharacter in extended regular expressions). You can do it all in the awk part anyway, e.g.:

[tt]awk '($4~/_(MISS|HIT)\/200/) && !/\?/ {print $7}' NLANR-Logs.txt [/tt]
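
If you would rather keep those lines and just strip the query string instead of discarding them (a variation on the above, not something I have tested against your logs), awk's sub() can do it:

[tt]awk '$4~/_(MISS|HIT)\/200/ { sub(/\?.*/,"",$7); print $7 }' NLANR-Logs.txt[/tt]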

Annihilannic.
 
Code:
#!/usr/bin/perl
use strict;
use warnings;

my %urls;

while (<>) {
   # only TCP_HIT/200 and TCP_MISS/200 lines are of interest
   next unless /TCP_(HIT|MISS)\/200/;
   my @parts = split(/\s+/, $_);
   # the URL is the 7th field; drop any query string after "?"
   my ($url, @junk) = split(/\?/, $parts[6]);
   $urls{$url}++;
}

print "Count\t\tURL\n\n";

foreach (sort keys %urls) {
   print "$urls{$_}\t\t$_\n";
}
Not tested (no linux box here), but it should summarise counts per URL. To invoke:
Code:
myscript [i]log.txt[/i]


Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 