Tek-Tips is the largest IT community on the Internet today!

Extract some information from a field 2

Status
Not open for further replies.

ddupont2

Programmer
Oct 4, 2002
2
FR
Hello,

I want to rewrite some Apache logs with awk. So far I can filter out just the lines I want, but within those selected lines I now want to rewrite only part of the information in one of the fields.
Do you have any ideas or hints for solving my problem?

Thanks,
Denis


Example of my data:
192.1.201.100 - - [09/Jan/2004:12:01:05 +0100] "GET /directory1/Pagename1;jsessionid=00JSN0M5XEY+uo?productId=41104&E_mp=2101001000002-12&strAddress=&strLocation=montmirail&strCP=&strCountry=EUR HTTP/1.1" 200 8856
192.1.201.100 - - [09/Jan/2004:12:00:41 +0100] "GET /directory2/pagename2;jsessionid=0000GH320M5XEY+uo HTTP/1.1" 200 15616
192.1.201.100 - - [09/Jan/2004:12:01:10 +0100] "GET /directory2/pagename2;jsessionid=0000GJSN0M5XEY+upouo?from=0&rnd=10736400&E_mp=2105040103fra541104130x1103EUR000ebW9udG1p0002-12&stat=ambiguous_tourism&strChoice=0 HTTP/1.1" 200 34638
192.1.201.100 - - [09/Jan/2004:11:59:20 +0100] "GET /directory1/pagename2;jsessionid=0000LILFAAGRJGG1SWYNGKQUYZI+upoplruo?&E_mp=210504cGFyaXM000edHJhaW4gYmxldQ12091172.352671212&strLocation=Paris&stat=ambiguous_tourism&strCountry=1424&strAddress=&strCP=75000&productId=41102 HTTP/1.1" 200 16066

For this example, I want to write fields 1 to 6 unchanged;
for field 7, only the part before ";jsessionid=...", and of the parameters (and values) after "?", only "&stat=value" and "&productId=value";
and then all the fields after field 7 (the number of fields may vary from line to line).

The result for my example should be:
192.1.201.100 - - [09/Jan/2004:12:01:05 +0100] "GET /directory1/Pagename1?productId=41104 HTTP/1.1" 200 8856
192.1.201.100 - - [09/Jan/2004:12:00:41 +0100] "GET /directory2/pagename2 HTTP/1.1" 200 15616
192.1.201.100 - - [09/Jan/2004:12:01:10 +0100] "GET /directory2/pagename2?stat=ambiguous_tourism HTTP/1.1" 200 34638
192.1.201.100 - - [09/Jan/2004:11:59:20 +0100] "GET /directory1/pagename2?stat=ambiguous_tourism&productId=41102 HTTP/1.1" 200 16066
 
Try something like this:
Code:
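awk '{
# Split on double quotes: a[1]=prefix, a[2]=request, a[3]=status and size
if(split($0,a,"\"")!=3){print;next}
# Replace the session id with a bare ";" marker
tmp=a[2];sub(/;jsessionid[^ ?]*/,";",tmp)
# get = method and path, up to the ";" marker
get="\""tmp;sub(/;.*/,"",get)
# http = protocol, from the last space onward
http=tmp"\"";sub(/.* /," ",http)
# Keep stat=value if present; s records whether it was found
stat=tmp;if(s=sub(/.*stat=/,"stat=",stat)){
  sub(/[ &].*/,"",stat);stat="?"stat} else stat=""
# Keep productId=value if present, joined with "&" after stat, else "?"
prid=tmp;if(sub(/.*productId=/,"productId=",prid)){
  sub(/[ &].*/,"",prid);prid=(s?"&":"?")prid} else prid=""
print a[1] get stat prid http a[3]
}' /path/to/oldlog >/path/to/newlog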

Hope this helps,
PH.
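
To see why the script checks for exactly three pieces, here is a minimal sketch (not part of the script above) that runs only the split step on the second sample line from the question. It assumes a POSIX awk and that the request is the only quoted field in the line; the variable names line and parts are made up for the illustration.

```shell
# Sample line taken from the question (second example, no query string).
line='192.1.201.100 - - [09/Jan/2004:12:00:41 +0100] "GET /directory2/pagename2;jsessionid=0000GH320M5XEY+uo HTTP/1.1" 200 15616'

# Split on the double quote, then print the piece count and each piece.
parts=$(printf '%s\n' "$line" | awk '{
  n = split($0, a, "\"")
  print n
  for (i = 1; i <= n; i++) print i ": [" a[i] "]"
}')
printf '%s\n' "$parts"
# 3
# 1: [192.1.201.100 - - [09/Jan/2004:12:00:41 +0100] ]
# 2: [GET /directory2/pagename2;jsessionid=0000GH320M5XEY+uo HTTP/1.1]
# 3: [ 200 15616]
```

With a log format that carries more quoted fields (for example a quoted referer and user agent), the count is no longer 3, so the script above would print such lines unchanged.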
 
How about...

awk '{
  match($0, ";jsessionid=[^ ]*");
  printf substr($0,1,RSTART-1);
  n=split(substr($0,RSTART,RLENGTH),a,"[&?]");
  for (x=1;x<=n;x++)
    if ( a[x] ~ "^(stat)|(productId)=" )
      printf "?" a[x];
  print substr($0,RSTART+RLENGTH);
}' /path/to/oldlog >/path/to/newlog
 
Ygor, your script is certainly simpler, but it doesn't give the expected result for the 4th input line, i.e. when both stat and productId are present.
 
PHV, I've just tested both scripts and they both give identical and correct results. I do like your script though, just looking at it makes me feel unusual.
 
First of all, thank you for your responses!

Ygor, I think PHV is right; for the 4th line, your script gives
192.1.201.100 - - [09/Jan/2004:11:59:20 +0100] "GET /directory1/pagename2?stat=ambiguous_tourism?productId=41102 HTTP/1.1" 200 16066

instead of
192.1.201.100 - - [09/Jan/2004:11:59:20 +0100] "GET /directory1/pagename2?stat=ambiguous_tourism&productId=41102 HTTP/1.1" 200 16066

I will test your scripts on real data next week, and if I make any changes I will let you know.

Thanks,
Denis
 
I see the problem now. It is easily fixed:
awk '{
  match($0, ";jsessionid=[^ ]*");
  printf substr($0,1,RSTART-1);
  n=split(substr($0,RSTART,RLENGTH),a,"[&?]");
  s="?";
  for (x=1;x<=n;x++)
    if ( a[x] ~ "^(stat)|(productId)=" ) {
      printf s a[x];
      s="&";
    }
  print substr($0,RSTART+RLENGTH);
}' /path/to/oldlog >/path/to/newlog
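
For what it is worth, here is a hedged variant of the same idea, not a replacement for the script above. It differs in two ways: it prints through an explicit "%s" format, so a literal % in the log line (for example a URL-encoded %20) is not read as a format specifier, and it groups the alternation as ^(stat|productId)= so that the anchor and the = apply to both names. The function name filter_log and the shell variables are made up for this example, and a POSIX awk is assumed.

```shell
# Hypothetical helper: reads log lines on stdin, writes rewritten lines on stdout.
filter_log() {
  awk '{
    # Lines without a jsessionid are passed through unchanged.
    if (!match($0, /;jsessionid=[^ ]*/)) { print; next }
    url = substr($0, RSTART, RLENGTH)       # ";jsessionid=..." up to the next space
    printf "%s", substr($0, 1, RSTART - 1)  # literal output, safe if the line contains %
    n = split(url, a, "[&?]")
    sep = "?"                               # first kept parameter gets "?", later ones "&"
    for (x = 1; x <= n; x++)
      if (a[x] ~ /^(stat|productId)=/) {    # grouped so that ^ and = apply to both names
        printf "%s%s", sep, a[x]
        sep = "&"
      }
    print substr($0, RSTART + RLENGTH)
  }'
}

# Fourth sample line from the question.
line4='192.1.201.100 - - [09/Jan/2004:11:59:20 +0100] "GET /directory1/pagename2;jsessionid=0000LILFAAGRJGG1SWYNGKQUYZI+upoplruo?&E_mp=210504cGFyaXM000edHJhaW4gYmxldQ12091172.352671212&strLocation=Paris&stat=ambiguous_tourism&strCountry=1424&strAddress=&strCP=75000&productId=41102 HTTP/1.1" 200 16066'
printf '%s\n' "$line4" | filter_log
# 192.1.201.100 - - [09/Jan/2004:11:59:20 +0100] "GET /directory1/pagename2?stat=ambiguous_tourism&productId=41102 HTTP/1.1" 200 16066
```

Like the script above, this keeps the parameters in the order in which they appear in the request, so a line where productId comes before stat would give ?productId=...&stat=... rather than stat first.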
 