Sub-totalling an array
CODE --> gawk ' BEGIN { sitea=0 ; siteb=0 }
{
if ($1 ~ "192.168.6") { sitea++ } else { siteb++ }
x[sitea","siteb","$7","$9]
}
END {
for (i in x)
print i
}' $FILE
No doubt and probably needless to say, the above code does not give me the required result.
It is reading an apache access log and I am seeking TOTAL http hits for "sitea" and TOTAL http hits for "siteb"
Example report:
SITEA,SITEB,URL,HTTP Response Code,
29,53,/apps/tube/icon_Tube.png,200
60,102,/apps/mix/icon_mix.png,200
389,536,/publish/images/vpl.png,404
etc....
As always thanks in advance,
Madasafish
|
|
|
feherke (Programmer) |
16 Jun 12 12:24 |
|
Hi Feherke,
I am using gawk V4. I believe it was you who introduced me to "patsplit" on another thread.
Madasafish |
|
gawk --version
GNU Awk 4.0.0
Copyright (C) 1989, 1991-2011 Free Software Foundation.
|
|
|
feherke (Programmer) |
17 Jun 12 6:14 |
Hi
Quote (Madasafish): I am using gawk V4.
Great. Then real multidimensional arrays will do the job. I would do it this way:
CODE --> awk -vOFS=, '{x[$1][$7][$9]++}END{for(h in x)for(u in x[h])for(s in x[h][u])print x[h][u][s],h,u,s}' /var/log/httpd/access_log
Which produces count,host,path,status output like this:
CODE --> 1,192.168.0.1,/mustache/syntax.htm,200
1,192.168.6.1,/mustache/syntax.htm,200
1,192.168.0.1,/mustache/style.css,200
5,192.168.6.1,/mustache/style.css,304
7,192.168.6.1,/mustache/style.css,200
To exactly reproduce your sitea,siteb,path,status output format:
CODE --> awk -vOFS=, '{x[$7][$9][$1~/192\.168\.6/?"a":"b"]++}END{for(u in x)for(s in x[u])print x[u][s]["a"]+0,x[u][s]["b"]+0,u,s}' /var/log/httpd/access_log
Which produces this output from the same input data:
CODE --> 1,1,/mustache/syntax.htm,200
5,0,/mustache/style.css,304
7,1,/mustache/style.css,200
Feherke.
http://feherke.github.com/ |
|
Absolutely brilliant!
Thank you very much Feherke. A payment to the club is long overdue.
As always, now that the hard work has been done I think I can embellish it, only to find I get stuck again.
As it's a 3852-line CSV file it lends itself to spreadsheet filters. I am trying to add a couple of filters and cannot understand why they will not work. In your print statement you print "u". If I try to split "u" or to match a string in "u", it stops working and I do not understand why.
Here is the code, which works only if my "embellishments" are commented out.
CODE --> gawk -vOFS=, '{
x[$7][$9][$1~/192\.168\.2/?"a":"b"]++
}
END {
for(u in x)
for(s in x[u])
#split(u,z,"/")
#if (z[2] ~ /debug/) ft1="Debug"
#if (z[3] ~ /vod/) ft1="Vod"
#if (z[4] ~ /flashapp.xml/) ft1="FlashApp"
#if (u ~ /SSI|Sky|sky/) ft2="Sky"
#if (u ~ /bbc/) ft2="BBC"
print x[u][s]["b"]+x[u][s]["a"]+0,x[u][s]["b"]+0,x[u][s]["a"]+0,u,s,ft1,ft2
}' $FILE
As always, Thanks in advance
Madasafish
|
|
feherke (Programmer) |
17 Jun 12 8:57 |
Hi
Quote (Madasafish): In your print statement you print "u". If I try to split "u" or try to reference a string in "u" it stops working and do not understand why?
Just as you wrote, I print u.
But you are calling split(), doing 5 conditional assignments, then printing. A for statement executes only the single statement that immediately follows it. To make the for loops execute all of those statements, enclose them in braces ( {} ).
Feherke.
http://feherke.github.com/ |
|
What about this ?
CODE --> gawk -vOFS=, '{
  x[$7][$9][$1~/192\.168\.2/?"a":"b"]++
}
END {
  for(u in x) {
    ft1=ft2=""   # reset the labels so they do not carry over from the previous URL
    split(u,z,"/")
    if (z[2] ~ /debug/) ft1="Debug"
    if (z[3] ~ /vod/) ft1="Vod"
    if (z[4] ~ /flashapp\.xml/) ft1="FlashApp"
    if (u ~ /SSI|Sky|sky/) ft2="Sky"
    if (u ~ /bbc/) ft2="BBC"
    for(s in x[u])
      print x[u][s]["b"]+x[u][s]["a"]+0,x[u][s]["b"]+0,x[u][s]["a"]+0,u,s,ft1,ft2
  }
}' $FILE
Hope This Helps, PH.
FAQ219-2884: How Do I Get Great Answers To my Tek-Tips Questions?
FAQ181-2886: How can I maximize my chances of getting an answer? |
|
Another embellishment, sorry.
CODE --> gawk -v OFS="," -v hoururl=$HOURURL -v hourdir=$HOURDIR '
{
split($4,b,/:/)
hour=b[2]
min=b[3]
url=$7
gsub(/%20/,"_",url)
split(url,f,"/")
appname=f[3]
if (f[4] ~ /flashapp.xml/) {
url="\"=HYPERLINK(\"\""hoururl"/"appname".csv\"\",\"\""$7"\"\")\""
t[appname][hour][$1~/10\.185\.116/?"c":"d"]++
}
x[url][$9][$1~/192\.168\.2/?"a":"b"]++
}
END {
for(m in t) {
for(n in t[m]) {
n=sprintf("%02d",n) #<---Does not work
print n,t[m][n]["d"]+0,t[m][n]["c"]+0 > hourdir"/"m".csv"
}
}
for(u in x) {
for(s in x[u]) {
print x[u][s]["b"]+x[u][s]["a"]+0,x[u][s]["b"]+0,x[u][s]["a"]+0,u,s,ft1,ft2
}
}
}' ${INFILE} | sort -n -r >> ${OUTFILE}
Feherke's code is perfect. I have removed the filters mentioned earlier for clarity.
As you can see, I introduced another loop which writes hourly hits to separate files, one per "m". The hours "n" run from 00 to 23.
I have three gotchas.
For the hours 00 to 09 it prints 0 to 9 (single digits). I would ideally like double digits, 00 to 09.
The files it creates are not sorted by hour, 00 through 23. Can this be done within the gawk program?
I need a header of Hour,Site A,Site B for each file created.
As always, thanks in advance
Madasafish |
|
CODE --> for(m in t) {
  for(n in t[m]) {
    print "=\""n"\"",t[m][n]["d"]+0,t[m][n]["c"]+0 > hourdir"/"m
  }
}
For the benefit of other readers:
Quote: For the hours 00 to 09 it prints 0 to 9. (single digits). I would ideally like double digits 00 to 09.
It was Excel that was truncating the leading zero. I managed to fix this using the above syntax for "n".
Quote: The file/s it creates are not sorted for the hours 00 through to 23. Can this be done within the gawk prog?
Unfortunately this is beyond me, so I resolved it with an external bash "for loop" at the end, which uses the excellent sort command. I would welcome a gawk solution.
Quote: I need a header of Hour,Site A,Site B for each file created.
Easily accommodated with the external bash "for loop". Again, I would be very interested in seeing a gawk solution with the above code.
Cheers,
Madasafish
|
|
1) "hour" is extracted from a string, so it is a string. To treat it as a number (in order to format it), I would use a function such as int().
[edit] Reading your code again, I noticed that you use n for two things: as the loop variable and to store the result of sprintf. This is not good. [/edit]
CODE --> awk
printf("%02d",int(n))
2) To sort from inside awk the same way as outside awk:
CODE --> awk
system("MyPersonalExternalCommand")