Sub-totalling an array

madasafish (TechnicalUser)
16 Jun 12 11:50

CODE -->

gawk ' BEGIN { sitea=0 ; siteb=0 }
{
if ($1 ~ "192.168.6") { sitea++ } else { siteb++ }

x[sitea","siteb","$7","$9]
}

END {
for (i in x)
print i

}' $FILE


Needless to say, the above code does not give me the required result.
It reads an Apache access log, and I am seeking the TOTAL http hits for "sitea" and the TOTAL http hits for "siteb" for each URL and response code.

Example report:
SITEA,SITEB,URL,HTTP Response Code
29,53,/apps/tube/icon_Tube.png,200
60,102,/apps/mix/icon_mix.png,200
389,536,/publish/images/vpl.png,404
etc....

As always thanks in advance,

Madasafish
feherke (Programmer)
16 Jun 12 12:24
Hi

I see you use gawk. Which version ?

Feherke.
http://feherke.github.com/

madasafish (TechnicalUser)
16 Jun 12 15:52
Hi Feherke,

I am using gawk V4. I believe it was you who introduced me to "patsplit" on another thread.

Madasafish
madasafish (TechnicalUser)
16 Jun 12 16:08
gawk --version
GNU Awk 4.0.0
Copyright (C) 1989, 1991-2011 Free Software Foundation.
feherke (Programmer)
17 Jun 12 6:14
Hi

Quote (Madasafish)

I am using gawk V4.
Great. Then real multidimensional arrays (arrays of arrays, new in gawk 4) will do the job. I would do it this way:

CODE

awk -vOFS=, '{x[$1][$7][$9]++}END{for(h in x)for(u in x[h])for(s in x[h][u])print x[h][u][s],h,u,s}' /var/log/httpd/access_log
Which produces count,host,path,status output like this :

CODE

1,192.168.0.1,/mustache/syntax.htm,200
1,192.168.6.1,/mustache/syntax.htm,200
1,192.168.0.1,/mustache/style.css,200
5,192.168.6.1,/mustache/style.css,304
7,192.168.6.1,/mustache/style.css,200

To exactly reproduce your sitea,siteb,path,status output format :

CODE

awk -vOFS=, '{x[$7][$9][$1~/192\.168\.6/?"a":"b"]++}END{for(u in x)for(s in x[u])print x[u][s]["a"]+0,x[u][s]["b"]+0,u,s}' /var/log/httpd/access_log
Which produces this output from the same input data :

CODE

1,1,/mustache/syntax.htm,200
5,0,/mustache/style.css,304
7,1,/mustache/style.css,200

Feherke.
http://feherke.github.com/

madasafish (TechnicalUser)
17 Jun 12 8:40
Absolutely brilliant!

Thank you very much, Feherke. A payment to the club is long overdue.

As always, now that the hard work has been done I think I can embellish it, only to find I get stuck again.

As it is a 3852-line CSV file, it lends itself to spreadsheet filters. I am trying to add a couple of filters and cannot understand why they will not work. In your print statement you print "u". If I try to split "u" or try to reference a string in "u" it stops working, and I do not understand why.

Here is the code, which works only if my "embellishments" are commented out.

CODE -->

gawk -vOFS=, '{
x[$7][$9][$1~/192\.168\.2/?"a":"b"]++
}
END {
for(u in x)
for(s in x[u])

#split(u,z,"/")
#if (z[2] ~ /debug/) ft1="Debug"
#if (z[3] ~ /vod/) ft1="Vod"
#if (z[4] ~ /flashapp.xml/) ft1="FlashApp"
#if (u ~ /SSI|Sky|sky/) ft2="Sky"
#if (u ~ /bbc/) ft2="BBC"

print x[u][s]["b"]+x[u][s]["a"]+0,x[u][s]["b"]+0,x[u][s]["a"]+0,u,s,ft1,ft2

}' $FILE

As always, Thanks in advance
Madasafish




feherke (Programmer)
17 Jun 12 8:57
Hi

Quote (Madasafish)

In your print statement you print "u". If I try to split "u" or try to reference a string in "u" it stops working, and I do not understand why.
Just as you wrote, I print u.

But you are split()ing, doing 5 conditional assignments, then printing. A for statement executes only the single statement that immediately follows it. To make the for execute all of them, enclose them in braces ( {} ).

Feherke.
http://feherke.github.com/

PHV (MIS)
17 Jun 12 9:04
What about this ?

CODE

gawk -vOFS=, '{
x[$7][$9][$1~/192\.168\.2/?"a":"b"]++
}
END {
for(u in x) {
ft1=ft2="" # reset so labels from a previous URL do not carry over
split(u,z,"/")
if (z[2] ~ /debug/) ft1="Debug"
if (z[3] ~ /vod/) ft1="Vod"
if (z[4] ~ /flashapp\.xml/) ft1="FlashApp"
if (u ~ /SSI|Sky|sky/) ft2="Sky"
if (u ~ /bbc/) ft2="BBC"
for(s in x[u])
print x[u][s]["b"]+x[u][s]["a"]+0,x[u][s]["b"]+0,x[u][s]["a"]+0,u,s,ft1,ft2
}
}' $FILE

Hope This Helps, PH.
FAQ219-2884: How Do I Get Great Answers To my Tek-Tips Questions?
FAQ181-2886: How can I maximize my chances of getting an answer?

madasafish (TechnicalUser)
23 Jun 12 14:50
Another embellishment, sorry.

CODE -->

gawk -v OFS="," -v hoururl=$HOURURL -v hourdir=$HOURDIR '
{
split($4,b,/:/)
hour=b[2]
min=b[3]
url=$7
gsub(/%20/,"_",url)
split(url,f,"/")
appname=f[3]


if (f[4] ~ /flashapp.xml/) {
url="\"=HYPERLINK(\"\""hoururl"/"appname".csv\"\",\"\""$7"\"\")\""
t[appname][hour][$1~/10\.185\.116/?"c":"d"]++
}

x[url][$9][$1~/192\.168\.2/?"a":"b"]++
}
END {
for(m in t) {
for(n in t[m]) {
n=sprintf("%02d",n) #<---Does not work
print n,t[m][n]["d"]+0,t[m][n]["c"]+0 > hourdir"/"m".csv"
}
}

for(u in x) {
for(s in x[u]) {
print x[u][s]["b"]+x[u][s]["a"]+0,x[u][s]["b"]+0,x[u][s]["a"]+0,u,s,ft1,ft2
}
}

}' ${INFILE} | sort -n -r >> ${OUTFILE}

Feherke's code is perfect. I have removed the filters mentioned earlier for clarity.
As you can see, I introduced another loop which writes the hourly hits to separate files, one per "m". The hours "n" run from 00 to 23.

I have three gotchas:
For the hours 00 to 09 it prints 0 to 9. (single digits). I would ideally like double digits 00 to 09.
The file/s it creates are not sorted for the hours 00 through to 23. Can this be done within the gawk prog?
I need a header of Hour,Site A,Site B for each file created.

As always, thanks in advance

Madasafish
madasafish (TechnicalUser)
24 Jun 12 4:26

CODE -->

for(m in t) {
for(n in t[m]) {
print "=\""n"\"",t[m][n]["d"]+0,t[m][n]["c"]+0 > (hourdir"/"m)
}
}

For the benefit of other readers,

Quote:

For the hours 00 to 09 it prints 0 to 9. (single digits). I would ideally like double digits 00 to 09.

It was Excel that was truncating the leading zero. I managed to fix this using the above syntax for "n".

Quote:

The file/s it creates are not sorted for the hours 00 through to 23. Can this be done within the gawk prog?
Unfortunately this is beyond my ability, so I resolved it with an external bash "for loop" at the end which uses the excellent sort command. I would welcome a gawk solution.

Quote:

I need a header of Hour,Site A,Site B for each file created.

Easily accommodated with the external bash "for loop". Again, would be very interested in seeing a gawk solution with the above code.

Cheers,

Madasafish



FlorianAwk (Programmer)
25 Jun 12 13:50
1) "hour" is extracted from a string, so it is itself a string. To treat it as a number (in order to format it), I would use a function such as int().

[edit] Reading your code again, I noticed that you use n twice: once as the loop variable and once to store the result of sprintf. This is not good. [edit]

CODE --> awk

printf("%02d",int(n))

2) To sort from within awk with the same external command you would use outside awk:

CODE --> awk

system("MyPersonalExternalCommand")
