Count Unique Records Per Hour

viadisky · Jun 5, 2006

Hi,

I have this raw data:
212.227.90.54 - - [31/May/2006:01:00:28 +0100]
212.227.90.54 - - [31/May/2006:01:00:32 +0100]
172.24.3.1 - - [31/May/2006:01:00:09 +0100]
212.227.90.54 - - [31/May/2006:02:00:12 +0100]
212.227.90.54 - - [01/Jun/2006:01:00:45 +0100]
172.24.3.1 - - [01/Jun/2006:02:00:05 +0100]

Basically, I want to count the unique IP addresses that appear in every hour of each day. The desired output is shown below.

Date         Hour  Unique_IPAdd
31/May/2006  01    2
31/May/2006  02    1
01/Jun/2006  01    1
01/Jun/2006  02    1

Any help will be appreciated.

Cheers!
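A minimal sketch of the whole task in portable POSIX awk, assuming the log format shown above (IP address in field 1, timestamp in field 4 as [tt][date:hour:min:sec[/tt]). The function name [tt]count_unique_per_hour[/tt] is hypothetical. Rather than a regular-expression field separator, it splits field 4 on [tt]:[/tt], and it keys on date, hour and IP together, so the input does not need to be sorted.

```shell
#!/bin/sh
# count_unique_per_hour: hypothetical helper, not from the thread.
# Input lines look like: 212.227.90.54 - - [31/May/2006:01:00:28 +0100]
# Output: "date hour unique_ip_count" for each date/hour, in the order
# each date/hour first appears. The input does not need to be sorted.
count_unique_per_hour() {
    awk '
    {
        split($4, t, ":")                # t[1]="[31/May/2006", t[2]="01"
        key = substr(t[1], 2) " " t[2]   # drop the "[": "31/May/2006 01"
        if (!(key in count)) {           # first line for this date/hour
            order[++nkeys] = key
            count[key] = 0
        }
        pair = key SUBSEP $1             # this IP within this date/hour
        if (!(pair in seen)) {           # first time this IP appears here
            seen[pair] = 1
            count[key]++
        }
    }
    END {
        for (i = 1; i <= nkeys; i++)
            print order[i], count[order[i]]
    }' "$@"
}

# Demo on the sample data from the first post.
count_unique_per_hour <<'EOF'
212.227.90.54 - - [31/May/2006:01:00:28 +0100]
212.227.90.54 - - [31/May/2006:01:00:32 +0100]
172.24.3.1 - - [31/May/2006:01:00:09 +0100]
212.227.90.54 - - [31/May/2006:02:00:12 +0100]
212.227.90.54 - - [01/Jun/2006:01:00:45 +0100]
172.24.3.1 - - [01/Jun/2006:02:00:05 +0100]
EOF
```

Run it as [tt]count_unique_per_hour logfile[/tt], or pipe the log into it. On the six sample lines it prints [tt]31/May/2006 01 2[/tt], [tt]31/May/2006 02 1[/tt], [tt]01/Jun/2006 01 1[/tt] and [tt]01/Jun/2006 02 1[/tt]. Keeping every (date/hour, IP) pair in memory is the price for not needing sorted input.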

feherke · Jun 5, 2006

Hi

An [tt]awk[/tt] script?

Code:

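BEGIN {
  FS="[ [:]+"           # fields are separated by runs of " ", "[" and ":"
}
{
  if (l!=$4 " " $5) {   # date ($4) or hour ($5) differs from the previous line
    n=0
    for (i in u) n++    # count the distinct IPs collected for that hour
    print l,n           # print "date hour count" for the previous group
    delete u            # empty the array (whole-array delete: gawk extension)
  }
  l=$4 " " $5           # remember the current date and hour
  u[$1]++               # one array element per IP address ($1)
}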

Feherke.

http://rootshell.be/~feherke/

viadisky · Jun 5, 2006

Thanks Feherke!

I will try this, but could you provide a one-liner awk version? Sorry if I'm asking too much...

feherke · Jun 5, 2006

Hi

LOL. That was its original form; I just reformatted it to make it easier to read.

Code:

awk -F '[ [:]+' '{if(l!=$4" "$5){n=0;for(i in u)n++;print l,n;delete u}l=$4" "$5;u[$1]++}' /input/file

A bit of a long line, but it is one.

Feherke.

viadisky · Jun 5, 2006

I tried running the one-line command you provided, but I get a syntax error...

Please refer to the output below:

Code:

zen:/u/mviado$ more file
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
82.29.155.235 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
82.29.155.235 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
195.93.21.40 [31/May/2006:01:00:00 +0100]
82.29.155.235 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
zen:/u/mviado$  awk -F '[ [:]+' '{if(l!=$4" "$5){n=0;for(i in u)n++;print l,n;delete u}l=$4" "$5;u[$1]++}' file
awk: syntax error near line 1
awk: bailing out near line 1

feherke · Jun 5, 2006

Hi

Oops. The code works with [tt]gawk[/tt]. Deleting a whole array with [tt]delete u[/tt] is a gawk extension, and older awk implementations do not support a regular expression in FS either. If you only have the old standard [tt]awk[/tt], it will not work.
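If the whole-array [tt]delete[/tt] is the obstacle, a common portable workaround is [tt]split("", u)[/tt]: POSIX specifies that [tt]split[/tt] deletes all existing elements of the array before splitting, and splitting an empty string adds none back. A minimal demonstration; the function name [tt]clear_array_demo[/tt] is hypothetical.

```shell
#!/bin/sh
# clear_array_demo: hypothetical helper showing how to empty an awk array
# without the whole-array "delete u".
clear_array_demo() {
    awk 'BEGIN {
        u["212.227.90.54"] = 1
        u["172.24.3.1"] = 1
        n = 0; for (i in u) n++
        print "before:", n
        split("", u)        # split deletes every element of u first
        n = 0; for (i in u) n++
        print "after:", n
    }'
}

clear_array_demo
# prints:
# before: 2
# after: 0
```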

Feherke.

viadisky · Jun 5, 2006

Hi Feherke,

Thanks for the additional information, I really appreciate all the help. I'm struggling with Unix scripting at the moment!

Too bad, I only have awk! Thanks for your help anyway...

Cheers!

Annihilannic · Jun 5, 2006

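Perhaps try nawk, if you are on Solaris? What OS is it, anyway?

Also, your test data differs from the data in your original post: the "- -" fields after the IP address are missing, so the field numbers the script relies on no longer match.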

Annihilannic.

viadisky · Jun 5, 2006

Hi Annihilannic,

Thanks for pointing out the differences between the input files. I revised the file and ran the command using "nawk".

This is the output I got. Thanks in advance!

Code:

zen:/u/mviado$ more file
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
82.29.155.235 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
82.29.155.235 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
195.93.21.40 - - [31/May/2006:01:00:00 +0100]
82.29.155.235 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
82.29.155.235 - - [31/May/2006:01:00:00 +0100]
zen:/u/mviado$ nawk -F '[ [:]+' '{if(l!=$4" "$5){n=0;for(i in u)n++;print l,n;delete u}l=$4" "$5;u[$1]++}' file
nawk: you can only delete array[element] at source line 1
 context is
        {if(l!=$4" "$5){n=0;for(i in u)n++;print l,n;delete >>>  u} <<< 
nawk: syntax error at source line 1
nawk: illegal statement at source line 1

viadisky · Jun 5, 2006

Sorry, I forgot to include our server's basic info:

SunOS zen 5.8 Generic_117350-02 sun4u sparc SUNW,Ultra-80

feherke · Jun 5, 2006

Hi

I think this will work:

Code:

tr '[:' '  ' < /input/file | awk '{if(l!=$4" "$5){n=0;for(i in u){n++;delete u[i]}print l,n}l=$4" "$5;u[$1]++}'
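The [tt]tr[/tt] step is what lets plain awk use its default whitespace splitting: once [tt][[/tt] and [tt]:[/tt] become spaces, the IP is field 1, the date field 4 and the hour field 5. A quick way to check the field layout on one line; the function name [tt]show_fields[/tt] is hypothetical, and the characters are listed here as [tt]:[[/tt].

```shell
#!/bin/sh
# show_fields: hypothetical helper that prints the fields awk sees
# after tr has turned ":" and "[" into spaces.
show_fields() {
    tr ':[' '  ' | awk '{ print "IP=" $1, "date=" $4, "hour=" $5 }'
}

echo '212.227.90.54 - - [31/May/2006:01:00:28 +0100]' | show_fields
# prints: IP=212.227.90.54 date=31/May/2006 hour=01
```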

Feherke.

viadisky · Jun 5, 2006

Hi Feherke,

This is the output I got:

Code:

zen:/u/mviado$ tr '[:' '  ' < file | awk '{if(l!=$4" "$5){n=0;for(i in u){n++;delete u[i]}print l,n}l=$4" "$5;u[$1]++}' 
awk: u is not an array
 record number 1
zen:/u/mviado$

Annihilannic · Jun 5, 2006

I modified feherke's code and tested on Solaris:

Code:

/usr/xpg4/bin/awk -F '[[ :]+' 'function p() { n=0; for (i in u) n++; print l,n; delete u} NR > 1 && l!=$4 " " $5 { p() } { l=$4 " " $5 ; u[$1]++ } END { p() } ' filename

Note that I moved the printing into a function so that it also runs for the last set of data (it is called from the END clause), and I made it skip printing for the first record.
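The same control-break structure (report a group when the date/hour changes, skip the first record, flush the last group from END) can be sketched for any POSIX awk under two assumptions: lines for one hour are adjacent in the file, as in a log written in time order, and the array is emptied with [tt]split("", u)[/tt] instead of the whole-array [tt]delete u[/tt]. It also splits field 4 on [tt]:[/tt] instead of using a regular-expression FS. The function name [tt]count_per_hour_grouped[/tt] is hypothetical.

```shell
#!/bin/sh
# count_per_hour_grouped: hypothetical helper. Assumes all lines for one
# date/hour are adjacent (e.g. a log written in time order).
count_per_hour_grouped() {
    awk '
    function p(    n, i) {            # n and i are local variables
        n = 0
        for (i in u) n++              # distinct IPs in the current group
        print l, n
        split("", u)                  # empty u for the next group
    }
    {
        split($4, t, ":")                 # t[1]="[31/May/2006", t[2]="01"
        k = substr(t[1], 2) " " t[2]      # "31/May/2006 01"
        if (NR > 1 && k != l) p()         # key changed: report previous group
        l = k
        u[$1]++                           # one element per IP address
    }
    END { if (NR > 0) p() }               # report the last group
    ' "$@"
}

count_per_hour_grouped <<'EOF'
212.227.90.54 - - [31/May/2006:01:00:28 +0100]
212.227.90.54 - - [31/May/2006:01:00:32 +0100]
172.24.3.1 - - [31/May/2006:01:00:09 +0100]
212.227.90.54 - - [31/May/2006:02:00:12 +0100]
212.227.90.54 - - [01/Jun/2006:01:00:45 +0100]
172.24.3.1 - - [01/Jun/2006:02:00:05 +0100]
EOF
```

Unlike the version that keys on date, hour and IP together, this one only keeps the IPs of the current hour in memory, but it reports an hour twice if its lines are not contiguous.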

Annihilannic.

viadisky · Jun 5, 2006

Hi Annihilannic,

The script keeps printing "31/May/2006 1" over and over. I basically copied the command you provided and only replaced the filename with mine (file):

Code:

zen:/u/mviado$  { p() } { l=$4^J" " $5 ; u[$1]++ } END { p() } ' file                                    <
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1

I noticed that the number of output lines is the same as the number of lines in the input file. Did I miss anything?

Many thanks ...

Annihilannic · Jun 5, 2006

It's all supposed to be on one line... unfortunately the Tek-Tips website couldn't cope with the width of the line. Take out the line feed after l=$4 and it should work fine.
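The symptom follows from awk's grammar: a newline inside an action ends a statement, so with the break after [tt]l=$4[/tt] the variable [tt]l[/tt] only receives the date, never equals [tt]$4 " " $5[/tt], and every record after the first triggers a printout with a count of 1. A backslash at the end of the line continues the statement instead. A small demonstration; the function name [tt]newline_demo[/tt] is hypothetical.

```shell
#!/bin/sh
# newline_demo: hypothetical helper showing how a line break inside an
# awk action changes the meaning of an assignment.
newline_demo() {
    # Newline after "l = $1" ends the statement; " " $2 is evaluated
    # on its own and discarded, so l is just "a".
    echo 'a b' | awk '{ l = $1
" " $2; print "[" l "]" }'
    # Backslash-newline continues the statement, so l is "a b".
    echo 'a b' | awk '{ l = $1 \
" " $2; print "[" l "]" }'
}

newline_demo
# prints:
# [a]
# [a b]
```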

Annihilannic.

feherke · Jun 5, 2006

Hi

Thanks for pointing out the last-line problem, Annihilannic. I had a newline at the end of my test file and forgot about it.

Feherke.

viadisky · Jun 5, 2006

I removed the line feed ("^J") from the command; sorry I didn't notice that error. To be honest, the commands you're suggesting are way too advanced for me... I mainly rely on the commands provided (copy, paste and run).

The only output I got is "1":

Code:

zen:/u/mviado$ /usr/xpg4/bin/awk -F '[[ :]+' 'function p() { n=0; for (i in u) n++; print l,n; delete u} NR > 1 && l!=$>
       1
zen:/u/mviado$

I'm missing the count of unique IP addresses. As you can see from my earlier sample file, the IP address "212.227.90.54" repeats several times during:
"[31/May/2006:01:00:00 +0100]"

So I was hoping to see "82.29.155.235" counted separately for that day as well...

Thanks!

Annihilannic · Jun 5, 2006

When I run it I get output like this (I added a few more test cases to your sample data):

[tt]31/May/2006 01 3
31/May/2006 02 7[/tt]

Why not put it in a script instead of pasting it onto the command line?

Another method for you to try, again, all on one line:

Code:

nawk -F'[[ :]+' '{print $1,$4,$5}' filename | sort -k 2,2 -k 3,3 -k 1,1 | uniq | cut -d' ' -f 2- | uniq -c

Note that the output is in a different order: the count comes first.
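The pipeline works in two stages: first reduce the log to distinct (IP, date, hour) combinations, then count how many combinations each date/hour has. A variant of the same idea that avoids the regular-expression field separator, assuming the log format from the first post: put the date and hour first, let [tt]sort -u[/tt] remove duplicate lines, and keep only the date and hour before [tt]uniq -c[/tt]. The function name [tt]count_unique_sorted[/tt] is hypothetical, and [tt]LC_ALL=C[/tt] only makes the sort order byte-wise.

```shell
#!/bin/sh
# count_unique_sorted: hypothetical helper. Output follows uniq -c:
# count first, then date and hour, ordered by the date string, not by time.
count_unique_sorted() {
    awk '{ split($4, t, ":"); print substr(t[1], 2), t[2], $1 }' "$@" |
        LC_ALL=C sort -u |      # one line per distinct date/hour/IP
        cut -d' ' -f1,2 |       # keep only "date hour"
        uniq -c                 # number of distinct IPs per date/hour
}

count_unique_sorted <<'EOF'
212.227.90.54 - - [31/May/2006:01:00:28 +0100]
212.227.90.54 - - [31/May/2006:01:00:32 +0100]
172.24.3.1 - - [31/May/2006:01:00:09 +0100]
212.227.90.54 - - [31/May/2006:02:00:12 +0100]
212.227.90.54 - - [01/Jun/2006:01:00:45 +0100]
172.24.3.1 - - [01/Jun/2006:02:00:05 +0100]
EOF
```

The padding in front of each count depends on the [tt]uniq[/tt] implementation; the counts themselves are 1 for 01/Jun/2006 01, 1 for 01/Jun/2006 02, 2 for 31/May/2006 01 and 1 for 31/May/2006 02.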

Annihilannic.

viadisky · Jun 5, 2006

Hi Annihilannic/Feherke,

At last it worked!!! THANK YOU!!!

When I put them in a script, both of them work fine...

But when I run them on the command line, the "/usr/xpg4/bin/awk -F '[[ :]+' 'function ..." approach does not produce the expected output...

Thank you guys for providing extra help/effort for a beginner like me!

Cheers!
