Count Unique Records Per Hour

Status
Not open for further replies.

viadisky

Technical User
Jun 19, 2003
110
GB
Hi,

I have this raw data:
212.227.90.54 - - [31/May/2006:01:00:28 +0100]
212.227.90.54 - - [31/May/2006:01:00:32 +0100]
172.24.3.1 - - [31/May/2006:01:00:09 +0100]
212.227.90.54 - - [31/May/2006:02:00:12 +0100]
212.227.90.54 - - [01/Jun/2006:01:00:45 +0100]
172.24.3.1 - - [01/Jun/2006:02:00:05 +0100]

Basically, I want to count the unique IP addresses that appear in each hour of each day. The desired output is shown below.

Date Hour Unique_IPAdd
31/May/2006 01 2
31/May/2006 02 1
01/Jun/2006 01 1
01/Jun/2006 02 1

Any help will be appreciated :o)

Cheers!
 
Thanks Feherke! :o) I will try this, but could you also provide a one-liner awk approach? Sorry if I'm asking too much ...

 
Hi

LOL. That was its original format; I only reformatted it to make it easier to read.
Code:
awk -F '[ [:]+' '{if(l!=$4" "$5){n=0;for(i in u)n++;print l,n;delete u}l=$4" "$5;u[$1]++}' /input/file
A bit of a long line, but it is a single line.

Feherke.
 
I tried to run the one-line command you provided, but I get a syntax error ...

Please refer to the output below:

Code:
zen:/u/mviado$ more file
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
82.29.155.235 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
82.29.155.235 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
195.93.21.40 [31/May/2006:01:00:00 +0100]
82.29.155.235 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
212.227.90.54 [31/May/2006:01:00:00 +0100]
zen:/u/mviado$  awk -F '[ [:]+' '{if(l!=$4" "$5){n=0;for(i in u)n++;print l,n;delete u}l=$4" "$5;u[$1]++}' file
awk: syntax error near line 1
awk: bailing out near line 1
 
Hi

Uff. The code works with gawk. The regular expression in FS and the delete command are extensions. With standard awk it will not work.

Feherke.
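
For an awk that rejects delete on a whole array, one possible workaround is sketched below. It is not Feherke's code, only a variant of the same idea: the array is emptied with split("", seen), which POSIX-style awks (nawk, /usr/xpg4/bin/awk, gawk, mawk) generally accept. The function name count_unique_per_hour and the variable names are made up for illustration, and the input is assumed to be grouped by hour with lines shaped like the sample in the first post. The very old /usr/bin/awk on Solaris has no user-defined functions, so this is not expected to run there.

```shell
# Sketch: count distinct IPs per date and hour without the whole-array delete.
# Assumes lines like: IP - - [dd/Mon/yyyy:hh:mm:ss +zzzz], already grouped by hour.
count_unique_per_hour() {
    awk -F '[[ :]+' '
        # Print the finished group, then empty the array portably.
        function report() {
            n = 0
            for (ip in seen) n++
            print key, n
            split("", seen)    # clears seen[] without "delete seen"
        }
        # A new date/hour key starts: report the previous group first.
        NR > 1 && key != $4 " " $5 { report() }
        { key = $4 " " $5; seen[$1]++ }
        # Report the last group, unless the input was empty.
        END { if (NR > 0) report() }
    ' "$1"
}
```

It would be called as count_unique_per_hour file, where file is the log to read.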
 
Hi Feherke,

Thanks for the additional information. I do appreciate all the help; I'm struggling with Unix scripting at the moment!

Too bad, I only have awk ...! Thanks for your help anyway ...

Cheers!
 
Try nawk perhaps, if you are under Solaris? What OS is it anyway?

Also, your test data differs from the data in your original post?

Annihilannic.
 
Hi Annihilannic,

Thanks for pointing out the differences between the input files. I revised the file and ran the command using "nawk".

This is the output I got ... thanks in advance! :o)

Code:
zen:/u/mviado$ more file
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
82.29.155.235 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
82.29.155.235 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
195.93.21.40 - - [31/May/2006:01:00:00 +0100]
82.29.155.235 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
212.227.90.54 - - [31/May/2006:01:00:00 +0100]
82.29.155.235 - - [31/May/2006:01:00:00 +0100]
zen:/u/mviado$ nawk -F '[ [:]+' '{if(l!=$4" "$5){n=0;for(i in u)n++;print l,n;delete u}l=$4" "$5;u[$1]++}' file
nawk: you can only delete array[element] at source line 1
 context is
        {if(l!=$4" "$5){n=0;for(i in u)n++;print l,n;delete >>>  u} <<< 
nawk: syntax error at source line 1
nawk: illegal statement at source line 1

 
Sorry, I forgot to include our server's basic info ...

SunOS zen 5.8 Generic_117350-02 sun4u sparc SUNW,Ultra-80
 
Hi Feherke,

This is the output I got ...

Code:
zen:/u/mviado$ tr '[:' '  ' < file | awk '{if(l!=$4" "$5){n=0;for(i in u){n++;delete u[i]}print l,n}l=$4" "$5;u[$1]++}' 
awk: u is not an array
 record number 1
zen:/u/mviado$
 
I modified feherke's code and tested on Solaris:

Code:
/usr/xpg4/bin/awk -F '[[ :]+' 'function p() { n=0; for (i in u) n++; print l,n; delete u} NR > 1 && l!=$4 " " $5 { p() } { l=$4
" " $5 ; u[$1]++ } END { p() } ' filename

Note that I changed the printout to a function so it works for the last set of data too (executed from END clause) and also made it not print anything for the first record.

Annihilannic.
 
Hi Annihilannic,

The script keeps printing "31/May/2006 1" ... I simply copied the command you provided and replaced the filename with "file".

Code:
zen:/u/mviado$  { p() } { l=$4^J" " $5 ; u[$1]++ } END { p() } ' file                                    <
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1
31/May/2006 1

I noticed that the number of output lines is the same as the number of input lines in the file. Did I miss anything?

Many thanks ...

 
It's all supposed to be on one line... unfortunately the tek-tips web site couldn't cope with the width of the line. Take out the line feed after l=$4 and it should work fine.

Annihilannic.
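
A small sketch of why that line feed matters, assuming the same field separator as above (the function and variable names are hypothetical). In awk a newline ends a statement, so l=$4 followed by a line break stores only the date, and the " " $5 on the next line becomes a separate statement with no effect. The stored key then never equals $4 " " $5, which would explain one output line per input line.

```shell
# Sketch: a newline ends an awk statement, so the key loses its hour part.
show_newline_effect() {
    printf '%s\n' '212.227.90.54 - - [31/May/2006:01:00:28 +0100]' |
    awk -F '[[ :]+' '{
        broken = $4
        " " $5                # now a separate statement that does nothing
        joined = $4 " " $5    # the same expression kept on one line
        print "broken key: " broken
        print "joined key: " joined
    }'
}
```

Calling show_newline_effect prints the date alone for the broken key and the date plus hour for the joined key.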
 
I removed that line feed "^J" from the command; sorry I didn't notice this error. To be honest, the commands you're suggesting are way too advanced for me... I mainly rely on the commands provided (copy, paste and run) :o)

The output I got is only "1" ...
Code:
zen:/u/mviado$ /usr/xpg4/bin/awk -F '[[ :]+' 'function p() { n=0; for (i in u) n++; print l,n; delete u} NR > 1 && l!=$>
       1
zen:/u/mviado$

I'm missing the count of unique IP addresses. As you can see from my previous file example, the IP address "212.227.90.54" repeats several times during:
"[31/May/2006:01:00:00 +0100]"

So I'm kinda hoping to have different line count for "82.29.155.235" on that day ...

Thanks!




 
When I run it I get output like this (I added a few more test cases to your sample data):

31/May/2006 01 3
31/May/2006 02 7

Why not put it in a script instead of just pasting on to the command line?

Another method for you to try, again, all on one line:

Code:
nawk -F'[[ :]+' '{print $1,$4,$5}' filename | sort -k 2,2 -k 3,3 -k 1,1 | uniq | cut -d' ' -f 2- | uniq -c

Note that the output is in a different order, the count is first.

Annihilannic.
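
The sort/uniq idea can also be written so that the count ends up in the last column, as in the desired output from the first post. This is a variant sketch rather than the exact command above: it uses sort -u in place of sort followed by uniq, forces the C locale so the ordering is predictable, and adds a final awk to reorder the columns. The function name unique_ips_by_sort is made up, and plain awk is assumed to be a POSIX-style awk (on Solaris that would be nawk or /usr/xpg4/bin/awk).

```shell
# Sketch of the sort/uniq approach, with the count moved to the last column.
# Assumption: "awk" is a POSIX-style awk (nawk or /usr/xpg4/bin/awk on Solaris).
unique_ips_by_sort() {
    awk -F '[[ :]+' '{ print $4, $5, $1 }' "$1" |   # date hour ip
        LC_ALL=C sort -u |                           # one line per distinct (date, hour, ip)
        cut -d " " -f 1,2 |                          # drop the ip, keep date and hour
        uniq -c |                                    # count distinct ips per date and hour
        awk '{ print $2, $3, $1 }'                   # reorder to: date hour count
}
```

Because the lines are sorted as plain strings, the groups come out in string order (01/Jun/2006 before 31/May/2006), not in chronological order.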
 
Hi Annihilannic/Feherke,

At last it worked!!! THANK YOU!!! :o)

If I put them in a script, both of them are working fine ...

But when I run them from the command line, the "/usr/xpg4/bin/awk -F '[[ :]+' 'function ..." approach does not produce the expected output ...

Thank you guys for providing extra help/effort for a beginner like me!

Cheers! :o)
 