find all duplicate rows using awk

Status
Not open for further replies.

maolivar

Technical User
Apr 1, 2008
6
FR
Hi Folks,

I have a log file:

02/06/2010-10:03:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0
02/06/2010-10:04:05 C|A,72540,199022957,1,212.154.77.22,212.154.77.22,0
02/06/2010-10:04:05 C|A,72540,199022957,1,212.154.77.22,toto,0
02/06/2010-10:04:05 C|A,72540,199022957,0,212.154.77.22,212.154.77.22,0
02/06/2010-10:04:46 C|A,69952,199005651,0,LWM/200,212.154.77.22,0
02/06/2010-10:05:47 C|A,69952,199005651,0,LWM/200,212.154.77.22,0
02/06/2010-10:05:47 C|A,65769,199020988,0,LWM/300,212.154.77.33,0
02/06/2010-10:07:50 C|A,65769,199020988,0,LWM/300,212.154.77.33,0
02/06/2010-10:08:51 C|A,65769,199020988,0,LWM/300,212.154.77.33,0
02/06/2010-10:11:51 C|A,65769,199020988,0,LWM/300,212.154.77.33,0
02/06/2010-10:12:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0
02/06/2010-10:13:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0
02/06/2010-10:17:51 C|A,65769,199020988,0,LWM/300,212.154.77.33,0
02/06/2010-10:19:47 C|A,69952,199005651,0,LWM/200,212.154.77.22,0
02/06/2010-10:20:47 C|A,69952,199005651,0,LWM/200,212.154.77.22,0
02/06/2010-10:21:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0
02/06/2010-10:25:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0

I want to find the duplicated rows based on one field: the first field after the first "," (the ID).

My command line works well:

less test.txt | awk '{split($2,a,",");line[a[2]]++}; END{ for (i in line) print line[i], i}' $* | sort -nr

and I get this:
5 68132
5 65769
4 69952
3 72540

But I want to print the result like this:

TIMES ID
5 68132

02/06/2010-10:03:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0
02/06/2010-10:12:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0
02/06/2010-10:13:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0
02/06/2010-10:21:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0
02/06/2010-10:25:24 C|A,68132,199024488,0,LWM/300,212.154.77.33,0

TIMES ID
5 65769

02/06/2010-10:05:47 C|A,65769,199020988,0,LWM/300,212.154.77.33,0
02/06/2010-10:07:50 C|A,65769,199020988,0,LWM/300,212.154.77.33,0
02/06/2010-10:08:51 C|A,65769,199020988,0,LWM/300,212.154.77.33,0
02/06/2010-10:11:51 C|A,65769,199020988,0,LWM/300,212.154.77.33,0
02/06/2010-10:17:51 C|A,65769,199020988,0,LWM/300,212.154.77.33,0

.
.
.

Could you help me, please?

Regards

Mike
 
Maybe something like this:
Code:
awk -F, '
    # c[id] counts the lines per ID (2nd comma-separated field);
    # a[id,n] keeps the n-th line seen for that ID, in input order.
    { n = ++c[$2]; a[$2, n] = $0 }
    END {
        for (id in c) {
            # Header block, then every line that belongs to this ID.
            print RS "TIMES  ID" RS c[id] "  " id RS
            for (i = 1; i <= c[id]; i++) print a[id, i]
        }
    }' filename
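
Note that for (id in c) visits the IDs in no particular order, so the groups will not necessarily come out sorted by count as in the requested output. If that ordering matters, a two-pass variant along these lines might do. It is only a sketch: the function name dup_report is made up for illustration, the filter c[id] > 1 assumes that IDs seen only once should be skipped, and the log is rescanned once per ID, which is fine for small files but slow for large ones.

```shell
# dup_report FILE  (hypothetical helper name, not from the original post)
dup_report() {
    # Pass 1: count lines per ID (2nd comma-separated field) and keep
    # only IDs seen more than once, printed as "count id".
    awk -F, '{ c[$2]++ } END { for (id in c) if (c[id] > 1) print c[id], id }' "$1" |
    # Highest count first; ties broken by highest ID first.
    sort -k1,1nr -k2,2nr |
    # Pass 2: print a header per ID, then rescan the log for its lines.
    while read -r count id; do
        printf '\nTIMES ID\n%s %s\n\n' "$count" "$id"
        awk -F, -v id="$id" '$2 == id' "$1"
    done
}
```

Usage: dup_report test.txt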
 