small shell scripts 2

Status
Not open for further replies.

jayjaybigs

IS-IT--Management
Jan 12, 2005
191
CA
I was wondering if anyone has a small shell script that does the following:

I have a file:

384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617
439345,xxxxxxxxx,xxxxxxx,xxxx,20000819
437829,xxxxxxxxx,xxxxxxx,xxxx,19870903
384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617

I would like to extract the duplicates, based on the first field and the last field.

Hence my final file should be:
384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617
 
A starting point:
nawk -F',' '
{k=$1","$NF;if(++a[k]>1)b[k]=$0}
END{for(k in b)print b[k]}
' /path/to/input > output

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 
or:

Code:
nawk -F',' 'x[$1,$NF]++' input

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Hi Vgersh,

Thanks for your help, all. I actually ran the code awk -F',' 'x[$1,$NF]++' expiry.txt > results.txt

on this input:
384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617
439345,xxxxxxxxx,xxxxxxx,xxxx,20000819
437829,xxxxxxxxx,xxxxxxx,xxxx,19870903
384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617

The number of records in results.txt was more than 2.

I am not sure that I explained it clearly earlier.

I am looking to extract only one record for each set of duplicates, where duplicates are determined by (1) the first field and (2) the fifth field.

Also, xxxxxxxx could be any string.
 
Have you tried this?
awk -F',' '
{k=$1","$NF;if(++a[k]>1)b[k]=$0}
END{for(k in b)print b[k]}
' /path/to/input > output

Hope This Helps, PH.
 
Given this sample input, my output is:

384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617

What would you expect it to be?

If on Solaris, use nawk instead of awk.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
A st@r for Vlad.

However, if the duplicate record occurs more than twice, it will be printed more than once.

Perhaps this is what the poster wants:
Code:
BEGIN { FS="," }
2 == ++x[$1,$NF]
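
To make the difference concrete, here is a small sketch that runs both patterns on a key that occurs three times. The sample records and the variable names (input, every_repeat, once_per_key) are made up for illustration, and awk is assumed to be a POSIX-style awk (nawk, gawk, or /usr/xpg4/bin/awk on Solaris).

```shell
#!/bin/sh
# Hypothetical sample: key (111, 20040212) occurs three times, key (222, 20030617) once.
input='111,aaa,bbb,ccc,20040212
222,ddd,eee,fff,20030617
111,ggg,hhh,iii,20040212
111,jjj,kkk,lll,20040212'

# Prints every occurrence after the first, so the 111 key is printed twice.
every_repeat=$(printf '%s\n' "$input" | awk -F',' 'x[$1,$NF]++')

# Prints a record only when its key is seen for the second time,
# so each duplicated key is printed exactly once.
once_per_key=$(printf '%s\n' "$input" | awk -F',' '2 == ++x[$1,$NF]')

printf 'x[$1,$NF]++ :\n%s\n' "$every_repeat"
printf '2 == ++x[$1,$NF] :\n%s\n' "$once_per_key"
```

Note that the second form prints the second occurrence of each key, not the first. If the other fields differ between duplicates and the first occurrence is the one wanted, a different approach would be needed.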

If you have nawk, use it instead of awk because on some systems awk is very old and lacks many useful features. Under Solaris, use /usr/xpg4/bin/awk.
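
One way to follow this advice in a script is to choose the interpreter at run time. This is only a sketch: the variable name AWK and the fallback order are assumptions, not something the posters prescribed, and the two-line test input is invented.

```shell
#!/bin/sh
# Prefer the Solaris XPG4 awk if present, then nawk, then whatever awk is on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk
fi

# Run the one-per-duplicate pattern through the selected interpreter
# on a made-up two-record input that shares first and last fields.
printf '1,a,x\n1,b,x\n' | "$AWK" -F',' '2 == ++x[$1,$NF]'
```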

For an introduction to Awk, see faq271-5564.
 
I stand corrected - good catch, William [if I'm not mistaken].

Good posts @cla !

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 