small shell scripts 2

jayjaybigs · Feb 2, 2005

I was wondering if anyone has a small shell script that does the following:

I have a file:

384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617
439345,xxxxxxxxx,xxxxxxx,xxxx,20000819
437829,xxxxxxxxx,xxxxxxx,xxxx,19870903
384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617

I would like to extract the duplicates based on the first field and the last field.

Hence my final file should be:
384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617

PHV · Feb 2, 2005

A starting point:
nawk -F',' '
# Key on first and last field; b[] keeps the latest line of any key seen more than once.
{k=$1","$NF;if(++a[k]>1)b[k]=$0}
END{for(k in b)print b[k]}
' /path/to/input > output

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244

vgersh99 · Feb 2, 2005

or:

Code:

nawk -F',' 'x[$1,$NF]++' input

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

jayjaybigs · Feb 3, 2005

Hi Vgersh,

Thanks for your help, all. I actually ran the code awk -F',' 'x[$1,$NF]++' expiry.txt > results.txt

for this input:
384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617
439345,xxxxxxxxx,xxxxxxx,xxxx,20000819
437829,xxxxxxxxx,xxxxxxx,xxxx,19870903
384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617

The number of records in results.txt was more than 2.

I am not sure that I explained clearly earlier.

I am looking to extract only one record for each set of duplicates, based on (1) the first field and (2) the fifth field.

Also, xxxxxxxx could be any string.

PHV · Feb 3, 2005

Have you tried this?
awk -F',' '
{k=$1","$NF;if(++a[k]>1)b[k]=$0}
END{for(k in b)print b[k]}
' /path/to/input > output

Hope This Helps, PH.
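To illustrate what the script above does when a key repeats more than once and the middle fields differ, here is a minimal sketch. The sample rows and the variable names (sample, one_per_key) are made up for illustration and are not the poster's data; plain awk is assumed, so substitute nawk where needed.

```shell
# Hypothetical sample: one key occurs three times, another twice.
sample='384348,aaa,bbb,cc,20040212
943053,aaa,bbb,cc,20030617
384348,ddd,eee,ff,20040212
943053,aaa,bbb,cc,20030617
384348,ggg,hhh,ii,20040212'

# a[] counts each key; b[] keeps the most recent line for keys seen more than once.
# The for-in order is unspecified in awk, so the result is sorted for a stable display.
one_per_key=$(printf '%s\n' "$sample" | awk -F',' '
{k=$1","$NF;if(++a[k]>1)b[k]=$0}
END{for(k in b)print b[k]}
' | sort)

printf '%s\n' "$one_per_key"
```

Each duplicated key yields exactly one line, and it is the last occurrence that is kept (here the ggg,hhh,ii row for 384348).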

vgersh99 · Feb 3, 2005

Given this sample input, my output is:

384348,xxxxxxxxx,xxxxxxx,xxxx,20040212
943053,xxxxxxxxx,xxxxxxx,xxxx,20030617

What would you expect it to be?

If on Solaris, use nawk instead of awk.

vlad

futurelet · Feb 3, 2005

A st@r for Vlad.

However, if the duplicate record occurs more than twice, it will be printed more than once.

Perhaps this is what the poster wants:

Code:

BEGIN { FS="," }
2 == ++x[$1,$NF]
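
The difference between the two patterns can be checked with a small sketch. The sample rows and variable names below are assumptions made for illustration, with one key occurring three times; awk is assumed, so use nawk or /usr/xpg4/bin/awk where appropriate.

```shell
# Hypothetical sample: key 384348/20040212 occurs three times, 943053/20030617 twice.
sample='384348,aaa,bbb,cc,20040212
943053,aaa,bbb,cc,20030617
439345,aaa,bbb,cc,20000819
384348,aaa,bbb,cc,20040212
943053,aaa,bbb,cc,20030617
384348,aaa,bbb,cc,20040212'

# Prints every occurrence after the first, so a key seen three times is printed twice.
every_repeat=$(printf '%s\n' "$sample" | awk -F',' 'x[$1,$NF]++')

# Prints a line only when its key is seen for the second time.
second_only=$(printf '%s\n' "$sample" | awk -F',' '2 == ++x[$1,$NF]')

printf 'x[...]++ printed %s lines\n' "$(printf '%s\n' "$every_repeat" | wc -l | tr -d ' ')"
printf '2 == ++x[...] printed %s lines\n' "$(printf '%s\n' "$second_only" | wc -l | tr -d ' ')"
```

On this sample the first pattern prints three lines and the second prints two, one per duplicated key.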

If you have nawk, use it instead of awk because on some systems awk is very old and lacks many useful features. Under Solaris, use /usr/xpg4/bin/awk.

For an introduction to Awk, see faq271-5564.

vgersh99 · Feb 3, 2005

I stand corrected - good catch, William [if I'm not mistaken].

Good posts @cla !

vlad
