duplicated records

Status
Not open for further replies.

krava

Programmer
Jun 4, 2007
48
YU
Hi

I would like to remove duplicate records based on three columns (columns 2, 3 and 4), keeping the first occurrence of each. For example, from

1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
5 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2

I would like to get

1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
9 10.2 11.2 12.2
10 11.2 12.2 13.2


With the help of this forum, I wrote this solution:

awk '$2 SUBSEP $3 SUBSEP $4 !=ref{print;ref=$2 SUBSEP $3 SUBSEP $4}' data.csv

which gives me


1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2

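That is, it only works when the data file is sorted on these columns, because each record is compared only with the last record printed (record 8 is kept because record 7 precedes it). How can I delete the duplicate records when the data file is not sorted?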


thanks
krava
 
You may try this:
awk '!a[$2,$3,$4]{print;++a[$2,$3,$4]}' data.csv

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
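
For readers wondering why this works on unsorted input: the array remembers every key seen so far, not just the previous one. Below is a minimal sketch of the same idea, wrapped in a shell function. The function name dedup_by_cols and the array name seen are made up for illustration, and the sketch assumes whitespace-separated fields as in the sample data.

```shell
# Hypothetical helper; the name dedup_by_cols is an assumption for this sketch.
dedup_by_cols() {
    # seen[$2,$3,$4] is 0 (false) the first time a key appears, so the
    # negated test is true and awk prints the line. The post-increment
    # then marks the key, so later lines with the same key are skipped.
    awk '!seen[$2,$3,$4]++' "$@"
}

# Rows 4, 5 and 8 share the key "5 5 5"; only row 4 is printed.
printf '%s\n' '4 5 5 5' '5 5 5 5' '6 7.2 8.2 9.2' '8 5 5 5' | dedup_by_cols
```

It can also be given a file name, e.g. dedup_by_cols data.csv. The trade-off is memory: one array entry is kept per distinct key.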
 
Well,

I think there's a command called sort (I can't check it right now).
You could probably sort your file before applying awk.
Is that a stupid idea?
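
The sort-first idea could look roughly like the sketch below. It assumes that column 1 is an increasing record number, so it can be used both to keep the earliest record of each group and to restore the original order afterwards. The function name dedup_sorted is hypothetical.

```shell
# Hypothetical helper; assumes column 1 is an increasing record number.
dedup_sorted() {
    # 1. Group records with equal columns 2-4 together; within a group,
    #    the lowest record number comes first.
    # 2. Keep only the first record of each group (adjacent comparison,
    #    as in the original one-liner).
    # 3. Sort numerically on column 1 to restore the original order.
    LC_ALL=C sort -b -k2,2 -k3,3 -k4,4 -k1,1n "$@" |
        awk '{ key = $2 SUBSEP $3 SUBSEP $4 } key != ref { print; ref = key }' |
        sort -k1,1n
}

printf '%s\n' '4 5 5 5' '5 5 5 5' '6 7.2 8.2 9.2' '8 5 5 5' | dedup_sorted
```

This avoids holding all keys in awk's memory, at the cost of two sorts. If the file has no record-number column, the original order cannot be recovered this way.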
 