duplicated records

krava · Feb 15, 2008

Hi

I would like to remove duplicate records based on 3 columns (columns 2, 3 and 4). For example, given

1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
5 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2

I would like to get

1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
9 10.2 11.2 12.2
10 11.2 12.2 13.2

With help from this forum I wrote this solution:

awk '$2 SUBSEP $3 SUBSEP $4 !=ref{print;ref=$2 SUBSEP $3 SUBSEP $4}' data.csv

which gives me

1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2

That is, it only removes a duplicate when it directly follows the record it duplicates, so it works only when the data file is sorted on these columns. How can I delete the duplicate records if the data file is not sorted?
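A minimal shell sketch of why the one-liner above only catches adjacent duplicates, assuming a POSIX sh and awk. The helper names `sample_data` and `dedup_adjacent` are hypothetical and used only for illustration. The variable `ref` holds the key of the last line printed, so record 5 is dropped (same key as record 4, right before it), but record 8 is only compared against record 7 and is printed again.

```shell
#!/bin/sh
# Hypothetical helper: print the sample records from the question.
sample_data() {
  cat <<'EOF'
1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
5 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2
EOF
}

# The one-liner from the question, wrapped in a hypothetical function.
# `ref` remembers only the key of the previous line, so a record is
# dropped only when its ($2,$3,$4) key equals the key of the line
# immediately before it.
dedup_adjacent() {
  awk '$2 SUBSEP $3 SUBSEP $4 != ref { print; ref = $2 SUBSEP $3 SUBSEP $4 }'
}

sample_data | dedup_adjacent
```

This reproduces the output shown above, with record 8 still present.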

thanks
krava

PHV · Feb 15, 2008

You may try this:
awk '!a[$2,$3,$4]{print;++a[$2,$3,$4]}' data.csv

Hope This Helps, PH.
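PHV's one-liner works on unsorted input because it remembers every key it has seen, not just the previous one. A minimal sketch, assuming a POSIX sh and awk; `sample_data`, `dedup_first` and `dedup_first_short` are hypothetical names. `a[$2,$3,$4]` is an associative array whose subscript is the three fields joined with SUBSEP. The first time a key appears the element is empty, so `!a[...]` is true and the line is printed; every later line with that key is skipped. `!seen[$2,$3,$4]++` is a common shorter spelling of the same idea.

```shell
#!/bin/sh
# Hypothetical helper: print the sample records from the question.
sample_data() {
  cat <<'EOF'
1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
5 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2
EOF
}

# PHV's one-liner: an unseen key is empty, so !a[key] is true;
# print the line, then mark the key as seen.
dedup_first() {
  awk '!a[$2,$3,$4] { print; ++a[$2,$3,$4] }'
}

# Same idea: the post-increment returns the old value (0 on first
# sight), so the pattern is true only once per key and the default
# action (print) runs.
dedup_first_short() {
  awk '!seen[$2,$3,$4]++'
}

sample_data | dedup_first
```

The output keeps the original record order and needs no sorting; memory grows with the number of distinct keys. Keys are compared as text, so `5` and `5.0` would count as different values.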

pathfinderpathfinder · Feb 19, 2008

Well,

I think there's a command called sort (I can't check it right now);
you could probably sort your file before applying awk.
Is that a stupid idea?
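The sort idea also works, but sorting on columns 2 to 4 changes the record order, which the desired output keeps. A minimal sketch, assuming the first column is a record number that can be used to restore the original order; `sample_data` and `dedup_sorted` are hypothetical names. It sorts so equal keys become adjacent (with `-k1,1n` as the last key so the earliest record of each group comes first), applies the adjacent-only one-liner from the question, and then sorts back on column 1.

```shell
#!/bin/sh
# Hypothetical helper: print the sample records from the question.
sample_data() {
  cat <<'EOF'
1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
5 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2
EOF
}

# 1. Byte-wise sort (LC_ALL=C) on columns 2-4 so duplicates are adjacent;
#    ties are ordered by record number, so the first occurrence leads.
# 2. Drop adjacent duplicates with the one-liner from the question.
# 3. Re-sort numerically on column 1 to restore the original order.
dedup_sorted() {
  LC_ALL=C sort -k2,2 -k3,3 -k4,4 -k1,1n |
    awk '$2 SUBSEP $3 SUBSEP $4 != ref { print; ref = $2 SUBSEP $3 SUBSEP $4 }' |
    sort -k1,1n
}

sample_data | dedup_sorted
```

Without such a numbering column the original order cannot be restored this way, and PHV's single awk pass is the simpler choice.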
