Hi
I would like to remove duplicate records based on 3 columns (columns 2 to 4). For example, from
1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
5 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2
I would like to get
1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
9 10.2 11.2 12.2
10 11.2 12.2 13.2
With the help of this forum I wrote this solution:
awk '$2 SUBSEP $3 SUBSEP $4 !=ref{print;ref=$2 SUBSEP $3 SUBSEP $4}' data.csv
which gives me
1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2
That is, it works only when the data file is sorted by these columns, because each record is compared only with the one printed before it. How can I delete the duplicate records when the data file is not sorted?
thanks
krava
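
One possible approach for unsorted input, offered as a sketch rather than a definitive answer: instead of comparing each record with the previous one, remember every key already seen in an awk associative array and print a record only the first time its key appears. The file name sample.txt, the function name dedup_cols_2_4 and the array name seen are assumptions made for this illustration; the key columns 2 to 4 are taken from the command above.

```shell
# Recreate the sample records from the post in a scratch file
# (sample.txt is a hypothetical name used only for this sketch).
cat > sample.txt <<'EOF'
1 2.2 3.2 4.2
2 3.2 4.2 5.2
3 4.2 5.2 6.2
4 5 5 5
5 5 5 5
6 7.2 8.2 9.2
7 8.2 9.2 10.2
8 5 5 5
9 10.2 11.2 12.2
10 11.2 12.2 13.2
EOF

# dedup_cols_2_4 is a hypothetical helper name.
# seen[$2,$3,$4] counts how often each (col2, col3, col4) key has appeared;
# the comma in the subscript joins the fields with SUBSEP. The pattern is
# true only on the first occurrence (count 0 before the post-increment),
# so later repeats are skipped wherever they sit in the file.
dedup_cols_2_4() {
    awk '!seen[$2,$3,$4]++' "$1"
}

dedup_cols_2_4 sample.txt
```

On the sample data this should print the eight records listed under "I would like to get", keeping record 4 and dropping records 5 and 8. The keys are compared as strings, so 5 and 5.0 would count as different values, and the array holds one entry per distinct key, which may matter for very large files.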