Skipping duplicate fields before printing

learningawk · Jan 15, 2004

I have a coordinate file containing vector points that draw closed polygons. The first and last data points have the same coordinate location. I am reformatting the file for use as input to another application.

Here's a test data set; the first record is a column counter.

1234567890123456789012345678901234567890

001sdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfs
300000 1111111 22222 34343434 0
32222 34343434 33333 44444444 0
33333 44444444 34343 55555555 0
334343 55555555 66666 77777777 0
366666 77777777 300000 1111111 9
002 cccccccccceeeeeeeeeeeggggg
231xcxcxcxcxcxcxcxcxczxjfljdfladflasdflk
300000 1111111 22222 34343434 0
32222 34343434 33333 44444444 0
33333 44444444 34343 55555555 0
334343 55555555 66666 77777777 0
366666 77777777 300000 1111111 9
002 cccccccccceeeeeeeeeeeggggg
231xcxcxcxcxcxcxcxcxczxjfljdfladflasdflk
300000 1111111 22222 34343434 0
32222 34343434 33333 44444444 0
33333 44444444 34343 55555555 0
334343 55555555 66666 77777777 0
366666 77777777 300000 1111111 9

Column 1 is the record type identifier: 0, 2 or 3.
0 is a header, 2 is an ignored (skipped) record and 3 is a data-point record. I am using substr to pick out the fields, and I am trying to omit the duplicate locations before printing with a simple check of whether the previous value equals the current value.

Each group contains a varying number of coordinate pairs to describe its polygon.

Here's how the output should be:
001 some headers....
3 300000 1111111
3 22222 34343434
3 33333 44444444
3 34343 55555555
3 66666 77777777
3 300000 1111111

and so on for each group in the file.

Can you skip a substr field if it is found to be a duplicate of the previous record's, yet keep the next pair of coordinates on that same record?

Thanks,

tikual · Jan 15, 2004

Would a sed script be OK?

/^2/{
d
}
/^3/{
s/^\([^\ ]*\) *\([^\ ]*\) *.*$/3 \1 \2/
}

tikual

aigles · Jan 15, 2004

Awk script:

[tt]
awk '
/^2/ {                      # Record type "2"
    next;                   # Skip
}
/^3/ {                      # Record type "3"
    if (prv1 == $1 &&       # Same point as previous?
        prv2 == $2)
        next;               # Yes, skip
    prv1 = $1;              # Memorize point coord
    prv2 = $2;
    sub(/^./, "& ");        # Isolate record type (space after column 1)
    print $1, $2, $3;       # Print record type and coord
    next;                   # Next record
}
{                           # All other record types
    print;                  # Print
}
' <input >output
[/tt]

With your data, the result is:
[tt]
001sdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfs
3 00000 1111111
3 2222 34343434
3 3333 44444444
3 34343 55555555
3 66666 77777777
002 cccccccccceeeeeeeeeeeggggg
3 00000 1111111
3 2222 34343434
3 3333 44444444
3 34343 55555555
3 66666 77777777
002 cccccccccceeeeeeeeeeeggggg
3 00000 1111111
3 2222 34343434
3 3333 44444444
3 34343 55555555
3 66666 77777777
[/tt]

Jean Pierre.
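
For the layout asked for in the original question (both coordinate pairs of each type 3 record, minus a pair that repeats the one printed just before it), something along these lines might work. It is only a sketch under assumptions the thread does not confirm: the record type is taken to be the single character in column 1, and the rest of the record is taken to be whitespace-separated as x1 y1 x2 y2 flag. With true fixed-width columns, the split would need to be replaced by substr calls at the real offsets. The function name dedup_points is made up for illustration.

```shell
# Hypothetical helper: reads records on stdin, writes reformatted records on stdout.
dedup_points() {
    awk '
    /^2/ { next }                       # type 2: ignored record
    /^3/ {                              # type 3: data record holding two points
        n = split(substr($0, 2), f)     # drop the type column, split the rest on whitespace
        for (i = 1; i <= 3 && i + 1 <= n; i += 2) {
            pt = f[i] " " f[i + 1]      # one x y pair
            if (pt != prev)             # print unless it repeats the previous pair
                print "3", pt
            prev = pt
        }
        next
    }
    { prev = ""; print }                # any other record (header): start a new group
    '
}
```

It would be called as `dedup_points <input >output`. Only consecutive repeats are dropped, so the closing point of a polygon, which equals its first point, is still printed.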
