I have a fixed-width file that contains records that are 20 bytes long, and 12 of those bytes make up an "individual id" field. The remaining 8 bytes contain "other" information. The "individual id" may or may not be unique across records. So, for example, the contents of my file might look like this:
100000000000AAAAAAAA
100000000000
200000000000
300000000000BB BB BB
300000000000
300000000000B B BB
400000000000
400000000000
...where "individual id" is "100000000000", "200000000000", and so on. I need a little nawk or awk or ksh (or whatever will work) script that will loop through the records and spit out all records that have a unique "individual id". In the case where "individual id" is not unique ("100000000000", "300000000000", and "400000000000" above), I need it to spit out the one record that has the most "other" information filled in. In the case where "individual id" is not unique and none of the records containing that "individual id" have any "other" information filled in ("400000000000" above), the script should just pick one to spit out (it doesn't matter which one). So my output, in this case, would look like this:
100000000000AAAAAAAA
200000000000
300000000000BB BB BB
400000000000
I know this is probably fairly easy to do; I just need some help. Thanks to anyone who responds!
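For what it's worth, here is a minimal sketch of one way the selection described above might be done with awk. It rests on a few assumptions not stated in the post: records are newline-terminated, unfilled "other" bytes are spaces (or simply missing from the end of the line), and "most filled in" means the largest count of non-blank characters in bytes 13-20. The function name dedupe_records is made up for illustration. On a tie it keeps the first record seen, and it prints ids in the order they first appear.

```shell
#!/bin/sh
# Sketch: keep one record per 12-byte id, preferring the record whose
# 8-byte "other" field has the most non-blank characters (an assumed
# reading of "most information filled in").
dedupe_records() {
    awk '
    {
        id    = substr($0, 1, 12)     # bytes 1-12: individual id
        other = substr($0, 13, 8)     # bytes 13-20: other information
        gsub(/ /, "", other)          # drop blanks so length() counts filled bytes
        score = length(other)

        if (!(id in best)) {          # first record seen for this id
            order[++n] = id           # remember first-seen order for output
            best[id]   = $0
            filled[id] = score
        } else if (score > filled[id]) {
            best[id]   = $0           # strictly fuller record replaces the old one
            filled[id] = score
        }
    }
    END {
        for (i = 1; i <= n; i++)
            print best[order[i]]
    }
    ' "$@"
}
```

It could be called as `dedupe_records records.dat > deduped.dat` (both file names are placeholders), or fed from a pipe. The `id in best` test needs nawk or a POSIX awk; very old awk implementations may not support it.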