Easy UNIX file question...

mwesticle · Nov 4, 2004

I have a fixed-width file that contains records that are 20 bytes long. The first 12 bytes make up an "individual id" field, and the remaining 8 bytes contain "other" information. The "individual id" may or may not be unique across records. So, for example, the contents of my file might look like this:

100000000000AAAAAAAA
100000000000
200000000000
300000000000BB BB BB
300000000000
300000000000B B BB
400000000000
400000000000

...where "individual id" is "100000000000", "200000000000", and so on... I need a little nawk or awk or ksh (or whatever will work) script that will loop through the records, and spit out all records that have a unique "individual id". In the case where "individual id" is not unique ("100000000000", "300000000000", and "400000000000" above), I need it to spit out the one record that has the most "other" information filled in. In the case where "individual id" is not unique, and none of the records containing that "individual id" have any "other" information filled in ("400000000000" above), the script should just pick one to spit out (doesn't matter which one). So, my output, in this case, would look like this:

100000000000AAAAAAAA
200000000000
300000000000BB BB BB
400000000000

I know this is probably fairly easy to do, I just need some help. Thanks to anyone who responds!

vgersh99 · Nov 4, 2004

Not sure about this statement, specifically the 'other' part of it:

I need it to spit out the one record that has the most "other" information filled in.

But here's something to start with:

nawk -f mw.awk myFile.txt

Code:

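BEGIN {
  IDlen=12   # width of the "individual id" field, in bytes
}

{
   # take the id from the whole record ($0), not from the first
   # whitespace-separated field, since the file is fixed-width
   id=substr($0, 1, IDlen)
   if ( !(id in arr))
      arr[id]=$0          # first record seen for this id
   else
      # record length is used as a stand-in for "most information"
      if ( length($0) > length(arr[id]) )
         arr[id]=$0
}
END {
  # note: for-in visits the ids in no particular order
  for (idx in arr)
     print arr[idx]
}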
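
On the "most other information" point: comparing record lengths only helps when empty trailing bytes are cut off rather than padded with spaces. A minimal sketch of one other reading is below, assuming "filled in" means the number of non-blank characters in bytes 13-20. The function name pick_fullest and the arrays best and score are made up for illustration, and the trailing sort is only there because for-in order is unspecified.

```shell
# pick_fullest: hypothetical helper name, not from the original post.
# Reads fixed-width records on stdin and prints, for each 12-byte id,
# the record whose "other" field has the most non-blank characters.
pick_fullest() {
  awk -v idlen=12 '
  {
    id    = substr($0, 1, idlen)
    other = substr($0, idlen + 1)
    # gsub returns the number of replacements made; replacing each
    # non-blank character with itself ("&") just counts them
    n = gsub(/[^ ]/, "&", other)
    if (!(id in best) || n > score[id]) {
      best[id]  = $0
      score[id] = n
    }
  }
  END {
    for (id in best)
      print best[id]
  }' | sort
}
```

It would be called as: pick_fullest < myFile.txt. When two records for the same id tie, the first one read is kept.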

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

PHV · Nov 4, 2004

Something like this?
awk '
{ id=substr($0,1,12); other=substr($0,13)
  if(!(id in a) || length(other)>length(a[id])) a[id]=other
} END { for(i in a) print i a[i] | "sort" }
' /path/to/input > output

Hope This Helps, PH.
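
If the output should follow the order in which each id first appears in the input, rather than sorted order, one possible variation is to remember the ids as they are first seen. This is only a sketch: the function name pick_longest_ordered and the arrays best and order are hypothetical, and it keeps the same length-of-"other" comparison as the one-liner above.

```shell
# pick_longest_ordered: hypothetical helper name, for illustration only.
# Reads fixed-width records on stdin; for each 12-byte id, keeps the
# longest "other" field and prints ids in first-seen order.
pick_longest_ordered() {
  awk '
  {
    id    = substr($0, 1, 12)
    other = substr($0, 13)
    if (!(id in best)) {
      order[++count] = id      # remember when this id first appeared
      best[id] = other
    } else if (length(other) > length(best[id])) {
      best[id] = other
    }
  }
  END {
    # walk the ids by arrival number instead of relying on for-in
    for (k = 1; k <= count; k++)
      print order[k] best[order[k]]
  }'
}
```

Usage would be: pick_longest_ordered < /path/to/input > output. No sort is involved, so the input order is kept.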
