Easy UNIX file question...

Status
Not open for further replies.

mwesticle

Programmer
Nov 19, 2003
51
US
I have a fixed-width file that contains records that are 20 bytes long, and 12 of those bytes make up an "individual id" field. The remaining 8 bytes contain "other" information. The "individual id" may or may not be unique across records. So, for example, the contents of my file might look like this:

100000000000AAAAAAAA
100000000000
200000000000
300000000000BB BB BB
300000000000
300000000000B B BB
400000000000
400000000000

...where "individual id" is "100000000000", "200000000000", and so on... I need a little nawk or awk or ksh (or whatever will work) script that will loop through the records, and spit out all records that have a unique "individual id". In the case where "individual id" is not unique ("100000000000", "300000000000", and "400000000000" above), I need it to spit out the one record that has the most "other" information filled in. In the case where "individual id" is not unique, and none of the records containing that "individual id" have any "other" information filled in ("400000000000" above), the script should just pick one to spit out (doesn't matter which one). So, my output, in this case, would look like this:

100000000000AAAAAAAA
200000000000
300000000000BB BB BB
400000000000

I know this is probably fairly easy to do, I just need some help. Thanks to anyone who responds!
 
Not sure about this statement, specifically the 'other' part of it:
I need it to spit out the one record that has the most "other" information filled in.

but here's something to start with

nawk -f mw.awk myFile.txt

Code:
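BEGIN {
  IDlen=12                       # width of the "individual id" field
}

{
   # take the id from the whole record, so blanks in the data cannot shift it
   id=substr($0, 1, IDlen)
   if ( !(id in arr))
      arr[id]=$0                 # first record seen for this id
   else
      if ( length($0) > length(arr[id]) )
         arr[id]=$0              # a longer record replaces the stored one
}
END {
  # note: for-in visits the ids in no particular order
  for (idx in arr)
     print arr[idx]
}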
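
If every record is really padded with blanks to the full 20 bytes, the length comparison above can never tell two records apart. A rough variant, assuming "most filled in" means "most non-blank bytes in the last 8", might look like the sketch below. The file names sample.txt and picked.txt and the sample records are made up for illustration.

```shell
# Hypothetical sample file; every record is blank-padded to 20 bytes.
printf '%-20s\n' \
  '100000000000AAAAAAAA' '100000000000' '200000000000' \
  '300000000000BB BB BB' '300000000000' '300000000000B  B  BB' \
  '400000000000' '400000000000' > sample.txt

awk '
{
    id    = substr($0, 1, 12)      # 12-byte "individual id"
    other = substr($0, 13)         # remaining "other" bytes
    gsub(/ /, "", other)           # drop blanks so only filled-in bytes count
    n = length(other)
    # keep the first record seen for an id, or a later one with more filled-in bytes
    if (!(id in best) || n > count[id]) {
        best[id]  = $0
        count[id] = n
    }
}
END {
    for (id in best)
        print best[id]
}
' sample.txt | sort > picked.txt
```

When several records for an id tie (for example, all blank), the first one read is kept, which fits the "doesn't matter which one" requirement.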

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Something like this ?
awk '
{ id=substr($0,1,12); other=substr($0,13)
  # referencing a[id] creates the entry, so ids with no "other" data are still printed
  if(length(other)>length(a[id])) a[id]=other
} END { for(i in a) print i a[i] | "sort" }
' /path/to/input > output
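
The for (i in a) loop visits the ids in no particular order, which is why the output goes through sort. If the records should instead come out in the order each id first appears in the input, something along these lines might do. This is only a sketch: the order array and the file names unsorted.txt and ordered.txt are made up for illustration.

```shell
# Hypothetical input, deliberately not sorted by id.
printf '%s\n' \
  '300000000000' '100000000000AAAAAAAA' \
  '300000000000BB BB BB' '100000000000' > unsorted.txt

awk '
{
    id = substr($0, 1, 12); other = substr($0, 13)
    if (!(id in a)) order[++n] = id      # remember the first appearance of each id
    # store the first "other" seen, or a later one that is longer
    if (!(id in a) || length(other) > length(a[id])) a[id] = other
}
END {
    for (k = 1; k <= n; k++)             # replay the ids in first-seen order
        print order[k] a[order[k]]
}
' unsorted.txt > ordered.txt
```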

Hope This Helps, PH.
 