Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Help with an awk script to filter tables

Status
Not open for further replies.

Lobar17

Technical User
Apr 27, 2018
2
0
0
CH
Dear Forum,

I need to filter text files according to some measured and recorded values (sensor). The files can be large and the recorded data points could differ in length (NF is variable). I have worked out a solution in awk (see below) but it is very messy and not flexible. I would appreciate your help for a better, cleaner and more robust way to solve my problem?

A typical input file would look like this:

Code:
ID_AB1\tsensor01(1.0),sensor02(0.6),sensor03(0.5),sensor04(0.45)
ID_AB2\tsensor01(1.0),sensor02(0.95),sensor03(0.90),sensor04(0.80)
ID_AC3\tsensor01(1.0),sensor02(1.0),sensor03(1.0),sensor04(1.0)
ID_AD1\tsensor01(0.9),sensor02(0.6)
ID_BA2
...

At the end I would like an output file like the following:

Code:
ID_AB1;sensor01;low;low;low
ID_AB2;sensor01;sensor02;sensor03;low
ID_AC3;sensor01;sensor02;sensor03;sensor04
ID_AD1;sensor01;low;na;na
ID_BA2;na;na;na;na

Sensors with a low value (e.g. PF=0.8) should be replaced as "low". The sensor signal is decreasing. It is possible that consecutive sensors have an equal value but a sensor has never a high signal to the one before. It is, however, possible that some sensor are missing and this needs to considered (e.g. na in the output)

My solution work and I could extend it to cover more possible cases but it is confusing and not flexible.

Code:
sed 's/(/,/g' out.singnal | sed 's/)//g' | awk -F"\t|," -v PF=0.8 '{
  printf "%s;", $1
  if(NF==9 && $3>=PF && $5>=PF && $7>=PF && $9>=PF)
          printf " %s; %s; %s; %s\n", $2,$4,$6,$8
  else if(NF==9 && $3>=PF && $5>=PF && $7>=PF && $9<PF)
          printf " %s; %s; %s; %s\n", $2,$4,$6,"low"
  else if(NF==9 && $3>=PF && $5>=PF && $7<PF && $9<PF)
          printf " %s; %s; %s; %s\n", $2,$4,"low","low"
  else if(NF==9 && $3>=PF && $5<PF && $7<PF && $9<PF)
          printf " %s; %s; %s; %s\n", $2,"low","low","low"
  else if(NF==9 && $3<PF && $5<PF && $7<PF && $9<PF)
          printf " %s; %s; %s; %s\n", "low","low","low","low"
  else if(NF==7 && $3>=PF && $5>=PF && $7>=PF)
          printf " %s; %s; %s; %s\n", $2,$4,$6,"na"
  else if(NF==7 && $3>=PF && $5>=PF && $7<PF)
          printf " %s; %s; %s; %s\n", $2,$4,"low","na"
  else if(NF==7 && $3>=PF && $5<PF && $7<PF)
          printf " %s; %s; %s; %s\n", $2,"low","low","na"
  else if(NF==7 && $3<PF && $5<PF && $7<PF)
          printf " %s; %s; %s; %s\n","low","low","low","na"
 }'

Are there any awk wizards able to help me to improve my script? Thanks a lot!

 
Hi Lobar17,

Here is my attempt to simplify the code:

Code:
[COLOR=#0000ff]# Run:[/color]
[COLOR=#0000ff]#   awk -f lobar17.awk lobar17.txt[/color]

[COLOR=#6a5acd]BEGIN[/color] {
  [COLOR=#0000ff]# field separator must be: \t (tab character) or comma[/color]
  [COLOR=#6a5acd]FS[/color] = [COLOR=#ff00ff]"  |,"[/color]
  LOW_VALUE = [COLOR=#ff00ff]0.8[/color]
} 

{ 
  id = [COLOR=#6a5acd]$1[/color]
  s1 = sensor_value([COLOR=#6a5acd]$2[/color])
  s2 = sensor_value([COLOR=#6a5acd]$3[/color])
  s3 = sensor_value([COLOR=#6a5acd]$4[/color])
  s4 = sensor_value([COLOR=#6a5acd]$5[/color])
  [COLOR=#804040][b]printf[/b][/color]([COLOR=#ff00ff]"[/color][COLOR=#6a5acd]%s[/color][COLOR=#ff00ff];[/color][COLOR=#6a5acd]%s[/color][COLOR=#ff00ff];[/color][COLOR=#6a5acd]%s[/color][COLOR=#ff00ff];[/color][COLOR=#6a5acd]%s[/color][COLOR=#ff00ff];[/color][COLOR=#6a5acd]%s[/color][COLOR=#6a5acd]\n[/color][COLOR=#ff00ff]"[/color][COLOR=#6a5acd],[/color] id[COLOR=#6a5acd],[/color] s1[COLOR=#6a5acd],[/color] s2[COLOR=#6a5acd],[/color] s3[COLOR=#6a5acd],[/color] s4)
}

[COLOR=#0000ff]# -- fuctions --[/color]
[COLOR=#804040][b]function[/b][/color] sensor_value(fld) {
  result = [COLOR=#ff00ff]"nothing"[/color]
  [COLOR=#804040][b]if[/b][/color] ([COLOR=#008080]match[/color](fld[COLOR=#6a5acd],[/color] [COLOR=#ff00ff]/(sensor[[/color][COLOR=#804040][b]0-9[/b][/color][COLOR=#ff00ff]][[/color][COLOR=#804040][b]0-9[/b][/color][COLOR=#ff00ff]])[/color][COLOR=#6a5acd]\([/color][COLOR=#ff00ff]([/color][COLOR=#6a5acd].+[/color][COLOR=#ff00ff])[/color][COLOR=#6a5acd]\)[/color][COLOR=#ff00ff]/[/color][COLOR=#6a5acd],[/color] match_list)) {
    sensor_name = match_list[[COLOR=#6a5acd]1[/color]]
    num_val = match_list[[COLOR=#6a5acd]2[/color]]
    [COLOR=#804040][b]if[/b][/color] (num_val <= LOW_VALUE) {
      result = [COLOR=#ff00ff]"low"[/color]
    }
    [COLOR=#804040][b]else[/b][/color] {
      result = sensor_name
    }
  }
  [COLOR=#804040][b]else[/b][/color] {
    result = [COLOR=#ff00ff]"na"[/color]
  }
  [COLOR=#804040][b]return[/b][/color] result
}

Output:
Code:
$ awk -f lobar17.awk lobar17.txt
ID_AB1;sensor01;low;low;low
ID_AB2;sensor01;sensor02;sensor03;low
ID_AC3;sensor01;sensor02;sensor03;sensor04
ID_AD1;sensor01;low;na;na
ID_BA2;na;na;na;na
 
the function sensor_value(fld) could be simplified:

Code:
function sensor_value(fld) {
  result = "na"
  if (match(fld, /(sensor[0-9][0-9])\((.+)\)/, match_list)) {
    sensor_name = match_list[1]
    num_val = match_list[2]
    if (num_val <= LOW_VALUE) {
      result = "low"
    }
    else {
      result = sensor_name
    }
  }
  return result
}
 
Hi mikrom,

Thanks a million - it works just perfect !!!
 
Here's a different awk recipe:
Code:
awk -F"[\t]|[(]|[)][,]?" '{
    for (i=1;i<=NF;i++)
        x[$1] = (i==1 ? $1 : (i%2 ? x[$1] ";" ($i > 0.8 ? $(i-1) : "low") : x[$1]))
    n=split(x[$1],a,";")
    for (j=n;j<5;j++) x[$1]=x[$1]";na"
} END {for(i in x) print x[i]}' file
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top