Dear Forum,
I need to filter text files according to measured and recorded sensor values. The files can be large, and the number of recorded data points per line varies (NF is variable). I have worked out a solution in awk (see below), but it is very messy and not flexible. I would appreciate your help in finding a better, cleaner and more robust way to solve my problem.
A typical input file would look like this:
Code:
ID_AB1\tsensor01(1.0),sensor02(0.6),sensor03(0.5),sensor04(0.45)
ID_AB2\tsensor01(1.0),sensor02(0.95),sensor03(0.90),sensor04(0.80)
ID_AC3\tsensor01(1.0),sensor02(1.0),sensor03(1.0),sensor04(1.0)
ID_AD1\tsensor01(0.9),sensor02(0.6)
ID_BA2
...
At the end I would like an output file like the following:
Code:
ID_AB1;sensor01;low;low;low
ID_AB2;sensor01;sensor02;sensor03;low
ID_AC3;sensor01;sensor02;sensor03;sensor04
ID_AD1;sensor01;low;na;na
ID_BA2;na;na;na;na
Sensors with a low value (relative to a threshold, e.g. PF=0.8) should be replaced by "low". The sensor signal is decreasing: consecutive sensors may have an equal value, but a sensor never has a higher signal than the one before it. It is, however, possible that some sensors are missing, and this needs to be considered (e.g. na in the output).
My solution works, and I could extend it to cover more possible cases, but it is confusing and not flexible.
Code:
sed 's/(/,/g' out.singnal | sed 's/)//g' | awk -F"\t|," -v PF=0.8 '{
    printf "%s;", $1
    if(NF==9 && $3>=PF && $5>=PF && $7>=PF && $9>=PF)
        printf " %s; %s; %s; %s\n", $2,$4,$6,$8
    else if(NF==9 && $3>=PF && $5>=PF && $7>=PF && $9<PF)
        printf " %s; %s; %s; %s\n", $2,$4,$6,"low"
    else if(NF==9 && $3>=PF && $5>=PF && $7<PF && $9<PF)
        printf " %s; %s; %s; %s\n", $2,$4,"low","low"
    else if(NF==9 && $3>=PF && $5<PF && $7<PF && $9<PF)
        printf " %s; %s; %s; %s\n", $2,"low","low","low"
    else if(NF==9 && $3<PF && $5<PF && $7<PF && $9<PF)
        printf " %s; %s; %s; %s\n", "low","low","low","low"
    else if(NF==7 && $3>=PF && $5>=PF && $7>=PF)
        printf " %s; %s; %s; %s\n", $2,$4,$6,"na"
    else if(NF==7 && $3>=PF && $5>=PF && $7<PF)
        printf " %s; %s; %s; %s\n", $2,$4,"low","na"
    else if(NF==7 && $3>=PF && $5<PF && $7<PF)
        printf " %s; %s; %s; %s\n", $2,"low","low","na"
    else if(NF==7 && $3<PF && $5<PF && $7<PF)
        printf " %s; %s; %s; %s\n","low","low","low","na"
}'
Are there any awk wizards able to help me to improve my script? Thanks a lot!
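One direction that might replace the if/else chain is to loop over a fixed number of sensor slots instead of listing every NF case. The following is only a rough sketch under several assumptions: the function name filter_sensors and the variable N (number of output columns, default 4) are made up for illustration, missing sensors are assumed to be missing only at the end of a line, and the comparison keeps a sensor when its value is >= PF, as in the script above. Note that under this rule sensor04(0.80) in ID_AB2 is kept rather than reported as "low", which differs from the sample output; the comparison would need to become > PF if 0.80 should count as low.

```shell
# Hypothetical helper: filter_sensors FILE [PF] [N]
#   FILE  tab-separated input (ID, then comma-separated name(value) entries)
#   PF    threshold, default 0.8 (assumption: values >= PF are kept)
#   N     number of sensor columns in the output, default 4
filter_sensors() {
    awk -F'\t' -v PF="${2:-0.8}" -v N="${3:-4}" '
    {
        out = $1
        # Split the sensor list; n is 0 when the line holds only an ID.
        n = split($2, s, ",")
        for (i = 1; i <= N; i++) {
            # Assumption: absent sensors are always the trailing ones.
            if (i > n) { out = out ";na"; continue }
            name = s[i]
            val = s[i]
            sub(/\(.*/, "", name)               # "sensor02(0.6)" -> "sensor02"
            gsub(/^[^(]*\(|\).*$/, "", val)     # "sensor02(0.6)" -> "0.6"
            out = out ";" ((val + 0 >= PF + 0) ? name : "low")
        }
        print out
    }' "$1"
}
```

It could then be called as `filter_sensors out.singnal 0.8 4`, and a different threshold or number of columns only requires changing the arguments rather than adding new branches.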