
Help with search or "limiting"/"filtering" argument

Status
Not open for further replies.

GusGrave

Programmer
Nov 17, 2010
41
SE
I've posted about this script before, but I haven't worked on it for a while. I started again today and realized that I need a new approach. For the script to be useful to anyone besides me, it has to require as little "individual thinking" as possible from the user. It must handle two different kinds of input files and tell them apart (three, actually, since one kind of input has only text lines and no data lines). I already have two separate scripts working, one for each input type, but it has become apparent that it is hard to keep track of which input you are feeding which script, and this leads to faulty results and confusion.

Below is the script I'm working with:
-------------------------------------------------------------------------------
#!/usr/bin/awk -f
# Ask (on the terminal, not the input file) how many molecules to average over
BEGIN{t="/dev/tty";printf "Enter number of molecules to average: ">t;getline<t;inp_num=$1}
# Name both output files after the input file
NR==1{out1="hb_%_occ_"FILENAME;out2="summary_"FILENAME}
# Skip the header lines (13 is right for input type #1 only)
NR<=13{next}
# Replace parentheses with spaces so the values become separate fields
{
  gsub (/\(+|\)/," ")
}
{
  if(NF>=15){ #1: occupancy in $10, lifetime in $15
    tott+=$15;++denom;tothb+=$10
    printf "%10.2f %10.1f\n",$10,$15>out1
  }

  else if(NF>=11){ #2: occupancy in $10 only
    tothb+=$10;++denom
    printf "%10.2f\n",$10>out1
  }
}
END{
  # No data lines at all: write a notice to both files and stop
  if(denom==0){
    x="NO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
    print x>out1;print x>out2;exit
  }
  close(out1);
  # Lifetimes present (#1-style data): occupancy and lifetime statistics
  if(tott>0){
    avglt=tott/denom
    # Re-read out1 and sum the squared deviations of the lifetimes (column 2)
    while((getline<out1)>0)tottsq+=(($2-avglt)^2)
    avocc=tothb/inp_num
    printf " Summary data for hbond analysis\n\n">out2
    printf " Sum of Occupancy: %10.2f\n",tothb>out2
    printf " Average Occupancy: %10.2f\n\n",avocc>out2
    printf " Sum of lifetimes: %10.2f\n",tott>out2
    printf " Average lifetime: %10.2f\n",avglt>out2
    if(denom>1){
      # Sample standard deviation, and standard error of the mean = SD/sqrt(n)
      sd_lt=sqrt(tottsq/(denom-1));semlt=sd_lt/sqrt(denom)
      printf " SD lifetime: %10.2f\n",sd_lt>out2
      printf " SEM lifetime: %10.2f\n",semlt>out2
    } else print " Single HBOND event, no SD or SEM calculation possible!">out2
  }

  # No lifetimes (#2-style data): occupancy statistics only
  if (tott==0){
    avocc=tothb/inp_num
    printf " Summary data for hbond analysis\n\n">out2
    printf " Sum of Occupancy: %10.2f\n",tothb>out2
    printf " Average Occupancy: %10.2f\n\n",avocc>out2
    if(denom==1){ print " Single HBOND event, no SD or SEM calculation possible!">out2 }
  }
}
----------------------------------------------------------------------------------
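For reference, the END block computes the usual sample statistics from the n = denom lifetimes t_i, where tott is their sum, avglt the mean and tottsq the sum of squared deviations. The standard error of the mean is the standard deviation, not the variance, divided by the square root of n:

```latex
\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i, \qquad
\mathrm{SD} = \sqrt{\frac{\sum_{i=1}^{n} (t_i - \bar{t})^2}{n-1}}, \qquad
\mathrm{SEM} = \frac{\mathrm{SD}}{\sqrt{n}}
```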

I don't know why, but using "else if" instead of a plain "if" at #2 makes the math part almost work. Two problems remain. First, NR<=13{next} works for only one input type (#1); for the other type (#2) it skips 4 data lines. Lowering it to NR<=9{next} instead adds an extra line of zeros to the output for input type #1, which skews the "Average lifetime" calculation because the zero line is counted as an event. This happens because one of the text lines before the data in #1 has NF>=15. I want the code to recognize whether it should start reading data at row 14 (#1) or at row 10 (#2).
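A likely explanation for why "else if" helps: every line with 15 or more fields also has 11 or more, so with two independent ifs such a line goes through both branches, and denom and tothb are incremented twice. The sketch below keeps only the denom counter from the script; the function names and sample lines are hypothetical:

```shell
# Two overlapping field-count tests, as in the script above.
# A 15-field line satisfies both NF>=15 and NF>=11.

# Two independent ifs: a 15-field line increments denom twice
count_plain_if() {
    awk '
    {
        if (NF >= 15) ++denom
        if (NF >= 11) ++denom
    }
    END { print denom + 0 }'
}

# "else if": each line takes at most one branch
count_else_if() {
    awk '
    {
        if (NF >= 15) ++denom
        else if (NF >= 11) ++denom
    }
    END { print denom + 0 }'
}

sample='1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11'
printf '%s\n' "$sample" | count_plain_if   # prints 3: the 15-field line is counted twice
printf '%s\n' "$sample" | count_else_if    # prints 2: each line is counted once
```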

The way I figure (which might be way off) is that I have three alternatives:

1) Apply NR<=13{next} after the "if(NF>=15)" and NR<=9{next} after the "if(NF>=11)". But then I get:
>>> NR<=13{ <<< next}
... illegal statement at source line ...

So it seems I cannot apply these filters after the "ifs", or I am not applying them correctly.

2) Search the second line (NR==2), or the entire file, for the word "series" (which occurs only once, and only in input type #1), and use the result as a flag: if "series" is found, apply "if(NF>=15)..."; otherwise apply "if(NF>=11)...".

3) (Just a modification, or simplification, of 2.) If the second line of the input has more than 8 fields, apply "if(NF>=15)..."; otherwise apply "if(NF>=11)...". But I cannot figure out how to combine if(NR=2 and NF>=8){} into a single filter.
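Minimal sketches of the three options, each written as a shell function that runs an awk program on standard input. The function names and test inputs are hypothetical; the row numbers 13 and 9, the word "series" and the 8-field threshold are taken from the post. For option 1, the syntax error arises because a `pattern { action }` pair such as NR<=13{next} may only appear at the top level of an awk program; inside an action it has to be written as the statement `if (NR<=13) next`. That version is legal, but whether it picks the right rows depends on the files (in a type #1 file, a line at rows 10 to 13 with 11 to 14 fields would still be counted). For option 3, note that `NR=2` assigns rather than compares, and awk's logical AND is `&&`:

```shell
# Option 1: inside an if-branch, the NR<=13{next} pattern becomes
# the statement "if (NR <= 13) next".
option1_branch_skip() {
    awk '
    {
        if (NF >= 15) {
            if (NR <= 13) next          # #1-style line: data starts at row 14
            print "15+", NR             # stand-in for the real NF>=15 processing
        } else if (NF >= 11) {
            if (NR <= 9) next           # #2-style line: data starts at row 10
            print "11+", NR             # stand-in for the real NF>=11 processing
        }
    }'
}

# Option 2: choose the header length from the word "series" on line 2
# (assumption: it appears there in type #1 files only).
option2_keyword() {
    awk '
    NR == 2 { hdr = ($0 ~ /series/ ? 13 : 9) }
    NR <= 2 || NR <= hdr { next }       # skip the whole header
    { print }                           # data line: real processing goes here
    '
}

# Option 3: combine the row test and the field-count test with && and ==.
option3_fields() {
    awk '
    NR == 2 && NF > 8  { type = 1 }     # wide second line: input type #1
    NR == 2 && NF <= 8 { type = 2 }     # otherwise: input type #2
    END { print "type", type }
    '
}
```

The same test can also go inside an action, as `if (NR == 2 && NF > 8) { ... }`.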

So the first issue is getting the code to start reading at a different row depending on the input format. The second problem is getting the code NOT to read the last line of the input. I guess this would be something like NR<=$NR, though I don't know how to fit that into the code properly. Including the last line also adds an extra row of zeros to out1 and skews the average calculation in out2, since the added lines are counted as events.
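Because awk reads its input one line at a time, it does not know which line is the last until the input runs out, so a test like NR<=$NR cannot work ($NR refers to field number NR, not to a line count). One common workaround, sketched here with a hypothetical function name, is to lag one line behind: each new line triggers processing of the previous one, so the final line is never processed. If the held line's fields are needed, they can be recovered with split(prev, f).

```shell
# Process every line except the last by lagging one record behind:
# each record handles the previous line, and the final line is still
# sitting in "prev" when the input ends, so it is never processed.
all_but_last() {
    awk '
    NR > 1 { print prev }   # stand-in for the real per-line processing
    { prev = $0 }
    '
}

printf 'a\nb\nc\n' | all_but_last   # prints a and b
```

An alternative is to read the file twice and count lines on the first pass, for example `awk 'NR==FNR { n = FNR; next } FNR < n' file file`, at the cost of reading the input twice.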

As you have probably noticed, I am not a very experienced programmer, which may show in my scripts and in how I formulate my questions. Still, I hope someone can suggest how to work option 1, 2 or 3 into the script so that it can tell the input types apart and exclude the last line of input from the calculations, producing accurate results without the user having to think about which input type they have.

Best regards
//Gustaf
 
You can probably ignore the "not reading the last row" problem, since the separate scripts already handle it through their NF>=11 / NF>=15 limits.

Best regards
// Gustaf
 
...
NR==1{out1="hb_%_occ_"FILENAME;out2="summary_"FILENAME;next}
NR==2{nr=(NF>7?13:9):next}
NR<=nr{next}
...

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Thank you very much, PHV! Only one slight modification:

NR==2{nr=(NF>7?13:9):next} should be NR==2{nr=(NF>8?13:9);next}

at least on my system and my AWK version.
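A runnable sketch of PHV's rules with Gustaf's correction applied. In standard awk, statements on the same line are separated by `;` (or a newline), so the `:` after the parenthesis is a syntax error rather than a version difference; the NF>7 versus NF>8 threshold, on the other hand, depends on how many fields line 2 of the real files has. The function name and test input are hypothetical, and the program only reports the first data row instead of doing the real processing:

```shell
# Line 2 decides how many header lines to skip:
# 13 for type #1 (wide second line), 9 for type #2.
first_data_row() {
    awk '
    NR == 1 { next }                          # the script sets the output file names here
    NR == 2 { nr = (NF > 8 ? 13 : 9); next }  # ";" separates statements, not ":"
    NR <= nr { next }                         # skip the rest of the header
    { print NR; exit }                        # report the first data row and stop
    '
}
```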

It finally works for all the different types of input, and the user does not have to think about what they are feeding the script!

I can't thank you enough for saving me so much time on solving this! You've helped me on this more than once.

Best regards!
// Gustaf
 