Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Help with IF and THEN arguments! 1

Status
Not open for further replies.

GusGrave

Programmer
Nov 17, 2010
41
SE
So, I'm back again. Due to a lack of programming skills in AWK combined with a lack general of understanding for the analysis work we are doing combined with general laziness from "end users" I need some more assistance, and this is something that I do not know how to use/solve at all.

The script in question is the following:

#!/usr/bin/awk -f
BEGIN{t="/dev/tty";printf "Enter number of molecules to average: ">t;getline<t;inp_num=$1}
NR==1{out1="hb_%_occ_"FILENAME;out2="summary_"FILENAME}
{gsub (/\(+|\)/," ")}
NF>=15{
tott+=$15;++denom;tothb+=$10
printf "%10.2f %10.1f\n",$10,$15>out1
}
END{
close(out1);avglt=tott/denom
while((getline<out1)>0)tottsq+=(($2-avglt)^2)
avocc=tothb/inp_num
sd_lt=sqrt(tottsq/(denom-1))
semlt=(tottsq/(denom-1))/(sqrt(denom))
printf " Summary data for hbond analysis\n\n">out2
printf " Sum of Occupancy: %10.2f\n",tothb>out2
printf " Average Occupancy: %10.2f\n\n",avocc>out2
printf " Sum of lifetimes: %10.2f\n",tott>out2
printf " Average lifetime: %10.2f\n",avglt>out2

printf " SD lifetime: %10.2f\n",sd_lt>out2
printf " SEM lifetime: %10.2f\n",semlt>out2

}

These were problems I was prepared to ignore, however, some people do not read the script message output, and therefore become very confused when one or both output files are missing!

If the analysis output which is feed to the AWK script contains no data points, only text, there is nothing to put into out1, and nothing gets calculated in out2,(bold black part of script) which I find completely normal. However, if one does not think about reading the output message error, this seems to cause tremendous problems. This means that if there are no data points, I would need a modifications that would probably look like:

if denom=0 (technically NR==0) in out1, or if NR<=14 in input, then print NO DATA POINTS IN INPUT - NO HYDROGEN BONDS DETECTED!!!!!!!!

So that people notices this error. Either print it directly in terminal, however, I strongly doubt that anyone will notice this either, so preferably in both output files.

If there is only one (1) datapoint, (one hydrogen bond detected) with occupancy and lifetime printed to out1, one cannot treat these single values statistically, which means that I get two numbers in out1, as it should be! BUT, since there is no way to calculate standard error or standard deviation on single point, there is an error and not out2 is produced. This would have to be fixed (for the red bold part of the script) with something like:

if denom = 1 (which technically is NR==1) in out1 (less than NR==13 in input), then skip SD, SEM,

For me, I think this would be enough, but most likely it would also have to print something like "no calculation possible, single data point" for SD and SEM in out2. Otherwise my guess is that I'll be back here begging for help 30 minutes after the first "user" tries to apply the script to a poor interaction analysis output file.

I have no prior knowledge of IF THEN ELSE usage, so I cannot solve this on my own, do not even know where to begin so all help as enormously appreciated.

My apologies if I come off as crude, but I've just spent the last 30 minutes trying to explain why you cannot calculate SD and SEM for a single datapoint and therefore why you do not get a file output.

This script, which you so kindly helped me assemble has saved me about 20 minutes per outfile to analyze, but in order to not loose 6 times the saved time trying to explain why it sometimes fail I need to resolve this.

Best regards to all
Gustaf
 
Something like this ?
Code:
...
END{
  if(denom==0){
    x="NO DATA POINTS IN INPUT - NO HYDROGEN BONDS DETECTED!!!!!!!!"
    print x>out1;print x>out2;exit
  }
  close(out1);avglt=tott/denom
  while((getline<out1)>0)tottsq+=(($2-avglt)^2)
  avocc=tothb/inp_num
  sd_lt=sqrt(tottsq/(denom-1))
  semlt=(tottsq/(denom-1))/(sqrt(denom))
  printf "   Summary data for hbond analysis\n\n">out2
  printf "   Sum of Occupancy:      %10.2f\n",tothb>out2
  printf "   Average Occupancy:     %10.2f\n\n",avocc>out2
  printf "   Sum of lifetimes:      %10.2f\n",tott>out2
  printf "   Average lifetime:      %10.2f\n",avglt>out2
  if(denom>1){
    sd_lt=sqrt(tottsq/(denom-1));semlt=(tottsq/(denom-1))/(sqrt(denom))
    printf "   SD lifetime:           %10.2f\n",sd_lt>out2
    printf "   SEM lifetime:          %10.2f\n",semlt>out2
  } else print "no calculation possible, single data point, for SD and SEM">out2
}

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
I know you're not supposed to curse, but you are a fu***ng lifesaver, I see where you going, and now I have something to work with. So thank you very much!

This is exactly what I was looking for, some wires needs to be untangled, I still get error output for 1 data point files, and no out2 file. But now there's something to work with

I'll copy in a small analysis outfile below:

HBOND SUMMARY:
Data was saved to series 1, output to file hb_ATZ_HAB_XLI_OAF.out, time interval is 2.00

data was sorted, intra-residue interactions are NOT included,
Distance cutoff is 3.00 angstroms, angle cutoff is 120.00 degrees
Hydrogen bond information dumped for occupancies > 0.00

Dumping schematic of time series after each h-bond, key follows:
| . - o x * @ |
0-5% 5-20% 20-40% 40-60% 60-80% 80-95% 95-100% occupancy

DONOR ACCEPTORH ACCEPTOR
atom# :res@atom atom# :res@atom atom# :res@atom %occupied distance angle lifetime maxocc
| 2000 :86@OAF | 68 :3@HAB 67 :3@NAJ | 1.96 2.884 ( 0.07) 19.51 ( 8.36) 4.5 ( 0.0) 7 | .o- |
| 684 :39@OAF | 96 :4@HAB 95 :4@NAJ | 1.70 2.878 ( 0.08) 18.21 ( 9.28) 5.0 ( 0.0) 5 | o. . |
| 1832 :80@OAF | 40 :2@HAB 39 :2@NAJ | 0.50 2.830 ( 0.08) 23.59 ( 9.84) 3.8 ( 0.0) 4 | - |
| 544 :34@OAF | 12 :1@HAB 11 :1@NAJ | 0.12 2.928 ( 0.04) 27.94 (10.16) 4.0 ( 0.0) 3 | . |
| 796 :43@OAF | 96 :4@HAB 95 :4@NAJ | 0.06 2.915 ( 0.05) 17.58 (14.02) 3.0 ( 0.0) 2 | |

---------------- --------------------------------- -------------------------------------


This is how the file being processed look. The only values we're interested in is %occupied and lifetime (4.5, 5.0, 3.8, 4.0, 3.0). These are the values being processed by AWK.

If you like you can try running the script, with only the red line = 1 data point, with none of the bold lines = not data output, with all bold lines, a normal output. The only tweek left to include is the calculation of averages but not SD and SEM outputted to out2 in case of a "one-liner" analysis output. Of course, the text format screws up the line distribution, I just use "make plain text" in textedit to restore the file.

I'll say it again, you are great, and thank you so much for your help! I'm going out for lunch, but I'll check back in about an hour, and scratch the old nugget and se what I can come up with!

Only the best
Gustaf
 
Sorry for the typo:
Code:
...
END{
  if(denom==0){
    x="NO DATA POINTS IN INPUT - NO HYDROGEN BONDS DETECTED!!!!!!!!"
    print x>out1;print x>out2;exit
  }
  close(out1);avglt=tott/denom
  while((getline<out1)>0)tottsq+=(($2-avglt)^2)
  avocc=tothb/inp_num
  printf "   Summary data for hbond analysis\n\n">out2
  printf "   Sum of Occupancy:      %10.2f\n",tothb>out2
  printf "   Average Occupancy:     %10.2f\n\n",avocc>out2
  printf "   Sum of lifetimes:      %10.2f\n",tott>out2
  printf "   Average lifetime:      %10.2f\n",avglt>out2
  if(denom>1){
    sd_lt=sqrt(tottsq/(denom-1));semlt=(tottsq/(denom-1))/(sqrt(denom))
    printf "   SD lifetime:           %10.2f\n",sd_lt>out2
    printf "   SEM lifetime:          %10.2f\n",semlt>out2
  } else print "no calculation possible, single data point, for SD and SEM">out2
}

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Back with a full stomach and an empty brain ;)

Yes, success. This was great! Thank you very VERY much, this will save me loads of unnecessary discussions!

Hopefully I'll get some time over to get deep and dirty into programming over the next couple of years. But this time I really needed a quick fix and as usual in this forum, it was delivered within the hour!

The analytical tools we're using will probably be obsolete within the next year or so, and during that time my guess is that about 10-15 persons will use this script. Of these I guess a maximum of 3-4 persons will actually open and check the code, but for my own satisfaction I'd like to include a note regarding all your help, any particular preferences regarding this?

All my respect, gratitude and best wishes
// Gustaf
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top