So, I'm back again. Due to my lack of AWK programming skills, combined with a general lack of understanding of the analysis work we are doing and general laziness among the "end users", I need some more assistance, and this is something that I do not know how to solve at all.
The script in question is the following:
#!/usr/bin/awk -f
BEGIN{t="/dev/tty";printf "Enter number of molecules to average: ">t;getline<t;inp_num=$1}
NR==1{out1="hb_%_occ_"FILENAME;out2="summary_"FILENAME}
{gsub (/\(+|\)/," ")}
NF>=15{
tott+=$15;++denom;tothb+=$10
printf "%10.2f %10.1f\n",$10,$15>out1
}
END{
close(out1);avglt=tott/denom
while((getline<out1)>0)tottsq+=(($2-avglt)^2)
avocc=tothb/inp_num
sd_lt=sqrt(tottsq/(denom-1))
semlt=(tottsq/(denom-1))/(sqrt(denom))
printf " Summary data for hbond analysis\n\n">out2
printf " Sum of Occupancy: %10.2f\n",tothb>out2
printf " Average Occupancy: %10.2f\n\n",avocc>out2
printf " Sum of lifetimes: %10.2f\n",tott>out2
printf " Average lifetime: %10.2f\n",avglt>out2
printf " SD lifetime: %10.2f\n",sd_lt>out2
printf " SEM lifetime: %10.2f\n",semlt>out2
}
The script has two problems that I was prepared to ignore. However, some people do not read the script's message output and therefore become very confused when one or both output files are missing!
If the analysis output that is fed to the AWK script contains no data points, only text, there is nothing to put into out1 and nothing gets calculated for out2 (the END block of the script), which I find completely normal. However, if one does not think about reading the error message, this seems to cause tremendous problems. This means that if there are no data points, I would need a modification that would probably look like:
if denom=0 (technically NR==0) in out1, or if NR<=14 in input, then print NO DATA POINTS IN INPUT - NO HYDROGEN BONDS DETECTED!!!!!!!!
That way people would notice the error. The message could be printed directly in the terminal, but I strongly doubt that anyone would notice it there either, so preferably it should go into both output files.
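In AWK terms, the check described above might look roughly like the sketch below. It is only a sketch under assumptions: `hb_check` is a made-up wrapper name so the example can be run on its own, the input is given as a plain file name in the current directory, and the file has at least one line (otherwise out1 and out2 are never set). The counting rule (NF>=15) and the output file names are taken from the script above.

```sh
# Sketch only: hb_check is a hypothetical wrapper, not part of the original script.
# Usage: hb_check inputfile   (plain file name in the current directory)
hb_check() {
    awk '
    NR==1 { out1 = "hb_%_occ_" FILENAME; out2 = "summary_" FILENAME }
    { gsub(/\(+|\)/, " ") }
    NF >= 15 { ++denom }               # count data lines the same way the script does
    END {
        if (denom == 0) {              # no line had 15 or more fields
            msg = "NO DATA POINTS IN INPUT - NO HYDROGEN BONDS DETECTED"
            print msg > out1           # put the warning into both output files
            print msg > out2
            print msg > "/dev/stderr"  # and show it in the terminal as well
            exit                       # stop here, skip all statistics
        }
        # ... the existing calculations and printf lines would go here ...
    }' "$1"
}
```

The point of the `exit` inside the `if` block is that the division by denom is never reached when denom is zero, so both files always exist and contain the warning.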
If there is only one (1) data point (one hydrogen bond detected) with occupancy and lifetime printed to out1, one cannot treat these single values statistically, which means that I get two numbers in out1, as it should be! BUT, since there is no way to calculate a standard deviation or standard error for a single point, there is an error and no out2 is produced. This would have to be fixed (for the sd_lt and semlt lines of the script) with something like:
if denom = 1 (which technically is NR==1) in out1 (less than NR==13 in input), then skip SD, SEM,
For me, I think this would be enough, but most likely it would also have to print something like "no calculation possible, single data point" for SD and SEM in out2. Otherwise my guess is that I'll be back here begging for help 30 minutes after the first "user" tries to apply the script to a poor interaction analysis output file.
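The single-data-point case could be handled with an if/else around the SD and SEM lines, roughly as sketched below. Assumptions, to be clear: `hb_spread` is a hypothetical name, and to keep the example self-contained it reads one lifetime per line from standard input instead of re-reading out1. Also note that it computes SEM as SD/sqrt(n), the usual definition, whereas the semlt line in the script above divides tottsq/(denom-1) by sqrt(denom).

```sh
# Sketch only: hb_spread is a hypothetical helper; input is one lifetime per line.
hb_spread() {
    awk '
    { x[++n] = $1; sum += $1 }         # store each lifetime and keep a running sum
    END {
        if (n == 0) { print "no data points"; exit }
        avg = sum / n
        for (i = 1; i <= n; i++) sq += (x[i] - avg)^2
        printf " Average lifetime: %10.2f\n", avg
        if (n > 1) {                   # SD and SEM need at least two points
            sd = sqrt(sq / (n - 1))
            printf " SD lifetime: %10.2f\n", sd
            printf " SEM lifetime: %10.2f\n", sd / sqrt(n)
        } else {                       # n == 1: dividing by n-1 would be a division by zero
            print " SD lifetime: no calculation possible, single data point"
            print " SEM lifetime: no calculation possible, single data point"
        }
    }'
}
```

For example, `printf '5.0\n' | hb_spread` prints the average followed by the two "no calculation possible" lines instead of aborting.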
I have no prior knowledge of IF THEN ELSE usage, so I cannot solve this on my own and do not even know where to begin, so all help is enormously appreciated.
My apologies if I come off as crude, but I've just spent the last 30 minutes trying to explain why you cannot calculate SD and SEM for a single datapoint and therefore why you do not get a file output.
This script, which you so kindly helped me assemble, has saved me about 20 minutes per outfile to analyze, but in order not to lose 6 times the saved time trying to explain why it sometimes fails, I need to resolve this.
Best regards to all
Gustaf