Hello everyone!
It's been a few months since I last posted. I wrote an AWK script to help process the output of some analyses that my co-workers and I run in connection with MD simulations. There was a minor problem that I had solved by using two different scripts: depending on the mode in which the analysis is performed, the output has a different number of columns, so one script did not work on both kinds of output. I did not find this an overwhelming problem, but it seems that some people found it aggravating to keep track of which analysis mode had been used to produce the AWK input file.
Today I finally had some time to spare, so I tried to solve this and create a "universal" AWK script (universal being a little over the top since there are only two different layouts of input files).
If any of you are interested and have a few minutes, I would really appreciate any feedback on this script. Does it look OK? Is there a better way to solve the problem? Is there anything that can be improved?
Best regards
//Gustaf
#!/usr/bin/awk -f
# Ask for the number of molecules on the terminal (the data comes from the input file).
BEGIN{t="/dev/tty";printf "Enter number of molecules to average: ">t;getline<t;inp_num=$1}
# Output file names are derived from the input file name.
NR==1{out1="hb_%_occ_"FILENAME;out2="summary_"FILENAME}
# Replace parentheses with spaces so they do not stick to the numbers.
{gsub(/\(+|\)/," ")}
# Long layout (15 or more columns): occupancy in $10, lifetime in $15.
NF>=15{
  tott+=$15;++denom;tothb+=$10
  printf "%10.2f %10.1f\n",$10,$15>out1
  next   # without this the record also matches NF>=11 and is counted twice
}
# Short layout (11 to 14 columns): occupancy in $10 only.
NF>=11{
  tothb+=$10;++denom
  printf "%10.2f\n",$10>out1
}
END{
  if(denom==0){
    x="NO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
    print x>out1;print x>out2;exit
  }
  close(out1)
  avocc=tothb/inp_num
  # Header and occupancy lines are the same for both layouts.
  printf " Summary data for hbond analysis\n\n">out2
  printf " Sum of Occupancy: %10.2f\n",tothb>out2
  printf " Average Occupancy: %10.2f\n\n",avocc>out2
  # Lifetime statistics only exist for the long layout.
  if(tott>0){
    avglt=tott/denom
    # Second pass over out1 to sum the squared deviations of the lifetimes.
    while((getline<out1)>0)tottsq+=($2-avglt)^2
    close(out1)
    printf " Sum of lifetimes: %10.2f\n",tott>out2
    printf " Average lifetime: %10.2f\n",avglt>out2
    if(denom>1){
      sd_lt=sqrt(tottsq/(denom-1))
      semlt=sd_lt/sqrt(denom)   # SEM = SD/sqrt(n), not variance/sqrt(n)
      printf " SD lifetime: %10.2f\n",sd_lt>out2
      printf " SEM lifetime: %10.2f\n",semlt>out2
    } else print " Single HBOND event, no SD or SEM calculation possible!">out2
  }
}
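As one possible alternative for the lifetime statistics, here is a minimal sketch that computes the mean, SD and SEM in a single pass with Welford's running update, so the per-record file does not have to be written and then re-read. The function name `lt_stats` is hypothetical, and the sketch assumes (like the script above) that the lifetime ends up in field 15 once parentheses are replaced by spaces; it ignores occupancy and the short layout entirely.

```shell
# Hypothetical helper, not part of the original script.
# Reads hbond output on stdin and prints count, mean, SD and SEM of the
# lifetimes (field 15) using Welford's one-pass algorithm.
lt_stats() {
    awk '
    # Same idea as the main script: turn parentheses into field separators.
    { gsub(/[()]/, " ") }
    # Only long-layout records carry a lifetime.
    NF >= 15 {
        n++
        delta = $15 - mean
        mean += delta / n            # running mean
        m2 += delta * ($15 - mean)   # running sum of squared deviations
    }
    END {
        if (n == 0) { print "no lifetimes"; exit }
        printf "n=%d mean=%.2f", n, mean
        if (n > 1) {
            sd = sqrt(m2 / (n - 1))  # sample standard deviation
            printf " sd=%.2f sem=%.2f", sd, sd / sqrt(n)
        }
        printf "\n"
    }'
}
```

For example, `lt_stats < input_file` on records whose lifetimes are 2, 4 and 6 prints `n=3 mean=4.00 sd=2.00 sem=1.15`. For the number of events involved here, the two-pass approach in the script is just as accurate; the single pass mainly saves the temporary file and the second read.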