Hi all!
I have a problem with an AWK-script that I wrote a few years ago with a lot of help from people in this forum.
It’s basically pretty simple, it is supposed to extract some columns from output obtained from analyses of simulations. After obtaining the interesting columns from the input, some elementary calculations are suposed to be performed and then the results presented in a human-readable output. This is cycled over several input files, creating a single file with the relevant data as well as some “control” files that I need to save for documentation.
Now I have run into some problems that I cannot figure out. The software I use for the simulations were recently updated and the new version of the analysis routines creates output files that are nothing like the old. I re-wrote the scripts to accommodate this change, adding some new things as well. I tried this script over some example input files and it worked great, though when feeding the script some 75 input files I got the following error:
I figure that something that should be closed is not closed and some limit is reached. What I cannot figure out is what. To make thinks more frustrating, I am using AWK on a Mac and I know that the system version is “weird” some times, so to accommodate the cluster where simulations are run I also wrote the “exact same script” for GAWK. The GAWK script, however, does not produce the same error when run on the same Mac with the same input, in fact it works perfectly. I’ve been staring at this for a while now, not able to find the issue. The older AWK script for the older format of output still works flawlessly with the same number of old input files using the Mac version of AWK. I really can’t figure this out, especially since the 33rd line is a printf statement to an output file (out2) that is supposed to be closed at the end of each cycle… Any help regarding this would be greatly appreciated!
This is the new script that fails (MacOS AWK):
This is the new script that works (GAWK):
This is the “old” script that also works on +50 files(MacOS AWK):
I have a problem with an AWK-script that I wrote a few years ago with a lot of help from people in this forum.
It’s basically pretty simple, it is supposed to extract some columns from output obtained from analyses of simulations. After obtaining the interesting columns from the input, some elementary calculations are suposed to be performed and then the results presented in a human-readable output. This is cycled over several input files, creating a single file with the relevant data as well as some “control” files that I need to save for documentation.
Now I have run into some problems that I cannot figure out. The software I use for the simulations were recently updated and the new version of the analysis routines creates output files that are nothing like the old. I re-wrote the scripts to accommodate this change, adding some new things as well. I tried this script over some example input files and it worked great, though when feeding the script some 75 input files I got the following error:
Code:
/usr/bin/awk: outputFile.dat makes too many open files
input record number 1, file inputFile.dat
source line number 33
I figure that something that should be closed is not closed and some limit is reached. What I cannot figure out is what. To make thinks more frustrating, I am using AWK on a Mac and I know that the system version is “weird” some times, so to accommodate the cluster where simulations are run I also wrote the “exact same script” for GAWK. The GAWK script, however, does not produce the same error when run on the same Mac with the same input, in fact it works perfectly. I’ve been staring at this for a while now, not able to find the issue. The older AWK script for the older format of output still works flawlessly with the same number of old input files using the Mac version of AWK. I really can’t figure this out, especially since the 33rd line is a printf statement to an output file (out2) that is supposed to be closed at the end of each cycle… Any help regarding this would be greatly appreciated!
This is the new script that fails (MacOS AWK):
Code:
#!/usr/bin/awk -f
BEGIN { t="/dev/tty";printf "Enter number of reference molecules: ">t;getline<t;ref_mol=$1 }
FNR==1 && NR!=1 { endfile(); frac=avgDist=avgDistsq=avgAng=avgAngsq=denom=0 }
FNR==1 { out1="analFrac_"FILENAME; out2="sumFrac_"FILENAME; out3="reportFrac.txt"; print "-> Input file is: "FILENAME >> out3;
print "-> Number of reference molecules: "ref_mol >> out3; next
}
FNR==1 { next }
{
gsub (/\./, "\,")
}
{
frac+=$5; avgDist+=$6; avgAng+=$7; ++denom;
printf "%10.4f %10.4f %10.4f\n",$5,$6,$7 > out1
}
END { endfile() }
function endfile()
{
x="\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
if (frac==0 && denom==0) {
print x > out1; print x > out2; print x"\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3;
close(out1); close(out2); close(out3); return
}
close(out1)
if (frac>0) {
avgAvgDist=avgDist/denom
avgAvgAng=avgAng/denom
while ((getline<out1)>0) {
avgDistsq+=(($2-avgAvgDist)^2)
avgAngsq+=(($3-avgAvgAng)^2)
}
avgFrac=frac/ref_mol
printf "\n Summary data for hbond analysis:\n\n" > out2
printf " Summed Fraction : %10.4f\n",frac > out2
printf " Averaged Fraction: %10.4f\n\n",avgFrac > out2
printf " Summed Avg Distance: %10.4f\n",avgDist > out2
printf " Average Distance: %10.4f\n\n",avgAvgDist > out2
printf " Summed Avg Angle: %10.4f\n",avgAng > out2
printf " Average Angle: %10.4f\n",avgAvgAng > out2
printf "\n Summary data for hbond analysis:\n\n" >> out3
printf " Summed Frac Occupied: %10.4f\n",frac >> out3
printf " Averaged Frac Occupied: %10.4f\n\n",avgFrac >> out3
printf " Summed Avg Distance: %10.4f\n",avgDist >> out3
printf " Average Distance: %10.4f\n\n",avgAvgDist >> out3
printf " Summed Avg Angle: %10.4f\n",avgAng >> out3
printf " Average Angle: %10.4f\n",avgAvgAng >> out3
if (denom>1) {
sd_avgDist=sqrt(avgDistsq/(denom-1)); semAvgDist=(sd_avgDist/(sqrt(denom))); sd_avgAng=sqrt(avgAngsq/(denom-1)); semAvgAng=(sd_avgAng/(sqrt(denom)));
print "\n<--------------STATISTICS-------------->\n" > out2
printf "\n\n SD Distance: %10.4f\n",sd_avgDist > out2
printf " SEM Distance: %10.4f\n\n",semAvgDist > out2
printf " SD Angle: %10.4f\n",sd_avgAng > out2
printf " SEM Angle: %10.4f\n",semAvgAng > out2
print "\n<--------------STATISTICS-------------->\n" >> out3
printf " SD Distance: %10.4f\n",sd_avgDist >> out3
printf " SEM Distance: %10.4f\n\n",semAvgDist >> out3
printf " SD Angle: %10.4f\n",sd_avgAng >> out3
printf " SEM Angle: %10.4f\n\n",semAvgAng >> out3
} if (denom>1 && denom!=2) {print "##---NEXTFILE--->> -------->> -------->>\n" >> out3 }
if (denom==1) { print " Single HBOND event, no SD or SEM calculations possible!" > out2;
print "\n Single HBOND event, no SD or SEM calculations possible!\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3
}
if (denom==2) { print "\n 2 Hydrogen bond events found! No proper SD or SEM!" > out2;
print " 2 Hydrogen bond events found! No proper SD or SEM!\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3
}
}
close(out3)
close(out2)
}
This is the new script that works (GAWK):
Code:
#!/opt/local/bin/gawk -f
BEGIN { t="/dev/tty";printf "Enter number of reference molecules: ">t;getline<t;ref_mol=$1 }
FNR==1 && NR!=1 { endfile(); frac=avgDist=avgDistsq=avgAng=avgAngsq=denom=0 }
FNR==1 { out1="analFrac_"FILENAME; out2="sumFrac_"FILENAME; out3="reportFrac.txt"; print "-> Input file is: "FILENAME >> out3;
print "-> Number of reference molecules: "ref_mol >> out3; next
}
FNR==1 { next }
{
frac+=$5; avgDist+=$6; avgAng+=$7; ++denom;
printf "%10.4f %10.4f %10.4f\n",$5,$6,$7 > out1
}
END { endfile() }
function endfile()
{
x="\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
if (frac==0 && denom==0) {
print x > out1; print x > out2; print x"\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3;
close(out1); close(out2); close(out3); return
}
close(out1)
if (frac>0) {
avgAvgDist=avgDist/denom
avgAvgAng=avgAng/denom
while ((getline<out1)>0) {
avgDistsq+=(($2-avgAvgDist)^2)
avgAngsq+=(($3-avgAvgAng)^2)
}
avgFrac=frac/ref_mol
printf "\n Summary data for hbond analysis:\n\n" > out2
printf " Summed Fraction : %10.4f\n",frac > out2
printf " Averaged Fraction: %10.4f\n\n",avgFrac > out2
printf " Summed Avg Distance: %10.4f\n",avgDist > out2
printf " Average Distance: %10.4f\n\n",avgAvgDist > out2
printf " Summed Avg Angle: %10.4f\n",avgAng > out2
printf " Average Angle: %10.4f\n",avgAvgAng > out2
printf "\n Summary data for hbond analysis:\n\n" >> out3
printf " Summed Frac Occupied: %10.4f\n",frac >> out3
printf " Averaged Frac Occupied: %10.4f\n\n",avgFrac >> out3
printf " Summed Avg Distance: %10.4f\n",avgDist >> out3
printf " Average Distance: %10.4f\n\n",avgAvgDist >> out3
printf " Summed Avg Angle: %10.4f\n",avgAng >> out3
printf " Average Angle: %10.4f\n",avgAvgAng >> out3
if (denom>1) {
sd_avgDist=sqrt(avgDistsq/(denom-1)); semAvgDist=(sd_avgDist/(sqrt(denom))); sd_avgAng=sqrt(avgAngsq/(denom-1)); semAvgAng=(sd_avgAng/(sqrt(denom)));
print "\n<--------------STATISTICS-------------->\n" > out2
printf "\n\n SD Distance: %10.4f\n",sd_avgDist > out2
printf " SEM Distance: %10.4f\n\n",semAvgDist > out2
printf " SD Angle: %10.4f\n",sd_avgAng > out2
printf " SEM Angle: %10.4f\n",semAvgAng > out2
print "\n<--------------STATISTICS-------------->\n" >> out3
printf " SD Distance: %10.4f\n",sd_avgDist >> out3
printf " SEM Distance: %10.4f\n\n",semAvgDist >> out3
printf " SD Angle: %10.4f\n",sd_avgAng >> out3
printf " SEM Angle: %10.4f\n\n",semAvgAng >> out3
} if (denom>1 && denom!=2) {print "##---NEXTFILE--->> -------->> -------->>\n" >> out3 }
if (denom==1) { print " Single HBOND event, no SD or SEM calculations possible!" > out2;
print "\n Single HBOND event, no SD or SEM calculations possible!\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3
}
if (denom==2) { print "\n 2 Hydrogen bond events found! No proper SD or SEM!" > out2;
print " 2 Hydrogen bond events found! No proper SD or SEM!\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3
}
}
close(out3)
close(out2)
}
This is the “old” script that also works on +50 files(MacOS AWK):
Code:
BEGIN { t="/dev/tty";printf "Enter number of reference molecules: ">t;getline<t;ref_mol=$1 }
FNR==1 && NR!=1 { endfile(); tott=tothb=tottsq=denom=denomx=0 }
FNR==1 { out1="analhbout_"FILENAME; out2="summary_"FILENAME; out3="fullreport.txt"; print "-> Input file is: "FILENAME >> out3;
print "-> Number of reference molecules: "ref_mol >> out3; next
}
FNR==2 { nr=(NF>8?13:9) }
FNR<=nr { next }
{
gsub (/\(+|\)+|\:/," ")
gsub (/\./,",")
}
{
if (NF>=15) {
tott+=$15; ++denom; tothb+=$10
printf "%10.2f %10.1f\n",$10,$15 > out1
} else if (NF>=11) {
tothb+=$10; ++denomx;
printf "%10.2f\n",$10 > out1
}
}
END { endfile() }
function endfile()
{
x="NO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
if (denom==0 && tott>0) {
print x > out1; print x > out2; print x"\n\n----------------------------------------\n" >> out3;
close(out1); close(out2); close(out3); return
}else if (denomx==0 && tott==0) {
print x > out1; print x > out2; print x"\n\n----------------------------------------\n" >> out3;
close(out1); close(out2); close(out3); return
}
close(out1)
if (tott>0) {
avglt=tott/denom
while ((getline<out1)>0) tottsq+=(($2-avglt)^2)
avocc=tothb/ref_mol
printf " Summary data for hbond analysis: SERIES\n\n" > out2
printf " Sum of Occupancy: %10.3f\n",tothb > out2
printf " Average Occupancy: %10.3f\n\n",avocc > out2
printf " Sum of lifetimes: %10.3f\n",tott > out2
printf " Average lifetime: %10.3f\n",avglt > out2
printf " Summary data for hbond analysis: SERIES\n\n" >> out3
printf " Sum of Occupancy: %10.3f\n",tothb >> out3
printf " Average Occupancy: %10.3f\n\n",avocc >> out3
printf " Sum of lifetimes: %10.3f\n",tott >> out3
printf " Average lifetime: %10.3f\n",avglt >> out3
if (denom>1) {
sd_lt=sqrt(tottsq/(denom-1)); semlt=(sd_lt/(sqrt(denom))) #semlt=sqrt(tottsq/(denom-1))/(sqrt(denom))
printf " SD lifetime: %10.3f\n",sd_lt > out2
printf " SEM lifetime: %10.3f\n",semlt > out2
printf " SD lifetime: %10.3f\n",sd_lt >> out3
printf " SEM lifetime: %10.3f\n\n",semlt >> out3
} if (denom>1 && denom!=2) {print "----------------------------------------\n" >> out3 }
if (denom==1) { print " Single HBOND event, no SD or SEM calculation possible!" > out2;
print " Single HBOND event, no SD or SEM calculation possible!\n\n----------------------------------------\n" >> out3
}
if (denom==2) { print "\n 2 Hydrogen bond events found! No proper SD or SEM!" > out2;
print "\n 2 Hyrdogen bond events found! No proper SD or SEM!\n\n----------------------------------------\n" >> out3
}
}
close(out1)
if (tott==0) {
avocc=tothb/ref_mol
printf " Summary data for hbond analysis: NO SERIES\n\n" > out2
printf " Sum of Occupancy: %10.3f\n",tothb > out2
printf " Average Occupancy: %10.3f\n\n",avocc > out2
printf " Summary data for hbond analysis: NO SERIES\n\n" >> out3
printf " Sum of Occupancy: %10.3f\n",tothb >> out3
printf " Average Occupancy: %10.3f\n\n",avocc >> out3
if (denomx>1 && denomx!=2) {print "----------------------------------------\n" >> out3 }
if (denomx==1) { print " Single HBOND event detected!!!" > out2; print " Single HBOND event detected!!\n\n----------------------------------------\n" >> out3 }
if (denomx==2) { print " 2 Hydrogen bond events found!" > out2; print " 2 Hydrogen bond events found!\n\n----------------------------------------\n" >> out3 }
}
close(out3)
close(out2)
}