Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Help solving a "makes too many open files" error 1

Status
Not open for further replies.

GusGrave

Programmer
Nov 17, 2010
41
SE
Hi all!

I have a problem with an AWK-script that I wrote a few years ago with a lot of help from people in this forum.

It’s basically pretty simple, it is supposed to extract some columns from output obtained from analyses of simulations. After obtaining the interesting columns from the input, some elementary calculations are suposed to be performed and then the results presented in a human-readable output. This is cycled over several input files, creating a single file with the relevant data as well as some “control” files that I need to save for documentation.

Now I have run into some problems that I cannot figure out. The software I use for the simulations were recently updated and the new version of the analysis routines creates output files that are nothing like the old. I re-wrote the scripts to accommodate this change, adding some new things as well. I tried this script over some example input files and it worked great, though when feeding the script some 75 input files I got the following error:

Code:
/usr/bin/awk: outputFile.dat makes too many open files
 input record number 1, file inputFile.dat
 source line number 33

I figure that something that should be closed is not closed and some limit is reached. What I cannot figure out is what. To make thinks more frustrating, I am using AWK on a Mac and I know that the system version is “weird” some times, so to accommodate the cluster where simulations are run I also wrote the “exact same script” for GAWK. The GAWK script, however, does not produce the same error when run on the same Mac with the same input, in fact it works perfectly. I’ve been staring at this for a while now, not able to find the issue. The older AWK script for the older format of output still works flawlessly with the same number of old input files using the Mac version of AWK. I really can’t figure this out, especially since the 33rd line is a printf statement to an output file (out2) that is supposed to be closed at the end of each cycle… Any help regarding this would be greatly appreciated!

This is the new script that fails (MacOS AWK):
Code:
#!/usr/bin/awk -f
BEGIN { t="/dev/tty";printf "Enter number of reference molecules: ">t;getline<t;ref_mol=$1 }
FNR==1 && NR!=1 { endfile(); frac=avgDist=avgDistsq=avgAng=avgAngsq=denom=0 }
FNR==1 { out1="analFrac_"FILENAME; out2="sumFrac_"FILENAME; out3="reportFrac.txt"; print "-> Input file is: "FILENAME >> out3;
         print "-> Number of reference molecules: "ref_mol >> out3; next
       }
FNR==1 { next }
{
   gsub (/\./, "\,")
}
{
   frac+=$5; avgDist+=$6; avgAng+=$7; ++denom;
   printf "%10.4f %10.4f %10.4f\n",$5,$6,$7 > out1
  }

END { endfile() }
function endfile()
{
 x="\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
  if (frac==0 && denom==0) {
    print x > out1; print x > out2; print x"\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3;
    close(out1); close(out2); close(out3); return
 }
  close(out1)
  if (frac>0) {
    avgAvgDist=avgDist/denom
    avgAvgAng=avgAng/denom
    while ((getline<out1)>0) {
        avgDistsq+=(($2-avgAvgDist)^2)
        avgAngsq+=(($3-avgAvgAng)^2)
    }
    avgFrac=frac/ref_mol
    printf "\n   Summary data for hbond analysis:\n\n" > out2
    printf "   Summed Fraction :       %10.4f\n",frac > out2
    printf "   Averaged Fraction:      %10.4f\n\n",avgFrac > out2
    printf "   Summed Avg Distance:    %10.4f\n",avgDist > out2
    printf "   Average Distance:       %10.4f\n\n",avgAvgDist > out2
    printf "   Summed Avg Angle:       %10.4f\n",avgAng > out2
    printf "   Average Angle:          %10.4f\n",avgAvgAng > out2
    printf "\n   Summary data for hbond analysis:\n\n" >> out3
    printf "   Summed Frac Occupied:   %10.4f\n",frac >> out3
    printf "   Averaged Frac Occupied: %10.4f\n\n",avgFrac >> out3
    printf "   Summed Avg Distance:    %10.4f\n",avgDist >> out3
    printf "   Average Distance:       %10.4f\n\n",avgAvgDist >> out3
    printf "   Summed Avg Angle:       %10.4f\n",avgAng >> out3
    printf "   Average Angle:          %10.4f\n",avgAvgAng >> out3
    if (denom>1) {
      sd_avgDist=sqrt(avgDistsq/(denom-1)); semAvgDist=(sd_avgDist/(sqrt(denom))); sd_avgAng=sqrt(avgAngsq/(denom-1)); semAvgAng=(sd_avgAng/(sqrt(denom)));
      print "\n<--------------STATISTICS-------------->\n" > out2
      printf "\n\n   SD Distance:            %10.4f\n",sd_avgDist > out2
      printf "   SEM Distance:           %10.4f\n\n",semAvgDist > out2
      printf "   SD Angle:               %10.4f\n",sd_avgAng > out2
      printf "   SEM Angle:              %10.4f\n",semAvgAng > out2
      print "\n<--------------STATISTICS-------------->\n" >> out3
      printf "   SD Distance:            %10.4f\n",sd_avgDist >> out3
      printf "   SEM Distance:           %10.4f\n\n",semAvgDist >> out3
      printf "   SD Angle:               %10.4f\n",sd_avgAng >> out3
      printf "   SEM Angle:              %10.4f\n\n",semAvgAng >> out3
    } if (denom>1 && denom!=2) {print "##---NEXTFILE--->> -------->> -------->>\n" >> out3 }
      if (denom==1) { print "   Single HBOND event, no SD or SEM calculations possible!" > out2;
             print "\n   Single HBOND event, no SD or SEM calculations possible!\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3
           }
      if (denom==2) { print "\n   2 Hydrogen bond events found! No proper SD or SEM!" > out2;
             print "   2 Hydrogen bond events found! No proper SD or SEM!\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3
           }
}
  close(out3)
  close(out2)
}

This is the new script that works (GAWK):
Code:
#!/opt/local/bin/gawk -f
BEGIN { t="/dev/tty";printf "Enter number of reference molecules: ">t;getline<t;ref_mol=$1 }
FNR==1 && NR!=1 { endfile(); frac=avgDist=avgDistsq=avgAng=avgAngsq=denom=0 }
FNR==1 { out1="analFrac_"FILENAME; out2="sumFrac_"FILENAME; out3="reportFrac.txt"; print "-> Input file is: "FILENAME >> out3;
         print "-> Number of reference molecules: "ref_mol >> out3; next
       }
FNR==1 { next }
{
   frac+=$5; avgDist+=$6; avgAng+=$7; ++denom;
   printf "%10.4f %10.4f %10.4f\n",$5,$6,$7 > out1
  }

END { endfile() }
function endfile()
{
 x="\nNO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
  if (frac==0 && denom==0) {
    print x > out1; print x > out2; print x"\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3;
    close(out1); close(out2); close(out3); return
 }
  close(out1)
  if (frac>0) {
    avgAvgDist=avgDist/denom
    avgAvgAng=avgAng/denom
    while ((getline<out1)>0) {
        avgDistsq+=(($2-avgAvgDist)^2)
        avgAngsq+=(($3-avgAvgAng)^2)
    }
    avgFrac=frac/ref_mol
    printf "\n   Summary data for hbond analysis:\n\n" > out2
    printf "   Summed Fraction :       %10.4f\n",frac > out2
    printf "   Averaged Fraction:      %10.4f\n\n",avgFrac > out2
    printf "   Summed Avg Distance:    %10.4f\n",avgDist > out2
    printf "   Average Distance:       %10.4f\n\n",avgAvgDist > out2
    printf "   Summed Avg Angle:       %10.4f\n",avgAng > out2
    printf "   Average Angle:          %10.4f\n",avgAvgAng > out2
    printf "\n   Summary data for hbond analysis:\n\n" >> out3
    printf "   Summed Frac Occupied:   %10.4f\n",frac >> out3
    printf "   Averaged Frac Occupied: %10.4f\n\n",avgFrac >> out3
    printf "   Summed Avg Distance:    %10.4f\n",avgDist >> out3
    printf "   Average Distance:       %10.4f\n\n",avgAvgDist >> out3
    printf "   Summed Avg Angle:       %10.4f\n",avgAng >> out3
    printf "   Average Angle:          %10.4f\n",avgAvgAng >> out3
    if (denom>1) {
      sd_avgDist=sqrt(avgDistsq/(denom-1)); semAvgDist=(sd_avgDist/(sqrt(denom))); sd_avgAng=sqrt(avgAngsq/(denom-1)); semAvgAng=(sd_avgAng/(sqrt(denom)));
      print "\n<--------------STATISTICS-------------->\n" > out2
      printf "\n\n   SD Distance:            %10.4f\n",sd_avgDist > out2
      printf "   SEM Distance:           %10.4f\n\n",semAvgDist > out2
      printf "   SD Angle:               %10.4f\n",sd_avgAng > out2
      printf "   SEM Angle:              %10.4f\n",semAvgAng > out2
      print "\n<--------------STATISTICS-------------->\n" >> out3
      printf "   SD Distance:            %10.4f\n",sd_avgDist >> out3
      printf "   SEM Distance:           %10.4f\n\n",semAvgDist >> out3
      printf "   SD Angle:               %10.4f\n",sd_avgAng >> out3
      printf "   SEM Angle:              %10.4f\n\n",semAvgAng >> out3
    } if (denom>1 && denom!=2) {print "##---NEXTFILE--->> -------->> -------->>\n" >> out3 }
      if (denom==1) { print "   Single HBOND event, no SD or SEM calculations possible!" > out2;
             print "\n   Single HBOND event, no SD or SEM calculations possible!\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3
           }
      if (denom==2) { print "\n   2 Hydrogen bond events found! No proper SD or SEM!" > out2;
             print "   2 Hydrogen bond events found! No proper SD or SEM!\n\n##---NEXTFILE--->> -------->> -------->>\n" >> out3
           }
}
  close(out3)
  close(out2)
}

This is the “old” script that also works on +50 files(MacOS AWK):
Code:
BEGIN { t="/dev/tty";printf "Enter number of reference molecules: ">t;getline<t;ref_mol=$1 }
FNR==1 && NR!=1 { endfile(); tott=tothb=tottsq=denom=denomx=0 }
FNR==1 { out1="analhbout_"FILENAME; out2="summary_"FILENAME; out3="fullreport.txt"; print "-> Input file is: "FILENAME >> out3;
         print "-> Number of reference molecules: "ref_mol >> out3; next
       }
FNR==2 { nr=(NF>8?13:9) }
FNR<=nr { next }
{
  gsub (/\(+|\)+|\:/," ")
  gsub (/\./,",")
}
{
  if (NF>=15) {
    tott+=$15; ++denom; tothb+=$10
    printf "%10.2f %10.1f\n",$10,$15 > out1
  } else if (NF>=11) {
    tothb+=$10; ++denomx;
    printf "%10.2f\n",$10 > out1
  }
}
END { endfile() }
function endfile()
{
 x="NO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
  if (denom==0 && tott>0) {
    print x > out1; print x > out2; print x"\n\n----------------------------------------\n" >> out3;
    close(out1); close(out2); close(out3); return
 }else if (denomx==0 && tott==0) {
    print x > out1; print x > out2; print x"\n\n----------------------------------------\n" >> out3;
    close(out1); close(out2); close(out3); return
 }
  close(out1)
  if (tott>0) {
    avglt=tott/denom
    while ((getline<out1)>0) tottsq+=(($2-avglt)^2)
    avocc=tothb/ref_mol
    printf "   Summary data for hbond analysis: SERIES\n\n" > out2
    printf "   Sum of Occupancy:      %10.3f\n",tothb > out2
    printf "   Average Occupancy:     %10.3f\n\n",avocc > out2
    printf "   Sum of lifetimes:      %10.3f\n",tott > out2
    printf "   Average lifetime:      %10.3f\n",avglt > out2
    printf "   Summary data for hbond analysis: SERIES\n\n" >> out3
    printf "   Sum of Occupancy:      %10.3f\n",tothb >> out3
    printf "   Average Occupancy:     %10.3f\n\n",avocc >> out3
    printf "   Sum of lifetimes:      %10.3f\n",tott >> out3
    printf "   Average lifetime:      %10.3f\n",avglt >> out3
    if (denom>1) {
      sd_lt=sqrt(tottsq/(denom-1)); semlt=(sd_lt/(sqrt(denom)))   #semlt=sqrt(tottsq/(denom-1))/(sqrt(denom))
      printf "   SD lifetime:           %10.3f\n",sd_lt > out2
      printf "   SEM lifetime:          %10.3f\n",semlt > out2
      printf "   SD lifetime:           %10.3f\n",sd_lt >> out3
      printf "   SEM lifetime:          %10.3f\n\n",semlt >> out3
    } if (denom>1 && denom!=2) {print "----------------------------------------\n" >> out3 }
      if (denom==1) { print "   Single HBOND event, no SD or SEM calculation possible!" > out2;
             print "   Single HBOND event, no SD or SEM calculation possible!\n\n----------------------------------------\n" >> out3
           }
      if (denom==2) { print "\n   2 Hydrogen bond events found! No proper SD or SEM!" > out2;
             print "\n   2 Hyrdogen bond events found! No proper SD or SEM!\n\n----------------------------------------\n" >> out3
           }
}
  close(out1)
  if (tott==0) {
    avocc=tothb/ref_mol
    printf "   Summary data for hbond analysis: NO SERIES\n\n" > out2
    printf "   Sum of Occupancy:      %10.3f\n",tothb > out2
    printf "   Average Occupancy:     %10.3f\n\n",avocc > out2
    printf "   Summary data for hbond analysis: NO SERIES\n\n" >> out3
    printf "   Sum of Occupancy:      %10.3f\n",tothb >> out3
    printf "   Average Occupancy:     %10.3f\n\n",avocc >> out3
    if (denomx>1 && denomx!=2) {print "----------------------------------------\n" >> out3 }
    if (denomx==1) { print "   Single HBOND event detected!!!" > out2; print "   Single HBOND event detected!!\n\n----------------------------------------\n" >> out3 }
    if (denomx==2) { print "   2 Hydrogen bond events found!" > out2; print "   2 Hydrogen bond events found!\n\n----------------------------------------\n" >> out3 }
 }
  close(out3)
  close(out2)
}
 
out1 should be closed AFTER the getline<out1

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
It did, thanks for this. Though this raises another question seeing as when I run the "new" modified working script with Mac AWK, the results are exactly the same as with the GAWK version (except for . and ,). Though if I change the GAWK version to place close(out1) after the while function:

Code:
   while ((getline<out1)>0) {
        avgDistsq+=(($2-avgAvgDist)^2)
        avgAngsq+=(($3-avgAvgAng)^2)
    }
  close(out1)

The calculations of avgDistsq and avgAngsq now differs from the Mac version. I have to perform the corresponding calculations manually and make sure which is correct, though is there a particular reason for the observed behavior?

Best regards
 
I did the calculations and by putting close(out1) after the while function the calculations are correct in the Mac version. Though by doing the same in the GAWK version, this works but produces values for the calculations in the while function that are not correct. Any guess as to why this is?
 
Well, I suspect that this is one of those "if it ain't broke, don't fix it" moments. I always find it hard to tell but I suppose this could be my OCD is acting up, I just can't leave this alone.

In the GAWK version of the script, this is where things get funky
Code:
close(out1)
  if (frac>0) {
    avgAvgDist=avgDist/denom
    avgAvgAng=avgAng/denom
    while ((getline<out1)>0) {
        avgDistsq+=(($2-avgAvgDist)^2)
        avgAngsq+=(($3-avgAvgAng)^2)
    }

By moving the 'close' to after the while statement the Mac AWK now works, though I player around with the GAWK version a bit and now I am even more confused. By putting close(out1) after the getline (whole while loop) the avgDistsq and avgAngsq are no longer correctly calculated (optionA). I can put close(out1) inside the if statement, but that is basically the same as "as is". If i comment the close(out1) out, the results are the same as in optionA. So, for GAWK to work correctly I have to close out1 before running (getline<out1) (?????) while in native OS AWK, I have to keep out1 open not to have to many open files, though the placement has no effect on the calculations whatsoever.

I know I am missin something here, probably something elementary, though at this point I can only see the trees. If someone has some thoughts on this it would be greatly appreciated since I'm having some real issues not trying to continue fixing the not broken code here.

Best regards
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top