AWK and multiple inputs?

Status
Not open for further replies.

GusGrave (Programmer), Nov 17, 2010
Back again! Finally the script seems to be working without a glitch, and my coworkers seem pleased!

Thank you all for the help with this one! But, as always with efficiency, I want to be more efficient. I am also lazier than the average person ;). So my question is this: in the modified version of the script below, where the manual input for each analysis has been removed, is there a way to get AWK to spit out one "out1" and one "out2" file for each input if I run the script as:

./script.awk hb_results*

That is, using a wildcard to feed all files in at once? As it is now, I get a single out1 and a single out2, named after the first file the * matches, and these two outputs contain the results from all the input files. I would like one out1 and one out2 for each file matched by hb_results* (up to 30 or 40 files), each named after its own input file as specified in the script!

Is this possible?

Best regards
Gustaf


Code:
BEGIN { inp_num=10 }
NR==1 { out1="hb_%_occ_"FILENAME; out2="summary_"FILENAME; next }
NR==2 { nr=(NF>8?12:8) }
NR<=nr { next }
{
  gsub (/\(+|\)+|\:/," ")
}
{
  if (NF>=15) {
    tott+=$15; ++denom; tothb+=$10
    printf "%10.2f %10.1f\n",$10,$15 > out1
  } else if (NF>=11) {
    tothb+=$10; ++denom
    printf "%10.2f\n",$10 > out1
  }
}
END {
  if (denom==0) {
    x="NO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
    print x > out1; print x > out2; exit
  }
  close(out1)
  if (tott>0) {
    avglt=tott/denom
    while ((getline<out1)>0) tottsq+=(($2-avglt)^2)
    avocc=tothb/inp_num
    printf " Summary data for hbond analysis\n\n" > out2
    printf " Sum of Occupancy: %10.3f\n",tothb > out2
    printf " Average Occupancy: %10.3f\n\n",avocc > out2
    printf " Sum of lifetimes: %10.3f\n",tott > out2
    printf " Average lifetime: %10.3f\n",avglt > out2
    if (denom>1) {
      sd_lt=sqrt(tottsq/(denom-1)); semlt=sd_lt/sqrt(denom)
      printf " SD lifetime: %10.3f\n",sd_lt > out2
      printf " SEM lifetime: %10.3f\n",semlt > out2
    } else print " Single HBOND event, no SD or SEM calculation possible!" > out2
  }
  if (tott==0) {
    avocc=tothb/inp_num
    printf " Summary data for hbond analysis\n\n" > out2
    printf " Sum of Occupancy: %10.3f\n",tothb > out2
    printf " Average Occupancy: %10.3f\n\n",avocc > out2
    if (denom<1) { print " Single HBOND event, no SD or SEM calculation possible!" > out2 }
  }
}
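A minimal sketch of the general technique for one set of outputs per input file, assuming the per-file work can be finished before the next file starts: key everything on FNR, which restarts at 1 for every input file, instead of NR, which keeps counting across all of them, and flush and close the previous file's output when a new file begins. The helper names per_file_counts and finish_file and the "count_" prefix are hypothetical; the per-line work is reduced to counting lines.

```sh
# Hypothetical helper: counts the lines of every input file and writes
# each count to its own file, "count_" followed by the input name.
per_file_counts() {
    awk '
    FNR == 1 {                        # first line of each input file
        if (NR != 1) finish_file()    # close out the previous input first
        out = "count_" FILENAME       # output named after the current input
        n = 0                         # reset per-file accumulators here
    }
    { n++ }                           # the real per-line work would go here
    END { if (NR > 0) finish_file() } # close out the last input
    function finish_file() {
        print n > out
        close(out)                    # free the handle before the next file
    }
    ' "$@"
}
```

Called as `per_file_counts hb_results*`, it writes one count file per match. Applied to the script above, the same idea means testing FNR instead of NR in the header rules and resetting tott, tothb, tottsq and denom whenever a new file starts.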
 
Sorry for the somewhat dumb question, you already specified the "reset" for me, just didn't catch that one.

This is what it looks like now, and it seems to be working fine!

Code:
BEGIN { ref_mol=10 }
FNR==1 && NR!=1 { endfile(); tott=tothb=tottsq=denom=0 }
FNR==1 { out1="hb_%_occ_"FILENAME; out2="summary_"FILENAME; next }
FNR==2 { nr=(NF>8?12:8) }
FNR<=nr { next }
{
  gsub (/\(+|\)+|\:/," ")
}
{
  if (NF>=15) {
    tott+=$15; ++denom; tothb+=$10
    printf "%10.2f %10.1f\n",$10,$15 > out1
  } else if (NF>=11) {
    tothb+=$10; ++denom
    printf "%10.2f\n",$10 > out1
  }
}
END { endfile() }
function endfile()
{
  if (denom==0) {
    x="NO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
    print x > out1; print x > out2;
    close(out1); close(out2);
    return
  }
  close(out1)
  if (tott>0) {
    avglt=tott/denom
    while ((getline<out1)>0) tottsq+=(($2-avglt)^2)             
    avocc=tothb/ref_mol
    printf "   Summary data for hbond analysis\n\n" > out2
    printf "   Sum of Occupancy:      %10.3f\n",tothb > out2
    printf "   Average Occupancy:     %10.3f\n\n",avocc > out2
    printf "   Sum of lifetimes:      %10.3f\n",tott > out2
    printf "   Average lifetime:      %10.3f\n",avglt > out2
    if (denom>1) {
      sd_lt=sqrt(tottsq/(denom-1)); semlt=sd_lt/sqrt(denom)
      printf "   SD lifetime:           %10.3f\n",sd_lt > out2
      printf "   SEM lifetime:          %10.3f\n",semlt > out2
    } else print "   Single HBOND event, no SD or SEM calculation possible!" > out2
  }
  close(out1)
  if (tott==0) {
    avocc=tothb/ref_mol
    printf "   Summary data for hbond analysis\n\n" > out2
    printf "   Sum of Occupancy:      %10.3f\n",tothb > out2
    printf "   Average Occupancy:     %10.3f\n\n",avocc > out2
    if (denom<1) { print "   Single HBOND event, no SD or SEM calculation possible!" > out2 }
  }
  close(out2)
}

I forgot to remove the 'while ((getline<out1)>0)' loop, which no longer seems to serve any purpose, but apart from some cleanup it seems to be working all right. Though I'm sure something else will come up sooner or later... Some more testing and then out to the general public (my 3 co-workers!)

Thank you so much for your help!

Best regards
// Gustaf
 
Back again!

I was trying to clean the script of things I thought were unnecessary, and I ran into a problem.

By removing

Code:
while ((getline<out1)>0)

The

Code:
    if (denom>1) {
      sd_lt=sqrt(tottsq/(denom-1)); semlt=sd_lt/sqrt(denom)
      printf "   SD lifetime:           %10.3f\n",sd_lt > out2
      printf "   SEM lifetime:          %10.3f\n",semlt > out2
    } else print "   Single HBOND event, no SD or SEM calculation possible!" > out2
 }

get distorted again. Only these two calculations are affected. Based on errors I observed earlier, it seems that tottsq is no longer being reset.

Any ideas on how to safely remove the "while ... getline ..." loop, which I don't think I need in this script, without disrupting the calculations?

Best regards
// Gustaf
 
Gustaf said:
I don't think I need in this script
But you surely need it, unless you don't care about the values derived from tottsq...

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Sorry, I guess my poor "computer speak" is causing me more problems, as usual!

Obviously I should have been more suspicious, since it doesn't work without the while loop. But, to highlight my lacking skills: what is "getline" actually extracting from out1 that needs to be larger than 0? And shouldn't it then be a problem that the "close(out1)" occurs before the while (getline ...)?

Best regards
// Gustaf
 
Hi

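In out1's $2 you have the input file's $15. But the formula also uses tott, which is Σ $15:
$$\mathrm{tottsq} = \sum_{i=1}^{\mathrm{denom}} \left( x_i - \frac{1}{\mathrm{denom}} \sum_{j=1}^{\mathrm{denom}} x_j \right)^2, \qquad x_i = \text{\$15 of data row } i$$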
I see no way to evaluate this in a single loop (*), as it needs both the individual values and their sum. The faster alternative would be to store the $15's in an array.

(*) Given my math skills, this does not mean there is no way.
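A minimal sketch of the array idea, assuming the lifetimes ($15) are fed one per line on standard input instead of being taken from the real input columns: every value is stored, so the mean and the sum of squared deviations (the script's tottsq) can be computed in END without re-reading out1. The helper name lifetime_stats_array is hypothetical. SEM is computed here as SD/√n, the usual definition of the standard error of the mean.

```sh
# Hypothetical helper: reads one lifetime per line on stdin (standing in
# for the $15 column) and prints mean, SD and SEM.
lifetime_stats_array() {
    awk '
    { x[++n] = $1; sum += $1 }      # store every value and keep a running sum
    END {
        if (n < 2) { print "need at least 2 values"; exit }
        mean = sum / n
        for (i = 1; i <= n; i++)
            ss += (x[i] - mean) ^ 2  # the quantity the script calls tottsq
        sd = sqrt(ss / (n - 1))
        printf "%.3f %.3f %.3f\n", mean, sd, sd / sqrt(n)   # mean, SD, SEM
    }'
}
```

For the values 1, 2, 3 and 4 this prints `2.500 1.291 0.645`.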

Gustaf said:
what is "getline" actually extracting from out1 that needs to be larger than 0?
It returns 0 at end of file (and -1 on error). So repeating it while it returns more than 0 means reading until there is nothing left to read.
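A small sketch of that read loop in isolation, assuming a plain whitespace-separated file: `getline var < file` returns 1 when it read a line, 0 at end of file and -1 on error (for example, a missing file). Testing `> 0` stops on both 0 and -1, whereas a bare `while (getline < f)` would loop forever on -1. The helper name sum_second_column is hypothetical.

```sh
# Hypothetical helper: sums the second column of the file named in $1.
sum_second_column() {
    awk -v f="$1" 'BEGIN {
        # getline returns 1 (line read), 0 (end of file) or -1 (error,
        # e.g. the file does not exist); "> 0" stops on both 0 and -1
        while ((getline line < f) > 0) {
            split(line, a)           # a[1], a[2], ... like $1, $2, ...
            s += a[2]
        }
        close(f)
        print s + 0
    }'
}
```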
Gustaf said:
And should it then not be a problem if the "close(out1)" occurs prior to the while (getline..)?
It is needed to reset the file pointer. The previous write operation left it at the end of file, so otherwise a read operation would have nothing to read.
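A sketch of the write, close, read-back sequence the script relies on, assuming a scratch file path is passed in: close() flushes the buffered output and releases the write stream, so the following `getline < f` opens the file afresh and reads from the first line. The helper name write_then_read is hypothetical.

```sh
# Hypothetical helper: writes two numbers to the scratch file named in $1,
# closes it, then reads it back with getline and prints their sum.
write_then_read() {
    awk -v f="$1" 'BEGIN {
        print "10" > f               # opens f for writing (truncates it)
        print "20" > f               # same stream, continues after line 1
        close(f)                     # flush buffered output, release the stream
        while ((getline v < f) > 0)  # opens f afresh and reads from line 1
            total += v
        close(f)                     # release the read stream as well
        print total + 0
    }'
}
```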


Feherke.
 
I got it twisted! Just jumped to the conclusion that this had something to do with the shell-based getline in the previous files I had been working on! Thank you very much for clarifying!

Obviously there is much room for improvement, but I have checked and double-checked the output, and the math actually works. I was looking into arrays when I started, but, as must have become obvious, I'm not that sure of what I'm doing and was just happy to see that it worked!

I guess it works because avglt becomes a numerical value (a constant) from out2, "separate" from out1, while tott also becomes a constant from out1, "separate" from out2, both stored as constants. I definitely see what you mean; this could be a problem. I guess it's just my luck that it reads from two different files and saves the constants under different "assignments"; otherwise it would not work.

I'll look into the array part of AWK and see if I can work out more improvements of how to handle data. But at least I know that the "getline" is a central part of the code and should not be removed!!!
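For reference, a sketch of a one-pass alternative that needs neither an array nor the getline re-read, assuming the lifetimes arrive one per line on standard input: accumulate the sum and the sum of squares while reading, then use the identity Σ(x − mean)² = Σx² − (Σx)²/n in END. With very large values this form can lose precision to cancellation (Welford's running update is the usual remedy); for lifetimes of this magnitude it should be adequate. The helper name lifetime_stats_onepass is hypothetical.

```sh
# Hypothetical helper: one lifetime per line on stdin (standing in for
# $15); prints mean, SD and SEM without storing the values.
lifetime_stats_onepass() {
    awk '
    { n++; sum += $1; sumsq += $1 * $1 }    # one pass, no array, no temp file
    END {
        if (n < 2) { print "need at least 2 values"; exit }
        ss = sumsq - sum * sum / n          # equals tottsq in the script
        sd = sqrt(ss / (n - 1))
        printf "%.3f %.3f %.3f\n", sum / n, sd, sd / sqrt(n)   # mean, SD, SEM
    }'
}
```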

Thanks to both of you again!

// Gustaf

 
Hello again

So, one of my co-workers expressed a wish to have all the "out2" files (the "summary" files) collected into one file as a list. Of course I thought this would be an even lazier approach to what I'm doing and got very enthusiastic. I figured this would be very easy, and of course it was not...

Putting the calculated values into another file by appending to it was very easy indeed. But each summary needs an identifier, preferably the input filename, for the list to make any sense. I guess you can see the issue: when the script loops over all the input files, out1 and out2 are still created correctly, and without FILENAME all the summarized data gets appended to "out3". But when I try to include FILENAME in out3, the file does not contain the first filename, and the last filename gets printed twice.

I figure this has something to do with the placement of "FILENAME" in the script and the:
Code:
FNR==1 && NR!=1 { endfile(); tott=tothb=tottsq=denom=0 }

This is what I would like out3 to look like, but with "thisshouldbeFILENAME" replaced by the actual input filename for each file (and not what I get right now):

_______________________________________________

thisshouldbeFILENAME
Summary data for hbond analysis

Sum of Occupancy: 2.420
Average Occupancy: 0.242

Sum of lifetimes: 27.300
Average lifetime: 1.137
SD lifetime: 0.263
SEM lifetime: 0.014

thisshouldbeFILENAME
Summary data for hbond analysis

Sum of Occupancy: 9.760
Average Occupancy: 0.976

Sum of lifetimes: 51.100
Average lifetime: 1.825
SD lifetime: 0.808
SEM lifetime: 0.123

thisshouldbeFILENAME
Summary data for hbond analysis

Sum of Occupancy: 2.140
Average Occupancy: 0.214

Sum of lifetimes: 28.300
Average lifetime: 1.230
SD lifetime: 0.251
SEM lifetime: 0.013
_______________________________________________
And this is the code I'm working with now (of course I found more holes in the code that needed to be addressed, so there are minor changes from the previous post):

Code:
BEGIN { ref_mol=10 }
FNR==1 && NR!=1 { endfile(); tott=tothb=tottsq=denom=denomx=0 }
FNR==1 { out1="analhbout_"FILENAME; out2="summary_"FILENAME; out3="allsum.txt"; next }
FNR==2 { nr=(NF>8?12:8) }
FNR<=nr { next }
{
  gsub (/\(+|\)+|\:/," ")
}
{
  if (NF>=15) {
    tott+=$15; ++denom; tothb+=$10
    printf "%10.2f %10.1f\n",$10,$15 > out1
  } else if (NF>=11) {
    tothb+=$10; ++denomx;
    printf "%10.2f\n",$10 > out1
  }
}
END { endfile() }
function endfile()
{
  if (denom==0 && denomx==0) {
    x="NO DATA POINTS IN INPUT => NO HYDROGEN BONDS DETECTED!"
    print x > out1; print x > out2; print "thisshouldbeFILENAME\n" x"\n\n" >> out3;
    close(out1); close(out2); close(out3);
    return
  }
  close(out1)
  if (tott>0) {
    avglt=tott/denom
    while ((getline<out1)>0) tottsq+=(($2-avglt)^2)		
    avocc=tothb/ref_mol
    printf "   Summary data for hbond analysis\n\n" > out2
    printf "   Sum of Occupancy:      %10.3f\n",tothb > out2
    printf "   Average Occupancy:     %10.3f\n\n",avocc > out2
    printf "   Sum of lifetimes:      %10.3f\n",tott > out2
    printf "   Average lifetime:      %10.3f\n",avglt > out2
    printf "thisshouldbeFILENAME\n" >> out3
    printf "   Summary data for hbond analysis\n\n" >> out3
    printf "   Sum of Occupancy:      %10.3f\n",tothb >> out3
    printf "   Average Occupancy:     %10.3f\n\n",avocc >> out3
    printf "   Sum of lifetimes:      %10.3f\n",tott >> out3
    printf "   Average lifetime:      %10.3f\n",avglt >> out3
    if (denom>1) {
      sd_lt=sqrt(tottsq/(denom-1)); semlt=sd_lt/sqrt(denom)
      printf "   SD lifetime:           %10.3f\n",sd_lt > out2
      printf "   SEM lifetime:          %10.3f\n",semlt > out2
      printf "   SD lifetime:           %10.3f\n",sd_lt >> out3
      printf "   SEM lifetime:          %10.3f\n\n",semlt >> out3
    }
    if (denom==1) {
      print "   Single HBOND event, no SD or SEM calculation possible!" > out2
      print "   Single HBOND event, no SD or SEM calculation possible!\n\n" >> out3
    }
    if (denom==2) {
      print "   2 Hydrogen bond events found! No proper SD or SEM!" > out2
      print "   2 Hydrogen bond events found! No proper SD or SEM!\n\n" >> out3
    }
  }
  close(out1)
  if (tott==0) {
    avocc=tothb/ref_mol
    printf "   Summary data for hbond analysis\n\n" > out2
    printf "   Sum of Occupancy:      %10.3f\n",tothb > out2
    printf "   Average Occupancy:     %10.3f\n\n",avocc > out2
    printf "thisshouldbeFILENAME\n" >> out3
    printf "   Summary data for hbond analysis\n\n" >> out3
    printf "   Sum of Occupancy:      %10.3f\n",tothb >> out3
    printf "   Average Occupancy:     %10.3f\n\n",avocc >> out3
    if (denomx==1) { print "   Single HBOND event detected!!" > out2; print "   Single HBOND event detected!!\n\n" >> out3 }
  }
  close(out3)
  close(out2)
}

I see that with the correct placement, all of the 'printf "thisshouldbeFILENAME\n" >> out3' lines can be replaced with a single one at a better place in the script, since the name only needs to be printed once per file in out3.
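One way to cut the remaining duplication, sketched under the assumption that every summary line goes to both out2 and out3: route each line through a small function that writes it to both files, and print the identifier once per input. The names summary_demo and emit2 are hypothetical, and the single "Sum of Occupancy" line stands in for the full summary.

```sh
# Hypothetical helper for illustration: $1 = input file (one occupancy
# value per line), $2 = per-file summary, $3 = combined list.
summary_demo() {
    awk -v out2="$2" -v out3="$3" '
    function emit2(fmt, val) {
        printf fmt, val > out2       # per-file summary, truncated on first write
        printf fmt, val >> out3      # combined list, always appended
    }
    { tothb += $1 }
    END {
        print FILENAME >> out3       # identifier line, written once per input
        emit2("   Sum of Occupancy:      %10.3f\n", tothb)
        close(out2); close(out3)
    }' "$1"
}
```

Because out3 is opened with `>>`, running the script again appends to the existing allsum.txt; deleting it before a new batch gives a fresh list.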

Any and all feedback is more than welcome!

Thanks again for all your help!
// Gus
 
FNR==1 { out1="analhbout_"FILENAME; out2="summary_"FILENAME; out3="allsum.txt"; print FILENAME >> out3; next }

Hope This Helps, PH.
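The symptom (first name missing, last name printed twice) comes from when endfile() runs: by the time the `FNR==1 && NR!=1` rule fires, FILENAME already names the new file, and in END it names the last one. Besides printing the name at FNR==1, another option is to save it in a variable there and use that copy inside endfile(), as in this sketch. The helper name list_summaries and its output format are hypothetical. If only GNU awk (4.0 or later) needs to be supported, its ENDFILE special pattern is a further option, since FILENAME still names the file that just ended inside it; that is gawk-specific and not used here.

```sh
# Hypothetical helper: prints "<file>: <line count> lines" for each input.
list_summaries() {
    awk '
    FNR == 1 && NR != 1 { endfile() }     # the previous file is finished here
    FNR == 1 { fname = FILENAME; n = 0 }  # save the name, reset the counter
    { n++ }
    END { if (NR > 0) endfile() }         # finish the last file
    function endfile() {
        # FILENAME already names the NEXT file when called from the
        # first rule, so use the saved copy instead
        print fname ": " n " lines"
    }' "$@"
}
```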
 
AWESOME!!!!!!

Man I feel like a dumba**!

So, now I should definitely have all I need to do some serious analysis of my simulations in no-time!

I really cannot thank you both enough, you have no idea how useful this is to me!!! (And everyone else riding piggyback on this script here).

I'll try to "shelve" the "programming" for a while so you don't get completely fed up with me, but I cannot stress my gratitude for your help enough! You are the best!

The warmest of regards and the greatest of gratitude!
//Gustaf
 