I'm producing a long tabular text file by extracting information from a set of log files. I want to do some operations on the resulting data and write the result to a new tabular text file.
My tabular data looks like this:
Code:
Compound State Method Approach S^2 Energy Path
C(CCH)2 singlet CC TO ERROR -> input issue or ? 3-1/C-CCH-2/C-CCH-2-CC-s.out
C(CCH)2 singlet CC TO 1.108791 -191.426232325854 3-1/C-CCH-2/C-CCH-2-s.out
C(CCH)2 triplet CC TO 2.235993 -191.434509836762 3-1/C-CCH-2/C-CCH-2-t.out
C(NH2)2 triplet DFT TO ERROR -> input issue or ? 3-1/C-NH2-2/C-NH2-2-t.out
C(NMe2)2 triplet DFT TO ERROR -> input issue or ? 3-1/C-NMe2-2/C-NMe2-2-t.out
C(SH)2 singlet CC TO ERROR -> input issue or ? 3-1/C-SH-2/C-SH-2-CC-s.out
C(SH)2 singlet DFT TO 0.000006 -835.261598037781 3-1/C-SH-2/C-SH-2-s.out
C(SH)2 triplet DFT TO 2.034097 -835.190581480918 3-1/C-SH-2/C-SH-2-t.out
C(SiH3)2 singlet CC TO ERROR -> SCF NOT CONVERGED 3-1/C-SiH3-2/C-SiH3-2-CC-s.out
C(SiH3)2 triplet CC TO ERROR -> input issue or ? 3-1/C-SiH3-2/C-SiH3-2-CC-t.out
C(SiH3)2 singlet DFT TO 0.000224 -620.339326760127! 3-1/C-SiH3-2/C-SiH3-2-s.out
C(SiH3)2 triplet DFT TO 2.013503 -620.379515709604 3-1/C-SiH3-2/C-SiH3-2-t.out
CF2 singlet CC TO 0.000000 -237.419131945340 3-1/CF2/CF2-CC-s.out
CF2 singlet DFT TO -0.000000 -237.686609290184 3-1/CF2/CF2-s.out
The code producing it is below:
Bash:
# Print the table header once.
awk 'BEGIN {print "Compound\tState\t\tMethod\t\tApproach\tS^2\t\tEnergy\t\t\tPath"}'
# Run awk once per file so that every flag starts out unset.
find . -name '*.out' | while IFS= read -r FILENAME
do
    awk '
    # First line of each file: derive the leading columns from the path.
    FNR==1 {
        sub("./", "", FILENAME)
        m = split(FILENAME, Ti, "/")
        if (FILENAME ~ /-/) {
            n = split(Ti[m], T, "-")
            if (length(T[1]) < 2) {T[1] = T[1] "(" T[2] ")" substr(T[3],1,1)}
            printf("%-15.10s\t%-10s\t%-10s\t%-5s\t\t", T[1],
                   (substr(T[n],1,1)=="t") ? "triplet" : "singlet",
                   (FILENAME ~ "-CC") ? "CC" : "DFT",
                   (FILENAME ~ "3-1") ? "TO" : "NONE")
        } else {
            n = split(Ti[m], T, ".")
            if (length(T[1]) < 2) {T[1] = T[1] "(" T[2] ")" T[3]}
            printf("%-15.10s\t%-10s\t%-10s\t%-5s\t\t", T[1], "Singlet", "DFT",
                   (FILENAME ~ "3-1") ? "TO" : "NONE ")
        }
        FOUND = 0
    }
    /UHF/ {OPS=1}
    /UKS/ {OPS=1}
    # Assignment, not comparison: the original "MULT==1" was a no-op.
    !OPS && /xyz 0 1/     {MULT=1}
    !OPS && /xyzfile 0 1/ {MULT=1}
    /The optimization did not converge but reached the maximum number of/ {OPT=1}
    /SCF NOT CONVERGED/ {PROB=1}
    /An error has occured in the MDCI module/ {MDCI=1}
    /HURRAY/ {FOUND=1}
    FOUND && !OPS && /THE OPTIMIZATION HAS CONVERGED/ {printf "%s\t", "Restricted"}
    FOUND && /SCF NOT CONVERGED AFTER/ {printf "%s\t", "SCF Crash!"}
    FOUND && /Expectation value of/ {printf("%s\t", $6); SS=1}
    FOUND && /^FINAL.*ERGY/ {
        # A trailing "!" marks an energy from a file that also reported SCF NOT CONVERGED.
        if (!PROB) {print $NF " \t" FILENAME}
        else       {print $NF "! \t" FILENAME}
        CONV=1
    }
    # No energy was printed: report the most specific failure that was seen.
    END {
        if (!CONV && !SS) {printf "%s\t", "ERROR ->"}
        if (!CONV && OPT==1)       {print "NOT OPTIMIZED\t\t" FILENAME}
        else if (!CONV && PROB==1) {print "SCF NOT CONVERGED\t" FILENAME}
        else if (!CONV && MDCI==1) {print "MDCI MODULE ERROR\t" FILENAME}
        else if (!CONV)            {print "input issue or ?\t" FILENAME}
    }
    ' OFS="\t" "$FILENAME"
done
Whenever the Compound, Method and Approach columns of two rows match, I want their energy values subtracted from each other (exactly as Singlet - Triplet), so that the results form a new table altogether. For example:
Code:
C(CCH)2 singlet CC TO 1.108791 -191.426232325854 3-1/C-CCH-2/C-CCH-2-s.out
C(CCH)2 triplet CC TO 2.235993 -191.434509836762 3-1/C-CCH-2/C-CCH-2-t.out
These should form one row:
Code:
C(CCH)2 CC TO 0.008277510908
And of course, if no match is found, a simple message such as "match not found" or "data not available" is enough. Your help is appreciated, and thanks in advance.
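To make the request concrete, here is a rough sketch of the pairing I have in mind, not a finished solution. The function name `st_gap` is made up for illustration. The sketch assumes that the table is split on whitespace, that a usable row has exactly seven fields with the energy in field 6 (a trailing "!" is stripped), and that a later usable row for the same Compound/Method/Approach/State replaces an earlier one.

```shell
# Hypothetical helper: read the table and print Singlet - Triplet per group.
st_gap() {
    awk '
    {
        # The header and any row without a recognizable state are skipped.
        state = tolower($2)
        if (state != "singlet" && state != "triplet") next
        key = $1 "\t" $3 "\t" $4          # Compound, Method, Approach
        if (!(key in seen)) { seen[key] = 1; order[++n] = key }
        # Assumption: error rows have more than 7 fields, so they fail this test.
        energy = $6
        sub(/!$/, "", energy)
        if (NF == 7 && energy ~ /^-?[0-9]+\.[0-9]+$/)
            e[key, state] = energy
    }
    END {
        # Report groups in the order they first appeared in the input.
        for (i = 1; i <= n; i++) {
            key = order[i]
            if ((key, "singlet") in e && (key, "triplet") in e)
                printf "%s\t%.12f\n", key, e[key, "singlet"] - e[key, "triplet"]
            else
                printf "%s\tmatch not found\n", key
        }
    }' "$@"
}
```

If the table were saved as `table.txt` (a placeholder name), it would be called as `st_gap table.txt > gaps.txt`. Printing with `%.12f` is a choice made to match the number of decimals in the energies; ordinary double precision should be enough for differences of this size, but that is worth checking against a hand calculation.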