Tabular DATA manipulation

Status
Not open for further replies.

raymondgh

Programmer
Aug 31, 2015
1
DE
I'm producing a long tabular text file by extracting information from a set of log files. I want to run some operations on the resulting table and write the result to a new tabular text file.

My tabular data looks like this:
Code:
Compound	State		Method		Approach	S^2		Energy			Path
C(CCH)2        	singlet   	CC        	TO   		ERROR   ->	input issue or ?	3-1/C-CCH-2/C-CCH-2-CC-s.out
C(CCH)2        	singlet   	CC        	TO   		1.108791	-191.426232325854 	3-1/C-CCH-2/C-CCH-2-s.out
C(CCH)2        	triplet   	CC        	TO   		2.235993	-191.434509836762 	3-1/C-CCH-2/C-CCH-2-t.out
C(NH2)2        	triplet   	DFT       	TO   		ERROR   ->	input issue or ?	3-1/C-NH2-2/C-NH2-2-t.out
C(NMe2)2       	triplet   	DFT       	TO   		ERROR   ->	input issue or ?	3-1/C-NMe2-2/C-NMe2-2-t.out
C(SH)2         	singlet   	CC        	TO   		ERROR   ->	input issue or ?	3-1/C-SH-2/C-SH-2-CC-s.out
C(SH)2         	singlet   	DFT       	TO   		0.000006	-835.261598037781 	3-1/C-SH-2/C-SH-2-s.out
C(SH)2         	triplet   	DFT       	TO   		2.034097	-835.190581480918 	3-1/C-SH-2/C-SH-2-t.out
C(SiH3)2       	singlet   	CC        	TO   		ERROR   ->	SCF NOT CONVERGED	3-1/C-SiH3-2/C-SiH3-2-CC-s.out
C(SiH3)2       	triplet   	CC        	TO   		ERROR   ->	input issue or ?	3-1/C-SiH3-2/C-SiH3-2-CC-t.out
C(SiH3)2       	singlet   	DFT       	TO   		0.000224	-620.339326760127! 	3-1/C-SiH3-2/C-SiH3-2-s.out
C(SiH3)2       	triplet   	DFT       	TO   		2.013503	-620.379515709604 	3-1/C-SiH3-2/C-SiH3-2-t.out
CF2            	singlet   	CC        	TO   		0.000000	-237.419131945340 	3-1/CF2/CF2-CC-s.out
CF2            	singlet   	DFT       	TO   		-0.000000	-237.686609290184 	3-1/CF2/CF2-s.out

The code producing it is below:

Bash:
awk '
BEGIN           {print "Compound\tState\t\tMethod\t\tApproach\tS^2\t\tEnergy\t\t\tPath"}'
find . -name '*.out' | while read FILENAME

do

awk '
FNR==1          {if (FILENAME ~ /-/) 
                  { sub("./","", FILENAME);m=split(FILENAME, Ti, "/") 
                                         n=split(Ti[m], T, "-")
                                         if (length(T[1]) < 2 ) {T[1]=T[1]"("T[2]")"substr(T[3],1,1)}
                                         printf("%-15.10s\t%-10s\t%-10s\t%-5s\t\t",  T[1], substr(T[n],1,1)=="t"?"triplet":"singlet", FILENAME~"-CC"?"CC":"DFT",FILENAME~"3-1"?"TO":"NONE");
                                         FOUND=0
                 }
                 else
                  {sub("./","", FILENAME);m=split(FILENAME, Ti, "/") 
                                         n=split(Ti[m], T, ".")
                                         if (length(T[1]) < 2 ) {T[1]=T[1]"("T[2]")"T[3]}
                                         printf ("%-15.10s\t%-10s\t%-10s\t%-5s\t\t", T[1] , "Singlet", "DFT",FILENAME~"3-1"?"TO":"NONE   ");
                                         FOUND=0
                }
                }


/UHF/{OPS=1}
/UKS/{OPS=1}

!OPS &&
/xyz 0 1/ {MULT=1}

!OPS &&
/xyzfile 0 1/ {MULT=1}

/The optimization did not converge but reached the maximum number of/ { OPT=1 }
/SCF NOT CONVERGED/ {PROB=1;
                }  
/An error has occured in the MDCI module/ { MDCI=1 }   
/HURRAY/        {FOUND=1;
                }
FOUND && !OPS &&
/THE OPTIMIZATION HAS CONVERGED/ {printf "%s\t","Restricted"}

FOUND &&
/SCF NOT CONVERGED AFTER/ {printf "%s\t","SCF Crash!"}

FOUND &&
/Expectation value of/ { printf ("%s\t",$6)
                        SS=1;}            
FOUND &&
/^FINAL.*ERGY/  {
    if (!PROB){ print $NF " \t"  FILENAME 
                 CONV=1}
             else{print $NF "! \t" FILENAME
             CONV=1}
                }
END             {if (!CONV && !SS){printf "%s\t","ERROR   ->"}
    if (!CONV && OPT==1) {print "NOT OPTIMIZED\t\t" FILENAME}
else if(!CONV && PROB==1) {print "SCF NOT CONVERGED\t" FILENAME}
else if(!CONV && MDCI==1) {print "MDCI MODULE ERROR\t" FILENAME}
else if(!CONV && !PROB && CONV!=1 && MDCI!=1){print "input issue or ?\t" FILENAME} 
                };       
' OFS="\t" "$FILENAME"
done

Whenever the Compound, Method and Approach columns match between two rows, I want their energy values subtracted exactly as singlet minus triplet, with all the results collected into a new table. For example:

Code:
C(CCH)2        	singlet   	CC        	TO   		1.108791	-191.426232325854 	3-1/C-CCH-2/C-CCH-2-s.out
C(CCH)2        	triplet   	CC        	TO   		2.235993	-191.434509836762 	3-1/C-CCH-2/C-CCH-2-t.out

These two rows should form one row:

Code:
C(CCH)2   	CC        	TO	0.008277510908

If no match is found, a simple message such as "match not found" or "data not available" is enough. Your help is appreciated, and thanks in advance.
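One possible way to do the pairing is a second awk pass over the finished table. The sketch below is only an illustration under some assumptions: the function name st_gap is made up, the input is assumed to be tab-separated exactly as the printf calls above produce it (padding spaces inside fields, runs of one or more tabs between them), an energy flagged with a trailing "!" is still used, and if the same Compound/Method/Approach/State combination has several numeric energies the last one wins. Rows whose energy column is not a number (the ERROR rows) are ignored, so a group with a missing partner is reported as "data not available".

Bash:
```shell
#!/bin/sh
# Hypothetical helper: reads the table on stdin and prints one row per
# Compound/Method/Approach group with E(singlet) - E(triplet).
st_gap() {
    awk -F'\t+' '
    {
        # Strip the padding spaces left by the %-15.10s style formats.
        for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)

        state = tolower($2)
        if (state != "singlet" && state != "triplet") next   # header or unknown row

        key = $1 "\t" $3 "\t" $4            # Compound, Method, Approach
        if (!(key in seen)) { seen[key] = 1; order[++n] = key }

        e = $6
        sub(/!$/, "", e)                    # drop the SCF warning flag, keep the value
        if (e ~ /^-?[0-9]+\.[0-9]+$/) energy[key, state] = e
    }
    END {
        # Report groups in the order they first appeared in the input.
        for (i = 1; i <= n; i++) {
            k = order[i]
            if ((k, "singlet") in energy && (k, "triplet") in energy)
                printf "%s\t%.12f\n", k, energy[k, "singlet"] - energy[k, "triplet"]
            else
                printf "%s\t%s\n", k, "data not available"
        }
    }'
}
```

The table can be piped straight into it, for example by appending "| st_gap > gaps.txt" after the final "done" of the loop (gaps.txt is just a placeholder name). The output columns are not padded; the printf formats in the END block can be widened if aligned columns are wanted.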
 