Any help would be appreciated ...
I have a text file that has n columns. Each column has a header that is a 3 letter sequence (ABA ABC CDC ... ). I want to isolate all the columns that have a header where the first and the third letters are the same (ABA BAB CAC CBC ...). Here is an example of the kind of data:
CCP CPP CRP PCP PPP PRP RCP RPP RRP
4 5 0 0 43 3 1 5 5
0 1 1 0 3 3 2 6 3
1 2 0 1 5 1 0 4 4
1 3 0 2 13 0 1 7 7
1 2 0 1 14 1 1 5 4
1 1 0 0 3 0 0 3 3
1 1 0 0 26 1 0 1 0
I would like to return columns 4 and 6.
I know that the following will isolate the columns if I specify the headers (in this case all headers ending in P):
awk 'NR==1{for(i=1;i<=NF;i++)if($i~/..P/)f[n++]=i}{for(i=0;i<n;i++)printf"%s%s",i?" ":"",$f[i];print""}' test.txt
How can I isolate all the columns with this #.# repeat pattern, where # is the same letter?
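One possible approach, offered as a sketch rather than a definitive answer: POSIX awk regular expressions have no backreferences, so a pattern like (.).\1 is not available, but the first and third characters of each header can be compared directly with substr(). The function name select_cols and the optional strict argument below are made up for illustration, and the heredoc simply recreates the sample data from the question as test.txt. Note that under the "first and third letters are the same" rule, PPP (column 5) also qualifies; the strict switch additionally requires the middle letter to differ, which yields only columns 4 and 6.

```shell
# Recreate the sample data from the question (overwrites test.txt).
cat > test.txt <<'EOF'
CCP CPP CRP PCP PPP PRP RCP RPP RRP
4 5 0 0 43 3 1 5 5
0 1 1 0 3 3 2 6 3
1 2 0 1 5 1 0 4 4
1 3 0 2 13 0 1 7 7
1 2 0 1 14 1 1 5 4
1 1 0 0 3 0 0 3 3
1 1 0 0 26 1 0 1 0
EOF

# select_cols FILE [STRICT]  (hypothetical helper, not from the question)
#   STRICT=0 (default): keep headers whose 1st and 3rd letters match.
#   STRICT=1: also require the middle letter to differ (drops e.g. PPP).
select_cols() {
  awk -v strict="${2:-0}" '
    NR == 1 {
      # Record the index of every header that fits the pattern.
      for (i = 1; i <= NF; i++) {
        a = substr($i, 1, 1); b = substr($i, 2, 1); c = substr($i, 3, 1)
        if (length($i) == 3 && a == c && !(strict && a == b))
          f[n++] = i
      }
    }
    {
      # Print only the recorded columns, header line included.
      for (i = 0; i < n; i++)
        printf "%s%s", (i ? " " : ""), $(f[i])
      print ""
    }
  ' "$1"
}

select_cols test.txt      # header line: PCP PPP PRP
select_cols test.txt 1    # header line: PCP PRP
```

If the headers can be longer than three characters, the length($i) == 3 check would need to be adjusted to whatever rule applies.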