Isolating columns based on patterns in header

katlzrd · Jan 24, 2012

Any help would be appreciated ...

I have a text file with n columns. Each column has a header that is a 3-letter sequence (ABA ABC CDC ...). I want to isolate all the columns whose header has the same first and third letters (ABA BAB CAC CBC ...). Here is an example of the data:

CCP CPP CRP PCP PPP PRP RCP RPP RRP
4 5 0 0 43 3 1 5 5
0 1 1 0 3 3 2 6 3
1 2 0 1 5 1 0 4 4
1 3 0 2 13 0 1 7 7
1 2 0 1 14 1 1 5 4
1 1 0 0 3 0 0 3 3
1 1 0 0 26 1 0 1 0

I would like to return columns 4 and 6.
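To check which header positions satisfy the "first letter equals third letter" rule before pulling any data, you could list them from the header line alone. This is a minimal sketch, assuming a POSIX shell and awk; `matching_headers` is a hypothetical helper name, not something from the thread. Note that, taken literally, the rule also matches PPP (column 5), because its first and third letters are both P.

```shell
#!/bin/sh
# Hypothetical helper: print "index header" for every header whose
# first and third letters are the same, then stop after line 1.
matching_headers() {
  awk 'NR == 1 {
         for (i = 1; i <= NF; i++)
           if (substr($i, 1, 1) == substr($i, 3, 1)) print i, $i
         exit
       }' "$@"
}
```

On the sample header, `matching_headers test.txt` prints `4 PCP`, `5 PPP` and `6 PRP`, one per line.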

I know that the following will isolate the columns if I specify a header pattern (in this case, all headers ending in P):

awk 'NR==1{for(i=1;i<=NF;i++)if($i~/..P/)f[n++]=i}{for(i=0;i<n;i++)printf"%s%s",i?" ":"",$f[i];print ""}' test.txt
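The one-liner makes a single pass over the file. On the header line (`NR==1`) it stores the indexes of matching columns in the array `f`, and on every line, header included, it prints those fields separated by spaces. Here is a sketch of the same idea with the header regex passed in through `awk -v` so the pattern does not have to be edited into the program. It assumes a POSIX shell; `select_columns` and `pat` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: select_columns REGEX [FILE...]
# Keeps every column whose header matches REGEX (a dynamic awk regex).
select_columns() {
  pat=$1
  shift
  awk -v pat="$pat" '
    # Header line: remember the index of each matching column.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i ~ pat) f[n++] = i }
    # Every line: print the remembered fields, space-separated.
    { for (i = 0; i < n; i++) printf "%s%s", (i ? " " : ""), $f[i]; print "" }
  ' "$@"
}
```

For example, `select_columns '^..P$' test.txt` anchors the pattern so that it must match the whole 3-letter header, which is stricter than the unanchored `/..P/`.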

How can I isolate all the columns with this #.# repeat pattern, where # is the same letter?

PHV · Jan 24, 2012

Replace this:
if($i~/..P/)
with this:
if(substr($i,1,1)==substr($i,3,1))
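Putting the substitution into the full command gives the sketch below, assuming a POSIX shell. The function names are hypothetical. The second function is an assumption about intent. It also requires the middle letter to differ from the outer ones, which drops PPP (column 5) and returns exactly columns 4 and 6 from the sample.

```shell
#!/bin/sh
# PHV's test dropped into the original one-liner: keep columns whose
# header has the same first and third letters (PCP, PPP, PRP here).
pick_aba_columns() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++)
                   if (substr($i,1,1) == substr($i,3,1)) f[n++] = i }
       { for (i = 0; i < n; i++) printf "%s%s", (i ? " " : ""), $f[i]; print "" }' "$@"
}

# Stricter variant (assumed intent): the middle letter must differ,
# so PPP is excluded and only PCP and PRP remain.
pick_aba_columns_strict() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++)
                   if (substr($i,1,1) == substr($i,3,1) && substr($i,2,1) != substr($i,1,1)) f[n++] = i }
       { for (i = 0; i < n; i++) printf "%s%s", (i ? " " : ""), $f[i]; print "" }' "$@"
}
```

With the sample file, `pick_aba_columns test.txt` starts with `PCP PPP PRP` / `0 43 3`, while `pick_aba_columns_strict test.txt` starts with `PCP PRP` / `0 3`.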

Hope This Helps, PH.

katlzrd · Jan 25, 2012

Thanks! Worked perfectly! Your help is much appreciated!
