Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to extract only columns where header matches certain strings 1

Status
Not open for further replies.

NBMike

Technical User
Apr 1, 2005
25
0
0
GB
Hi, please can someone help.

I have a huge csv file with thousands of columns in it, each column has a column header (in the first row). I need to extract only the columns where the header contains a certain string.

eg. I only want to extract all the columns where the header contains the word 'fred'.

I know how to use Awk to extract columns based on the column number, but unfortunately in my case the columns I want can be in a different column number every time, the only consistent thing is that the column header contains a certain string..so I need this solution to be able to extract columns based on whether a string exists in the header or not.

If anyone can offer any suggestions using Awk or any other UNIX command I'd be most grateful.

Thankyou
Michael
 
Hi

Code:
awk 'NR==1{for(i=1;i<=NF;i++)if($i~/fred/)f[n++]=i}{for(i=0;i<n;i++)printf"%s%s",i?" ":"",$f[i];print""}' /input/file
Michael said:
If anyone can offer any suggestions
Next time post some sample data and the code you wrote so far.

Feherke.
 
Hi, thanks for the quick reply, however this doesn't seem to work, it outputs all the data (it doesn't extract just the specific columns I want).

For example I have a csv file (called test2.csv) containing the following:

"barney","fred","wilma"
1,2,3
4,5,6
7,8,9

I just want to extract the column that has 'fred' in the header, eg. the desired output is

"fred"
2
5
8

When I type the command you provided it outputs all of it, not just the 2nd column. The exact command I'm using is

awk 'NR==1{for(i=1;i<=NF;i++)if($i~/fred/)f[n++]=i}{for(i=0;i<n;i++)printf"%s%s",i?" ":"",$f;print""}' test2.csv

Am I doing something wrong?

Thanks again for your help
Michael
 
Hi

Sorry, I did not noticed that you mentioned CSV. ( And I usually store my data in tab-separated format. ) You only have to specify comma ( , ) as input field separator :
Code:
[blue]master #[/blue] awk [red]-F,[/red] 'NR==1{for(i=1;i<=NF;i++)if($i~/fred/)f[n++]=i}{for(i=0;i<n;i++)printf"%s%s",i?" ":"",$f[i];print""}' <<ENDOFINPUT
> "barney","fred","wilma"
> 1,2,3
> 4,5,6
> 7,8,9
> ENDOFINPUT
"fred"
2
5
8

Feherke.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top