New to using AWK - query regarding reading in from file 1

Status
Not open for further replies.

jamie999

Programmer
Sep 28, 2011
4
GB
Hi there,

Firstly, apologies: I am completely new to AWK, but am a reasonable Perl programmer.
I have already completed the following task with a Perl script, but I suspect that AWK *may* be faster. In all likelihood I will need to repeat the task, possibly on larger files, so it would be good to know whether it can be done in AWK.

The task is quite simple:
A file contains a list of names. These also exist as column headers in several other files.
So for example:

name_file has:
red
blue
purple

other_file1 has:
green blue yellow red
0.4 0.3 0.2 0.7
0.1 0.5 0.9 0.2
etc...

What I need to do is extract the full column where the header matches a name from the name_file, so in this case it would be:

blue red
0.3 0.7
0.5 0.2

and then send those selected columns to a new file.

I know that you can use something like
Code:
awk -f2,4 other_file1 > new_file1
to achieve this from the command line, but this isn't really appropriate for this case.
Is it possible to do all the above with awk? I also toyed with the idea of doing the original matching in perl and then passing a string with all the positions, a bit like this:
Code:
system(awk -f$stringpos $file > $fileout)
But this had all sorts of errors, mainly because it won't accept the string being passed with the -f.

Any pointers gladly accepted!
 
Hi

jamie999 said:
I know that you can use something like
Code:
awk -f2,4 other_file1 > new_file1
Not really. That looks like cut syntax.
Code:
awk 'FNR==NR{r[$1]=1;next}FNR==1{n=0;for(i=1;i<=NF;i++)if($i in r)c[++n]=i}{for(i=1;i<=n;i++)printf"%s%s",$c[i],i<n?OFS:ORS}' name_file other_file > new_file
Tested with gawk and mawk.
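
For readers new to AWK, the one-liner above can be spread over several lines. This is only a sketch of the same logic with longer variable names (wanted and col are renamings chosen here, not part of the original answer), run against the sample data from the question in a scratch directory:

```shell
# Sample data from the original question, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' red blue purple > "$tmp/name_file"
printf '%s\n' 'green blue yellow red' '0.4 0.3 0.2 0.7' '0.1 0.5 0.9 0.2' > "$tmp/other_file"

awk '
    # First file (name_file): remember every wanted name.
    FNR == NR { wanted[$1] = 1; next }

    # Header line of the data file: record the positions of wanted columns.
    FNR == 1 {
        n = 0
        for (i = 1; i <= NF; i++)
            if ($i in wanted)
                col[++n] = i
    }

    # Every line, header included: print only the recorded columns.
    {
        for (i = 1; i <= n; i++)
            printf "%s%s", $col[i], (i < n ? OFS : ORS)
    }
' "$tmp/name_file" "$tmp/other_file" > "$tmp/new_file"

cat "$tmp/new_file"
# blue red
# 0.3 0.7
# 0.5 0.2
```

FNR==NR is true only while the first file is being read, which is what separates the "load the names" phase from the "filter the columns" phase.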

Feherke.
 
Hi

As you mentioned large files, metaprogramming could make it significantly faster by reducing the work done per line while processing other_file:
Code:
awk 'FNR==NR{r[$1]=1;next}FNR==1{s="";for(i=1;i<=NF;i++)if($i in r)s=s (s?",":"")"\\$"i;printf"awk \"{print%s}\" \"%s\"\n",s,FILENAME;exit}' name_file other_file | sh > new_file
Or the same as above, generating a cut command:
Code:
awk 'FNR==NR{r[$1]=1;next}FNR==1{s="";for(i=1;i<=NF;i++)if($i in r)s=s (s?",":"")i;printf"cut -d\" \" -f%s \"%s\"\n",s,FILENAME;exit}' name_file other_file | sh > new_file
Tested with gawk and mawk.
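
To see what the metaprogramming step produces, the generated command can be captured before it is handed to sh. A minimal sketch using the sample data from the question (the scratch directory and the generated variable are assumptions made for illustration):

```shell
# Work in a scratch directory so FILENAME is the bare name "other_file".
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' red blue purple > name_file
printf '%s\n' 'green blue yellow red' '0.4 0.3 0.2 0.7' '0.1 0.5 0.9 0.2' > other_file

# Same awk program as above, but capture the generated command instead of piping it to sh.
generated=$(awk 'FNR==NR{r[$1]=1;next}FNR==1{s="";for(i=1;i<=NF;i++)if($i in r)s=s (s?",":"")i;printf"cut -d\" \" -f%s \"%s\"\n",s,FILENAME;exit}' name_file other_file)

printf '%s\n' "$generated"
# cut -d" " -f2,4 "other_file"

# Running the generated text through sh gives the filtered columns.
printf '%s\n' "$generated" | sh > new_file
```

The header is scanned once, and the whole file is then processed by a single cut invocation with fixed field numbers. Since the generated text is executed by sh, a file name containing quotes or other shell metacharacters would probably need extra escaping.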

Feherke.
 
Thank you feherke - I'll have a play about with both options and see which best suits.

Yes sorry, that was the code for cut, I'd been comparing cut, awk and perl performing the same task and found that awk and cut took approximately the same time but perl was slower. I obviously mixed up my cut and awk syntax when I posted - apologies!
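
For reference, the fixed-column forms of the two tools side by side. In awk, -f names a file to read the program from, which is why passing a field list with -f fails; in cut, -f is the field list. A small sketch on the sample data from the question (the scratch directory and output file names are assumptions):

```shell
tmp=$(mktemp -d)
printf '%s\n' 'green blue yellow red' '0.4 0.3 0.2 0.7' '0.1 0.5 0.9 0.2' > "$tmp/other_file1"

# cut: -d sets the delimiter, -f lists the wanted field numbers.
cut -d ' ' -f 2,4 "$tmp/other_file1" > "$tmp/by_cut"

# awk: fields are $1..$NF; the comma in print inserts OFS (a space by default).
awk '{ print $2, $4 }' "$tmp/other_file1" > "$tmp/by_awk"

# For single-space-separated input the two results are identical.
cmp "$tmp/by_cut" "$tmp/by_awk" && echo same
```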
 
Hmmmm, in that case I may go with the cut option. Just one question: the code you've given won't work as it stands. That's not your fault, I gave a slightly simplified version of the task for ease of explanation. The names in name_file are not exactly identical to the headers in other_file. In Perl I just used a match - I understand that awk has similar syntax?
 
Hi

You mean like:
Code:
red
blue
purple
vs.
Code:
green darkblue yellow reddish
0.4 0.3 0.2 0.7
0.1 0.5 0.9 0.2
Then these modifications will work:
Code:
awk 'FNR==NR{r[NR]=$1;next}FNR==1{n=0;for(i=1;i<=NF;i++)for(j=1;j in r;j++)if($i~r[j])c[++n]=i}{for(i=1;i<=n;i++)printf"%s%s",$c[i],i<n?OFS:ORS}' name_file other_file > new_file

# or

awk 'FNR==NR{r[NR]=$1;next}FNR==1{s="";for(i=1;i<=NF;i++)for(j=1;j in r;j++)if($i~r[j])s=s (s?",":"")"\\$"i;printf"awk \"{print%s}\" \"%s\"\n",s,FILENAME;exit}' name_file other_file | sh > new_file

# or

awk 'FNR==NR{r[NR]=$1;next}FNR==1{s="";for(i=1;i<=NF;i++)for(j=1;j in r;j++)if($i~r[j])s=s (s?",":"")i;printf"cut -d\" \" -f%s \"%s\"\n",s,FILENAME;exit}' name_file other_file | sh > new_file
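
A note on the $i~r[j] test used above: the ~ operator treats the right-hand side as a regular expression and looks for it anywhere in the header, so a name containing characters such as "." is interpreted as a pattern. If a plain substring test is wanted instead, index() is one possible alternative. A small sketch with made-up strings (header, name and out are illustrative only):

```shell
out=$(awk 'BEGIN {
    header = "darkblue"; name = "blue"
    print (header ~ name)             # 1: the regex "blue" occurs inside "darkblue"
    print (index(header, name) > 0)   # 1: literal substring test agrees
    print ("axb" ~ "a.b")             # 1: "." is a regex wildcard
    print (index("axb", "a.b") > 0)   # 0: the literal text "a.b" does not occur
}')
printf '%s\n' "$out"
```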

Feherke.
 
Yes, that sort of thing exactly - thank you Feherke. I'll sit down and try to learn awk properly when I have more time; it seems to be significantly quicker than Perl for these relatively simple tasks.

Thanks,
Jamie.
 