comparing header of two files

agilo

Programmer
Feb 4, 2004
Hi,

I need to extract fields from a file when the field's header matches a header in another file. For example:

File1 has only one row (the reference):

x1 x2 x3 x4 x5 x6

File2 has more or fewer fields than File1:

x2 x4 x6 x7 x10
1 33 33 44 55
23 33 44 56 66
.
.
.
etc

I want to extract from File2 only the complete columns whose headers exist in the header of File1:

x2 x4 x6
1 33 33
23 33 44

Can anybody help?

Thank you in advance,

Agilo

 
Try this. The columns don't line up very well; you can change the printf statements to improve this.
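BEGIN {
    # read the single header line of file1 and remember each name
    getline < "file1"
    for (j=1;j<=NF;j++) a[$j] = 1
}
NR==1{
    # mark the columns of the data file whose header also appears in file1
    for (j=1;j<=NF;j++) if (a[$j]) b[j] = 1
}
{
    # print only the marked columns (use "%s", never the data, as the format)
    for (j=1;j<=NF;j++) {
        if (b[j]) {
            printf "%s ", $j
        }
    }
    print ""
}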
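To try the getline approach end to end, the sketch below recreates the sample File1 and File2 from the question in a temporary directory (the directory and the use of `mktemp -d` are assumptions) and runs the same logic. Unlike the script above, it joins the selected fields with single spaces instead of leaving a trailing blank after each one.

```shell
# Sketch: recreate the sample files from the question in a temporary directory.
dir=$(mktemp -d)
printf 'x1 x2 x3 x4 x5 x6\n' > "$dir/file1"
printf 'x2 x4 x6 x7 x10\n1 33 33 44 55\n23 33 44 56 66\n' > "$dir/file2"

# Run from that directory so getline < "file1" finds the reference file.
out=$(cd "$dir" && awk '
BEGIN {
    # Read the single header line of file1 and remember each name.
    getline < "file1"
    for (j = 1; j <= NF; j++) a[$j] = 1
}
NR == 1 {
    # Mark the column positions in file2 whose header also appears in file1.
    for (j = 1; j <= NF; j++) if (a[$j]) b[j] = 1
}
{
    # Print only the marked columns, separated by single spaces.
    sep = ""
    for (j = 1; j <= NF; j++) if (b[j]) { printf "%s%s", sep, $j; sep = " " }
    print ""
}' file2)

printf '%s\n' "$out"
rm -rf "$dir"
```

With the sample data this prints the header `x2 x4 x6` followed by the rows `1 33 33` and `23 33 44`.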

CaKiwi

"I love mankind, it's people I can't stand" - Linus Van Pelt
 
nawk -f agilo.awk file1.txt file2.txt

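#----------------- agilo.awk
# first file (file1.txt): remember every header name as an array key
FNR == NR && FNR == 1 {
    for(i=1; i <= NF; i++)
        refFile1[$i];
    next;
}

# header of the second file: save its names by position, print the matching ones
FNR == 1 {
    split($0, refFile2);
    for( i=1; i <= NF; i++)
        if ( refFile2[i] in refFile1)
            printf("%-4s%s", $i, OFS);
    printf "\n";
    next;
}

# data lines: print the columns whose header name appears in file1.txt
{
    for( i=1; i <= NF; i++)
        if ( refFile2[i] in refFile1)
            printf("%-4s%s", $i, OFS);
    printf "\n";
}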
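A condensed sketch of this two-file approach, using the sample data from the question: `FNR == NR` holds only while the first file is being read, and the header of the second file is kept in `refFile2` so that each later line can look up its column name by position (`refFile2[i] in refFile1`). Merging the header and data printing into one rule is a simplification of the script above, not part of it, and the temporary directory is an assumption.

```shell
# Sketch only: sample files recreated from the question in a temporary directory.
dir=$(mktemp -d)
printf 'x1 x2 x3 x4 x5 x6\n' > "$dir/file1.txt"
printf 'x2 x4 x6 x7 x10\n1 33 33 44 55\n23 33 44 56 66\n' > "$dir/file2.txt"

out=$(awk '
# FNR == NR only while the first file is read: store its header names as keys.
FNR == NR && FNR == 1 {
    for (i = 1; i <= NF; i++) refFile1[$i]
    next
}
# First line of the second file: remember its header names by column position.
FNR == 1 { split($0, refFile2) }
# Every line of the second file, header included: keep columns named in file1.
{
    sep = ""
    for (i = 1; i <= NF; i++)
        if (refFile2[i] in refFile1) { printf "%s%s", sep, $i; sep = OFS }
    print ""
}' "$dir/file1.txt" "$dir/file2.txt")

printf '%s\n' "$out"
rm -rf "$dir"
```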


vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Thank you guys,

CaKiwi, could you please tell me how to take the file name "file1" from the command line? I want to run the script you wrote like this: gawk -f prog.awk file1 file2

Thank you,

Agilo
 
Use code similar to that in vgersh99's solution:

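# first line of the first file (file1): remember the reference header names
NR==1{
    for (j=1;j<=NF;j++) a[$j] = 1
    next
}
# first line of the second file: mark the columns whose header is in file1
FNR==1{
    for (j=1;j<=NF;j++) if (a[$j]) b[j] = 1
}
# print the marked columns, each left-aligned in a 4-character cell
{
    for (j=1;j<=NF;j++) if (b[j]) printf("%-4s",$j)
    print ""
}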
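A sketch of running this version exactly as asked, `awk -f prog.awk file1 file2` (gawk accepts the same invocation). The temporary directory is an assumption and the sample data comes from the question; note that `%-4s` pads every printed field, so each output line ends in blanks.

```shell
dir=$(mktemp -d)
printf 'x1 x2 x3 x4 x5 x6\n' > "$dir/file1"
printf 'x2 x4 x6 x7 x10\n1 33 33 44 55\n23 33 44 56 66\n' > "$dir/file2"

# Save the script to a file so it can be run as: awk -f prog.awk file1 file2
cat > "$dir/prog.awk" <<'EOF'
# NR == 1 only for the very first line read, i.e. the header of file1.
NR == 1 {
    for (j = 1; j <= NF; j++) a[$j] = 1
    next
}
# FNR == 1 again at the first line of file2: mark the matching columns.
FNR == 1 {
    for (j = 1; j <= NF; j++) if (a[$j]) b[j] = 1
}
# Print the marked columns of every file2 line, left-aligned in 4-character cells.
{
    for (j = 1; j <= NF; j++) if (b[j]) printf("%-4s", $j)
    print ""
}
EOF

out=$(cd "$dir" && awk -f prog.awk file1 file2)
printf '%s\n' "$out"
rm -rf "$dir"
```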

CaKiwi

 
Thanks Cakiwi,

I have a problem: without using "getline", I can't get the correct FS. The FS in the input files is "\t", but some variable names contain spaces (e.g., "Temp xy").
I have added FS = "\t" at the beginning of the script.
I get correct results when I use the getline function, but with your last suggestion I cannot correctly extract the variables that have spaces in their names!

Could you please help?

Thanks,

Agilo
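One possible cause, offered only as an assumption since the modified script was not posted: `FS` has to be set before the first record is read, i.e. in a `BEGIN` block or with `-F '\t'` on the command line. In POSIX awk, an assignment to `FS` made inside an ordinary rule only affects the next record, so the header line of file1 would still be split on blanks. The sketch below uses hypothetical tab-separated files in which "Temp xy" is a single column name.

```shell
dir=$(mktemp -d)
# Tab-separated files; "Temp xy" is a single column name containing a space.
printf 'x1\tTemp xy\tx3\n' > "$dir/file1"
printf 'Temp xy\tx3\tx9\n1\t2\t3\n4\t5\t6\n' > "$dir/file2"

out=$(awk '
BEGIN { FS = "\t"; OFS = "\t" }   # set before the first record is read
NR == 1 {
    # header of file1: one array key per tab-separated name
    for (j = 1; j <= NF; j++) a[$j] = 1
    next
}
FNR == 1 {
    # header of file2: mark the columns that also appear in file1
    for (j = 1; j <= NF; j++) if (a[$j]) b[j] = 1
}
{
    # rebuild each line from the marked columns, joined by tabs
    line = ""; sep = ""
    for (j = 1; j <= NF; j++) if (b[j]) { line = line sep $j; sep = OFS }
    print line
}' "$dir/file1" "$dir/file2")

printf '%s\n' "$out"
rm -rf "$dir"
```

Here the output header is `Temp xy` and `x3` separated by a tab, because the space inside the name is no longer treated as a field separator.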
 
I don't know why it would work with the getline function and not without it. Post some data that does not work. Another possibility is:

awk -v fn=file1 -f agilo.awk file2

BEGIN {
    if (!fn) fn="file1"
    getline < fn
    for (j=1;j<=NF;j++) a[$j] = 1
}
....
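A complete version of the `-v fn=...` variant might look like the sketch below. It assumes the elided `....` stands for the remaining rules of the first getline-based script; since file1 is read with getline, only file2 is named on the command line, so `NR == 1` is file2's header. The check on getline's return value and the `close(fn)` call are additions, not part of the post.

```shell
dir=$(mktemp -d)
printf 'x1 x2 x3 x4 x5 x6\n' > "$dir/file1"
printf 'x2 x4 x6 x7 x10\n1 33 33 44 55\n23 33 44 56 66\n' > "$dir/file2"

cat > "$dir/agilo.awk" <<'EOF'
BEGIN {
    # Reference file name from -v fn=...; default to "file1" as in the post.
    if (!fn) fn = "file1"
    # Added check: stop with status 1 if the reference file cannot be read.
    if ((getline < fn) <= 0) exit 1
    for (j = 1; j <= NF; j++) a[$j] = 1
    close(fn)
}
# Assumed continuation: NR == 1 is now the header of the data file.
NR == 1 {
    for (j = 1; j <= NF; j++) if (a[$j]) b[j] = 1
}
{
    # Print the marked columns separated by single spaces.
    sep = ""
    for (j = 1; j <= NF; j++) if (b[j]) { printf "%s%s", sep, $j; sep = " " }
    print ""
}
EOF

out=$(cd "$dir" && awk -v fn=file1 -f agilo.awk file2)
printf '%s\n' "$out"
rm -rf "$dir"
```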

CaKiwi
