Search string in other file

Status
Not open for further replies.

demis001

Programmer
Aug 18, 2008
94
US
I know how to search for a string from one file inside another file in Perl. Is there an easy way to do the same in awk?

Example: Data1
1 TGAGGTAGTAGGTTGTATAGTT
2 CTATACAATCTACTGTCTTTC

Example: Data2
>1 TGAGGTAGTAGGTTGTATAGTT
>2 TGAGGTAGTAGGTTGTATAGTT
>3 TGAGGTAGTAGGTTGTATAGTT
>4 CTATACAATCTACTGTCTTTC
>5 CTATACAATCTACTGTCTTTC
>6 TGAGGTAGTAGGTTGTGTGGTT
>7 CTATACAACCTACTGCCTTCCC
>8 TGAGGTAGTAGGTTGTATAGTT

The goal is to take $2 of Data1 and search for it in $2 of Data2. If it is found, print $0 from both files side by side.

In Perl I have done this using the seek function.

Dereje
 
awk 'NR==FNR{a[$2]=$0;next}$2 in a{print a[$2]"\t"$0}' Data1 Data2
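For readers new to the idiom: while awk reads the first file, NR (records read so far) equals FNR (records read in the current file), so the first block stores each Data1 line in array `a`, keyed on its sequence `$2`, and `next` skips the rest. For Data2, `$2 in a` is true only when the whole sequence is an exact key. A minimal, self-contained run against the sample data from the question; the scratch directory made with `mktemp -d` is only an assumption for illustration:

```shell
#!/bin/sh
# Work in a scratch directory so the sample files don't clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample data copied from the question.
printf '%s\n' '1 TGAGGTAGTAGGTTGTATAGTT' '2 CTATACAATCTACTGTCTTTC' > Data1
printf '%s\n' '>1 TGAGGTAGTAGGTTGTATAGTT' '>2 TGAGGTAGTAGGTTGTATAGTT' \
  '>3 TGAGGTAGTAGGTTGTATAGTT' '>4 CTATACAATCTACTGTCTTTC' \
  '>5 CTATACAATCTACTGTCTTTC' '>6 TGAGGTAGTAGGTTGTGTGGTT' \
  '>7 CTATACAACCTACTGCCTTCCC' '>8 TGAGGTAGTAGGTTGTATAGTT' > Data2

# NR==FNR holds only while reading Data1: remember each line by its sequence.
# For Data2, print the stored Data1 line and the Data2 line when $2 is an exact key.
out=$(awk 'NR==FNR{a[$2]=$0;next}$2 in a{print a[$2]"\t"$0}' Data1 Data2)
printf '%s\n' "$out"
```

Only Data2 lines >1, >2, >3, >4, >5 and >8 are printed, each preceded by its Data1 line; >6 and >7 have no exact key in `a`.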

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
It does not catch the following case:

AAATT CCCAAATTT

I want these two lines to match, since the first is entirely contained in the second. Your script above looks for an exact match between the two files; I want to search for the pattern in $2 of Data1 anywhere within Data2.
 
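Code:
# Extract the sequence column ($2) of Data1, one pattern per line
awk '{print $2}' Data1 > /tmp/Data1.tmp
# Print every line of Data2 that contains any of those patterns
grep -f /tmp/Data1.tmp Data2
# Clean up the temporary pattern file
rm /tmp/Data1.tmp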
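The grep approach can be tried on the AAATT / CCCAAATTT case from the follow-up. This sketch swaps in changes that are not in the original post: `mktemp` for the pattern file (so concurrent runs don't collide on /tmp/Data1.tmp), an `NF` guard so a blank line in Data1 cannot become an empty pattern (which would match every line), and `grep -F`, which treats each pattern as a fixed string instead of a regular expression. For plain A/C/G/T sequences the matches are the same. The sample files are made up for illustration.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Illustrative data: AAATT should be found inside CCCAAATTT.
printf '%s\n' '1 TGAGGTAGTAGGTTGTATAGTT' '3 AAATT' > Data1
printf '%s\n' '>1 TGAGGTAGTAGGTTGTATAGTT' '>6 TGAGGTAGTAGGTTGTGTGGTT' \
  '>9 CCCAAATTT' > Data2

# One pattern per line: the sequence column of each non-blank Data1 line.
pat=$(mktemp) || exit 1
awk 'NF{print $2}' Data1 > "$pat"

# -F: fixed strings; -f: read the patterns from the file.
# Prints every Data2 line that contains at least one Data1 sequence.
out=$(grep -F -f "$pat" Data2)
rm -f "$pat"
printf '%s\n' "$out"
```

As in the original post, only the Data2 lines are printed; grep has no record of which Data1 pattern matched.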

Annihilannic.
 
Thanks,

It only prints $0 of the Data2 lines that match a line in Data1. Is it possible to print the matching line from both files?

Like this:

$0 of Data1 \t $0 of Data2 (matching lines only)

Dereje
 
awk 'NR==FNR{a[$2]=$0;next}{for(i in a)if($2~i)print a[i]"\t"$0}' Data1 Data2
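A self-contained sketch of the substring join, printing `a[i]`, the Data1 line stored under key `i`. One change here is an assumption, not from the thread: it tests with `index($2, i)` instead of `$2 ~ i`. `index()` returns the position of a literal substring (0 if absent), so characters in Data1 are never read as regex metacharacters; for A/C/G/T sequences the two tests agree. The sample files are illustrative.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' '1 TGAGGTAGTAGGTTGTATAGTT' '3 AAATT' > Data1
printf '%s\n' '>1 TGAGGTAGTAGGTTGTATAGTT' '>6 TGAGGTAGTAGGTTGTGTGGTT' \
  '>9 CCCAAATTT' > Data2

# Pass 1 (Data1): key each line by its sequence.
# Pass 2 (Data2): try every stored sequence as a substring of $2;
# on a hit, print the Data1 line, a tab, then the Data2 line.
out=$(awk 'NR==FNR { a[$2] = $0; next }
           { for (i in a) if (index($2, i)) print a[i] "\t" $0 }' Data1 Data2)
printf '%s\n' "$out"
```

The >6 line is skipped because neither Data1 sequence occurs in it, and `3 AAATT` is paired with `>9 CCCAAATTT`.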

Hope This Helps, PH.
 
It prints every line of Data2, always paired with the same Data1 line:

ACAATTGCGGTTTTT >hsa-1 TGCCCTTAAAGGTGAACCCAGT
ACAATTGCGGTTTTT >hsa-2 TGGGGAGCTGAGGCTCTGGGGGTG
ACAATTGCGGTTTTT >hsa-3 AAGGCAGGGCCCCCGCTCCCC

Something is wrong, and the lines don't actually match either.

Dereje
 
With the sample Data1 and Data2 from the top of the thread, I get:
>1 GGTAGTAGGTTGTATAGTT
1 GGTAGTAGGTTGTATAGTT >1 GGTAGTAGGTTGTATAGTT
>2 TGAGGTAGTAGGTTGTATAGTT
1 GGTAGTAGGTTGTATAGTT >2 TGAGGTAGTAGGTTGTATAGTT
>3 TGAGGTAGTAGGTTGTATAGTT
1 GGTAGTAGGTTGTATAGTT >3 TGAGGTAGTAGGTTGTATAGTT
>4 CTATACAATCTACTGTCTTTC
2 CTATACAATCTACTGTCTTTC >4 CTATACAATCTACTGTCTTTC
>5 CTATACAATCTACTGTCTTTC
2 CTATACAATCTACTGTCTTTC >5 CTATACAATCTACTGTCTTTC
>6 TGAGGTAGTAGGTTGTGTGGTT
>7 CTATACAACCTACTGCCTTCCC
>8 TGAGGTAGTAGGTTGTATAGTT
1 GGTAGTAGGTTGTATAGTT >8 TGAGGTAGTAGGTTGTATAGTT

I want only the matching lines.
 
As far as I can see PHV's last solution does exactly what you described:

Code:
$ cat Data1
1  TGAGGTAGTAGGTTGTATAGTT
2  CTATACAATCTACTGTCTTTC
3  AAATT
$ cat Data2
>1    TGAGGTAGTAGGTTGTATAGTT
>2    TGAGGTAGTAGGTTGTATAGTT
>3    TGAGGTAGTAGGTTGTATAGTT
>4    CTATACAATCTACTGTCTTTC
>5    CTATACAATCTACTGTCTTTC
>6    TGAGGTAGTAGGTTGTGTGGTT
>7    CTATACAACCTACTGCCTTCCC
>8      TGAGGTAGTAGGTTGTATAGTT
>9    CCAAATTT
$ awk 'NR==FNR{a[$2]=$0;next}{for(i in a)if($2~i)print a[i]"\t"$0}' Data1 Data2
1  TGAGGTAGTAGGTTGTATAGTT       >1    TGAGGTAGTAGGTTGTATAGTT
1  TGAGGTAGTAGGTTGTATAGTT       >2    TGAGGTAGTAGGTTGTATAGTT
1  TGAGGTAGTAGGTTGTATAGTT       >3    TGAGGTAGTAGGTTGTATAGTT
2  CTATACAATCTACTGTCTTTC        >4    CTATACAATCTACTGTCTTTC
2  CTATACAATCTACTGTCTTTC        >5    CTATACAATCTACTGTCTTTC
1  TGAGGTAGTAGGTTGTATAGTT       >8      TGAGGTAGTAGGTTGTATAGTT
3  AAATT        >9    CCAAATTT
$
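One behaviour of this loop worth knowing: when a Data2 sequence contains more than one Data1 pattern, it prints one line per matching pattern, and `for (i in a)` visits the keys in an unspecified order. If only one pairing per Data2 line is wanted, a `break` after the first hit is one option (a variant not posted in the thread); which pattern wins is then arbitrary. The data below is hypothetical.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Both Data1 sequences occur inside the single Data2 sequence.
printf '%s\n' '1 AAATT' '2 CCCAAA' > Data1
printf '%s\n' '>9 CCCAAATTT' > Data2

# Every match: one output line per Data1 pattern found in $2.
all=$(awk 'NR==FNR{a[$2]=$0;next}{for(i in a)if($2~i)print a[i]"\t"$0}' Data1 Data2)

# First match only: leave the loop after the first hit.
first=$(awk 'NR==FNR{a[$2]=$0;next}{for(i in a)if($2~i){print a[i]"\t"$0;break}}' Data1 Data2)

printf '%s\n' "$all" "---" "$first"
```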

Annihilannic.
 
