Subsetting lines that are in file1 but NOT in file2

Status
Not open for further replies.

jdhahbi

Technical User
Oct 7, 2009
24
US
I have 2 sets of data that look like this:

file1:
Code:
>s_7_2x1
AAAAAAAAAGTTGGTCTTG
>s_7_24x2
AAAAAAAAGGTCGGGCCTGGTT
>s_7_3x3
AAAAAAACAGAGTTCA
>s_7_4x4
AAAAAAACATGGCGCACTTCTT
>s_7_5x5
AAAAAAACATGGNGCACTTCTTTTCGCNTGGCGGC
file2:
Code:
>s_7_2x1
AAAAAAAAAGTTGGTCTTG
>s_7_24x2
AAAAAAAAGGTCGGGCCTGGTT
file2 is a subset of file1. I would like to extract the lines that are in file1 but not in file2.
The outfile should look like this:
Code:
>s_7_3x3
AAAAAAACAGAGTTCA
>s_7_4x4
AAAAAAACATGGCGCACTTCTT
>s_7_5x5
AAAAAAACATGGNGCACTTCTTTTCGCNTGGCGGC

Thank you for your help.
 
Hi

This is one of the most frequently asked questions in this forum. Next time, please search the older threads before asking.

And next time show us what you tried so far.
Code:
awk 'FNR==NR{d[$0]=1;next}!($0 in d)' file2 file1
Tested with gawk and mawk.
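
For anyone reading along, here is the same one-liner spelled out with comments and run against the sample data from the question. It is only a sketch: the scratch directory and the array name seen are my own choices, not part of the original command.

```shell
# Recreate the sample data from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
>s_7_2x1
AAAAAAAAAGTTGGTCTTG
>s_7_24x2
AAAAAAAAGGTCGGGCCTGGTT
>s_7_3x3
AAAAAAACAGAGTTCA
>s_7_4x4
AAAAAAACATGGCGCACTTCTT
>s_7_5x5
AAAAAAACATGGNGCACTTCTTTTCGCNTGGCGGC
EOF
cat > "$tmpdir/file2" <<'EOF'
>s_7_2x1
AAAAAAAAAGTTGGTCTTG
>s_7_24x2
AAAAAAAAGGTCGGGCCTGGTT
EOF

# file2 is read first. While FNR == NR we are still in that first file,
# so every line is stored as an array key. For file1 the first rule no
# longer matches, and a line is printed only if it was never stored.
awk '
  FNR == NR { seen[$0] = 1; next }
  !($0 in seen)
' "$tmpdir/file2" "$tmpdir/file1" > "$tmpdir/outfile"

cat "$tmpdir/outfile"
```

The order of the arguments matters: the file to exclude (file2) goes first. Also note that if file2 is empty, FNR == NR stays true while file1 is read, so nothing is printed at all.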

Note that my solution does what you asked for, but I have a feeling that your requirement was incomplete.
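
One way the requirement might be incomplete (this is a guess, not something the question states): the data looks like header/sequence pairs, and a purely line-based filter will drop a sequence line from file1 whenever the same sequence appears anywhere in file2, even under a different header. If whole records should be kept or dropped by header, a sketch could look like this. The sample data, the array name ids and the flag keep are made up for illustration.

```shell
# Hypothetical data: two records share the same sequence under different headers.
tmpdir=$(mktemp -d)
printf '%s\n' '>a' 'ACGT' '>b' 'ACGT' > "$tmpdir/file1"
printf '%s\n' '>a' 'ACGT'             > "$tmpdir/file2"

# Line-based filter: the ACGT line of record >b is dropped too,
# because the same line exists in file2.
awk 'FNR==NR{d[$0]=1;next}!($0 in d)' "$tmpdir/file2" "$tmpdir/file1" > "$tmpdir/by_line"

# Record-based variant (assumption: a record is a header line starting
# with ">" plus the lines that follow it, and records are matched by header).
# file2: remember header lines only.
# file1: at each header decide whether to keep the record, then print
# every line while the keep flag is set.
awk '
  FNR == NR { if (/^>/) ids[$0] = 1; next }
  /^>/      { keep = !($0 in ids) }
  keep
' "$tmpdir/file2" "$tmpdir/file1" > "$tmpdir/by_record"

echo "line-based:";   cat "$tmpdir/by_line"
echo "record-based:"; cat "$tmpdir/by_record"
```

With the sample data in the question both approaches give the same result, so this only matters if sequences can repeat across records.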


Feherke.
 
Thank you very much.
Next time, I will do as you said.
 