Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Please awk expert!!!! Filter sequence

Status
Not open for further replies.

demis001

Programmer
Aug 18, 2008
94
US
I want to filter fastafile using awk. The inpute file looks like

Input file:
>cel-miR-49 MIMAT0000020 Caenorhabditis elegans miR-49
AAGCACCACGAGAAGCUGCAGA
>cel-miR-50 MIMAT0000021 Caenorhabditis elegans miR-50
UGAUAUGUCUGGUAUUCUUGGG
>cel-miR-51 MIMAT0000022 Caenorhabditis elegans miR-51
UACCCGUAGCUCCUAUCCAUGUU
>hsa-miR-28-3p MIMAT0004502 Homo sapiens miR-28-3p
CACUAGAUUGUGAGCUCCUGGA
>hsa-miR-29a MIMAT0000086 Homo sapiens miR-29a
UAGCACCAUCUGAAAUCGGUUA
>hsa-miR-30a MIMAT0000087 Homo sapiens miR-30a
UGUAAACAUCCUCGACUGGAAG
>hsa-miR-31 MIMAT0000089 Homo sapiens miR-31
AGGCAAGAUGCUGGCAUAGCU

Is there easy to filter only sequence with >hsa? grep only return the fasta header not the sequence. I need the fasta header with the sequence

Dereje
 
Are you using Linux? If so try grep -A1 '>hsa' inputfile.

If not, awk '/>hsa/ { print; getline; print }' inputfile.

Annihilannic.
 
Many thanks!!

What if If I have unequal sequence line after fasta header?

>hsa
aaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>hsa
aaaaaaaaaaaaaaaaaaaaaaaaaa
>hsa
ttttttttttttttttttttttttttt
ttttttttttttttttttttttttttt
ttttttttttttttttttttttttttttt

Dereje
 
Code:
awk '
        /^>hsa/ {
                print
                while (getline) {
                        if ($0 ~ /^>/ && $0 !~ /^>hsa/) next
                        print
                }
        }

' inputfile

Annihilannic.
 
Many thanks and really appriciate. I carried out the same task with this perl script followed by the original single line awk script. I really wonder how simple to have awk. Just started learning.


my $seq;
while (<>) {
if (/^>/) { # header line
print "$seq\n" if defined $seq;
print;
$seq='';
} else {
chomp;
$seq .= $_;
}
}
print $seq,"\n";

awk '/>hsa/ { print; getline; print }' inputfile.
 
Have a look at the perl -n and perl -p options (on man perlrun). They make perl behave in a very similar way to awk and are handy for writing one-liners.

Annihilannic.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top