The following script works when I change the comparisons to ==1, but it does not exclude anything when I change them to ==0. The other problem is that I want to recognize and exclude lines with short reads (fewer than 14 characters) using index(substr($1, 1, 14), " ")==0, but that does not work either.
awk 'index(substr($1, 2, 10), ".")==0||index(substr($1, 1, 15), "AAAAAAAAAAAAAAA")==0||index(substr($1, 1, 15), "GGGGGGGGGGGG")==0||index(substr($1, 1, 15), "TTTTTTTTTTTTT")==0||index(substr($1,1, 15), "CCCCCCCCCCCCC")==0||index(substr($1, 1, 14), " ")==0{print $1}' gcb110_adaptor_removed.txt
The data looks like this:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
TGAGGTAGTAGATTGTATAGTTTCGTATTCCGTTTT
TGAGGTAGTAGATTGTATAGTT
A...................................
CGGATGAGCAAAGAAAGTGGTT
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ATCGT
TAGCTTATCAGACTGATGTTT
TGAGGTAGTAGATTGTATAGTTA
CAACGGAATACCATAAGCAGCTTTGTATTTCGGTCT
GTCAA
I want to exclude the short reads, the empty lines, and the lines made up of runs of A's, T's and so on, and print only the quality sequences.
Dereje
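One possible reason the ==0 version excludes nothing: the tests are joined with ||, so a line is printed as soon as any single test finds no match, and almost every line passes at least one of them. Joining the tests with && requires a line to pass all of them. The space test also cannot detect short reads, because $1 is a whitespace-delimited field and never contains a space; comparing length($1) is the usual alternative. Below is a minimal sketch under those assumptions. The function name filter_reads, the 14-character minimum, and the homopolymer run length of 12 are illustrative choices, not values confirmed by the original script.

```shell
# Hypothetical helper: print only reads that pass every filter.
# minlen and run are assumed thresholds; adjust them to your data.
filter_reads() {
  awk -v minlen=14 -v run=12 '
    BEGIN {
      # Build the homopolymer patterns, e.g. run=12 gives "AAAAAAAAAAAA".
      for (i = 0; i < run; i++) { a = a "A"; c = c "C"; g = g "G"; t = t "T" }
    }
    # Keep a line only if ALL tests pass (&& rather than ||).
    # Empty lines have length 0, so the length test drops them too.
    length($1) >= minlen &&
    index($1, ".") == 0 &&
    index($1, a) == 0 &&
    index($1, c) == 0 &&
    index($1, g) == 0 &&
    index($1, t) == 0 { print $1 }
  ' "$@"
}
```

It could be called as filter_reads gcb110_adaptor_removed.txt, or with the data piped in on standard input. Unlike the original, this sketch searches the whole read for dots and homopolymer runs rather than only the first 15 characters; wrap $1 in substr() again if only the start of the read matters.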