Recognizing empty line using ""

Status
Not open for further replies.

demis001

Programmer
Aug 18, 2008
94
US
The following script works when I change the comparisons to ==1, but it does not exclude anything when I change them to ==0. The other problem is that I want to recognize and exclude lines with reads shorter than 14 characters using index(substr($1, 1, 14), " ")==0, but that does not work.

awk 'index(substr($1, 2, 10), ".")==0||index(substr($1, 1, 15), "AAAAAAAAAAAAAAA")==0||index(substr($1, 1, 15), "GGGGGGGGGGGG")==0||index(substr($1, 1, 15), "TTTTTTTTTTTTT")==0||index(substr($1,1, 15), "CCCCCCCCCCCCC")==0||index(substr($1, 1, 14), " ")==0{print $1}' gcb110_adaptor_removed.txt

The data looks like this:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
TGAGGTAGTAGATTGTATAGTTTCGTATTCCGTTTT
TGAGGTAGTAGATTGTATAGTT
A...................................

CGGATGAGCAAAGAAAGTGGTT
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ATCGT

TAGCTTATCAGACTGATGTTT
TGAGGTAGTAGATTGTATAGTTA
CAACGGAATACCATAAGCAGCTTTGTATTTCGGTCT
GTCAA

I want to exclude the short reads, the empty lines, and the lines that are just runs of A's, T's and so on, and print only the quality sequences.

Dereje
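A note on why the ==0 version excludes nothing: index(s, t) returns 0 when t is not found in s, so each ==0 test means "this unwanted pattern is absent". Joined with ||, the line is printed as soon as any one pattern is absent, which is true for almost every line. To print only the lines where all unwanted patterns are absent, the tests would have to be joined with && instead. A minimal sketch, assuming a hypothetical sample file sample.txt, only one of the four base runs, and a minimum read length of 14:

```shell
# Hypothetical sample data; sample.txt is an illustrative file name.
printf '%s\n' \
  'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' \
  'TGAGGTAGTAGATTGTATAGTT' \
  'A...................................' \
  '' \
  'ATCGT' > sample.txt

# Keep a line only if NO "." is found in characters 2-11,
# AND no long run of A's is found in the first 15 characters,
# AND the line is at least 14 characters long.
kept=$(awk 'index(substr($0, 2, 10), ".") == 0 &&
            index(substr($0, 1, 15), "AAAAAAAAAAAAA") == 0 &&
            length($0) >= 14' sample.txt)
printf '%s\n' "$kept"
```

The G, T and C runs would be added as further && terms in the same way.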

 
exclude lines with shorter reads less than 14 character
length($0)<14{next}

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
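The suggested line is a complete pattern-action rule of its own, so it cannot simply be appended to an existing || condition; it has to stand as a separate rule before the rule that prints. That may be why it did not work when merged into the long command (an assumption, since the merged command was not posted). A minimal sketch, with reads.txt as a hypothetical file name:

```shell
# Hypothetical sample data; reads.txt is an illustrative file name.
printf '%s\n' \
  'TGAGGTAGTAGATTGTATAGTT' \
  'ATCGT' \
  '' \
  'CGGATGAGCAAAGAAAGTGGTT' > reads.txt

# Rule 1 skips short and empty lines; rule 2 prints whatever is left.
long_reads=$(awk 'length($0) < 14 { next }
                  { print $1 }' reads.txt)
printf '%s\n' "$long_reads"
```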
 
I tested this by adding the line you suggested to the script above, and it didn't work. When I execute it separately as

awk 'length($0)>14{print $1}' input_file

it works. My intention is to add that condition to the awk line I specified:

awk 'index(substr($1, 2, 10), ".")==0||index(substr($1, 1, 15), "AAAAAAAAAAAAAAA")==0||index(substr($1, 1, 15), "GGGGGGGGGGGG")==0||index(substr($1, 1, 15), "TTTTTTTTTTTTT")==0||index(substr($1,1, 15), "CCCCCCCCCCCCC")==0||index(substr($1, 1, 14), " ")==0{print $1}' gcb110_adaptor_removed.txt

Dereje
 
I want to exclude the output of the following statement together with the lines I have already specified:

awk 'length($0)<14{print $1}'

Dereje
 
Could you please explain which output you want from the data you've posted, i.e. what are the rules?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
From this file:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
TGAGGTAGTAGATTGTATAGTTTCGTATTCCGTTTT
TGAGGTAGTAGATTGTATAGTT
A...................................

CGGATGAGCAAAGAAAGTGGTT
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ATCGT
CCCCCCCCCCCCCCCCCCCCCCCCC
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
.AACGGAATACCATAAGCAGCTTTGTATTTCGGTCT

TAGCTTATCAGACTGATGTTT
TGAGGTAGTAGATTGTATAGTTA
CAACGGAATACCATAAGCAGCTTTGTATTTCGGTCT
GTCAA

I want the following lines:
TGAGGTAGTAGATTGTATAGTTTCGTATTCCGTTTT
TGAGGTAGTAGATTGTATAGTT
CGGATGAGCAAAGAAAGTGGTT
.AACGGAATACCATAAGCAGCTTTGTATTTCGGTCT
TAGCTTATCAGACTGATGTTT
TGAGGTAGTAGATTGTATAGTTA
CAACGGAATACCATAAGCAGCTTTGTATTTCGGTCT
 
Literally, if I can exclude the output of this statement, I will achieve what I want:
awk '{gsub(/\r/, ""); print $1}' gcb110_adaptor_removed.txt | awk 'index(substr($1, 2, 10), ".")==1||index(substr($1, 1, 15), "AAAAAAAAAAAAAA")==1||index(substr($1, 1, 15), "GGGGGGGGGGGG")==1||index(substr($1, 1, 15), "TTTTTTTTTTTTT")==1||index(substr($1,1, 15), "CCCCCCCCCCCCC")==1||(length($0)<14)==1'

Dereje
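Since the statement above already selects exactly the unwanted lines, one way to get the complement is to wrap the whole condition in !( ... ). A minimal sketch, assuming a hypothetical file negate.txt and only the A run for brevity (the G, T and C runs would be further || terms):

```shell
# Hypothetical sample data; negate.txt is an illustrative file name.
printf '%s\n' \
  'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' \
  'TGAGGTAGTAGATTGTATAGTT' \
  'A...................................' \
  '.AACGGAATACCATAAGCAGCTTTGTATTTCGGTCT' \
  'GTCAA' > negate.txt

# The parenthesised expression is true for the unwanted lines;
# the leading ! inverts it, so only the remaining lines are printed.
wanted=$(awk '!( index(substr($0, 2, 10), ".") == 1 ||
                 index($0, "AAAAAAAAAAAAA") == 1 ||
                 length($0) < 14 )' negate.txt)
printf '%s\n' "$wanted"
```

Note that index(...)==1 is true only when the "." is the second character of the line, so a line such as AAC..... would still be printed by this version.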
 
awk '{gsub(/\r/,"")/^.\./||/^A{15}/||/^G{15}/||/^T{15}/||/^C{15}/||length($0)<14{next}1' gcb110_adaptor_removed.txt


Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
I got an error and tried to fix the syntax myself:
awk: {gsub(/\r/,"")/^.\./||/^A{15}/||/^G{15}/||/^T{15}/||/^C{15}/||length($0)<14{next}1
awk: ^ syntax error
I edited it like this and still got an error:
awk: {gsub(/\r/,"")/^\.\./||/^A{15}/||/^G{15}/||/^T{15}/||/^C{15}/||length($0)<14{next}$1

There is also another problem this doesn't handle: I want to keep the sequence if it looks like this
.AACGGAATACCATAAGCAGCTTTGTATTTCGGTCT
and delete it if it looks like AA...AAA

Dereje

 
Oops, sorry for the typo:
awk '{gsub(/\r/,"")}/^.\./||/^A{15}/||/^G{15}/||/^T{15}/||/^C{15}/||length($0)<14{next}1' gcb110_adaptor_removed.txt

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
The following line is not deleted:

AAC.................................

But it did most of the job. Are you able to include the index(substr(...)) test for this one? I want to skip the line if it contains even a single . between characters 2 and 15.

Dereje
 
awk '{gsub(/\r/,"")}substr($0,2,15)~/\./||/^A{15}/||/^G{15}/||/^T{15}/||/^C{15}/||length($0)<14{next}1' gcb110_adaptor_removed.txt

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
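A small aside on the range: the third argument of substr() is a length, not an end position, so substr($0,2,15) covers characters 2 through 16. If the intent is strictly characters 2 through 15, a length of 14 would be used instead. A quick check, using a made-up 20-letter string:

```shell
# substr(s, start, length): 14 characters starting at position 2
# gives positions 2 through 15 of the input.
range=$(echo 'ABCDEFGHIJKLMNOPQRST' | awk '{ print substr($0, 2, 14) }')
printf '%s\n' "$range"
```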
 
Oops, there is still a problem:

It doesn't skip the following line:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Dereje
 
It does for me ...
FYI, it is handled by this pattern: /^A{15}/
 
It is weird, but these lines are still printed in the output. I know the pattern is correct. I just copied and pasted your code to the command line and redirected the output to a file.

Thanks
 
With your posted data I get your posted expected result.
 
I have executed this line:

awk '{gsub(/\r/,"")}substr($0,2,15)~/\./||/^A{15}/||/^G{15}/||/^T{15}/||/^C{15}/||length($0)<15{next}1' gcb110_adaptor_removed.txt

Part of the data printed:
AACGGAATCCCAAAAGCAGCTGTTCGTATCCTGTTT
GATTCTCAGGGATGGGTTA
ATCCGGCTCGAAGGACCA
CTGGACTTGGAGTCAGAAGGT
TGAGGTAGTAGATTGTATAGTT
GAAGCGGGTGCTCTTATTT
TGAGGTAGTAGGTTGTATAGTT
TACTATGCGGTGGGGCCTCGGACGCGGTCTTCGGCT
TGAGGTAGTAGATTGTATAGTT
TGAGGTAGTAGTTTGTACAGTT
TGAGGTAGTAGATTGTATAGTT
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
TGAGGTAGCAGATTCTGTATGT
TAGCTTCTCAGACTGATGTTGAC
TCGCACCCTCTGAACACGTTTC
AGAGGGAGTAGATTGTATTTCTTCTTATGCCGTCTT

Dereje
 
I saved the data I posted earlier as test1.txt and executed the awk line on it:
$ awk 'substr($0,2,15)~/\./||/^A{15}/||/^G{15}/||/^T{15}/||/^C{15}/||length($0)<15{next}1' test1.txt
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
TGAGGTAGTAGATTGTATAGTTTCGTATTCCGTTTT
TGAGGTAGTAGATTGTATAGTT
CGGATGAGCAAAGAAAGTGGTT
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
CCCCCCCCCCCCCCCCCCCCCCCCC
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
.AACGGAATACCATAAGCAGCTTTGTATTTCGGTCT
TAGCTTATCAGACTGATGTTT
TGAGGTAGTAGATTGTATAGTTA
CAACGGAATACCATAAGCAGCTTTGTATTTCGGTCT

It looks like nothing is working except the last regex! Oops.

Dereje
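One possible explanation for the different results (an assumption, since the thread does not say which awk was used): the {15} interval expressions are not enabled by default in some awk implementations, for example gawk before version 4.0 unless it is run with --re-interval or --posix. In such an awk, /^A{15}/ is treated as the literal text A{15} and never matches, while the "." and length tests still work, which is consistent with the output above. A portable sketch that avoids interval expressions by building the 15-base runs as plain strings; test_reads.txt is a hypothetical file name and the 14-character minimum is taken from the original question:

```shell
# Hypothetical sample data; test_reads.txt is an illustrative file name.
printf '%s\n' \
  'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' \
  'TGAGGTAGTAGATTGTATAGTTTCGTATTCCGTTTT' \
  'A...................................' \
  '' \
  'CCCCCCCCCCCCCCCCCCCCCCCCC' \
  '.AACGGAATACCATAAGCAGCTTTGTATTTCGGTCT' \
  'GTCAA' > test_reads.txt

clean=$(awk '
  BEGIN {
    # Build the four 15-character single-base runs without using {15}.
    n = split("A C G T", base, " ")
    for (i = 1; i <= n; i++) {
      run[i] = ""
      for (j = 1; j <= 15; j++) run[i] = run[i] base[i]
    }
  }
  { gsub(/\r/, "") }                     # strip DOS carriage returns
  length($0) < 14 { next }               # short or empty line
  substr($0, 2, 14) ~ /\./ { next }      # a dot anywhere in characters 2-15
  {
    for (i = 1; i <= n; i++)
      if (index($0, run[i]) == 1) next   # line starts with a 15-base run
    print
  }
' test_reads.txt)
printf '%s\n' "$clean"
```

With gawk older than 4.0, running the original command as awk --re-interval '...' would be the other thing to try.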
 