
substituting repeated regions in a long string

Status
Not open for further replies.

jimineep (Technical User, GB)

May 16, 2006
Hi, I have a long string in which I want to substitute repeated AT regions with N, i.e. ATATATATATATATATATAT becomes NNNNNNNNNNNNNNNNNNNN. I am having a problem doing this. I have tried tr///, which seems to mask all A's and T's no matter how many repeats there are, and also s///, which behaves oddly. Enclosed is my latest attempt, which prints the NNN sequence but prints it BEFORE the substituted sequence.

I realise I could use a for loop, but I want to use this substitution on a sequence that is VERY long, and I am sure there must be a better way.


Code:
#!/usr/local/bin/perl

$testsequence=" ATGACGACTTATAGCGATGCTAGCATCTAGACTATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATCGCTAGTAACGTAGTAGCTGTAGTAGCTGACTGATGCTGTAGTGACTGATGC";

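# Note: printf writes each N to STDOUT while the substitution is running,
# so the N's appear before the later print, and the replacement text is the
# value of the for loop, not the N's. [A,T] is a character class, so it
# matches any run of A, T or ',' characters rather than repeated "AT".
$testsequence=~s/([A,T]{9,})/for($i=0;$i<length($1);$i++){ printf "N";}/e;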

print $testsequence;
print "testsure";
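With the /e flag, the replacement side is run as Perl code and its last value becomes the replacement text; anything that code prints goes straight to STDOUT at match time. A minimal sketch of the same idea that returns the N's instead of printing them, assuming a made-up test string ($demo, $fixed and $run are illustrative names) and keeping the question's character class without the stray comma:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test sequence: 9 repeats of "AT" between two GC pairs.
my $demo = "GC" . ("AT" x 9) . "GC";

# With /e the replacement block is run as Perl code and its LAST VALUE
# becomes the replacement text, so build the N's in a variable and
# return it instead of printing each N.
(my $fixed = $demo) =~ s{([AT]{9,})}{
    my $run = '';
    $run .= 'N' for 1 .. length $1;   # one N per matched character
    $run;                             # returned value = replacement
}e;

print "$fixed\n";
```

This prints GC, eighteen N's, then GC, all on one line and in the right place.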
 
s/// doesn't behave oddly - you're just using it oddly ;-)

What about simply:
Code:
$testsequence =~ s/AT/NN/g;

That replaces all occurrences of "AT" with "NN"
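A quick check of that one-liner on the example string the original poster gives later in the thread shows that it masks every AT pair, including isolated ones such as the AT at the very start (the variable names are illustrative):

```perl
use strict;
use warnings;

# Example string from later in the thread: nine "AT" repeats in the
# middle, plus a few isolated "AT" pairs elsewhere.
my $seq = "ATGCACAACAATATACGAATATATATATATATATATCGTATCGC";

# s/AT/NN/g replaces EVERY non-overlapping "AT", repeated or not.
(my $all = $seq) =~ s/AT/NN/g;

print "$seq\n$all\n";
```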

Are you looking to replace all occurrences of the string "AT", or any combination of the letters "A" and "T"? I'm not entirely sure what you're attempting here.
 
If you want to replace both A and T with N, no matter what order they appear in:
Code:
$testsequence =~ s/T/N/g ;
$testsequence =~ s/A/N/g ;
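The two s///g calls can also be collapsed into a single tr/// transliteration, which maps individual characters rather than matching strings (this is the "mask everything" behaviour described in the original question). A small sketch on the thread's example string; the variable names are illustrative:

```perl
use strict;
use warnings;

my $seq = "ATGCACAACAATATACGAATATATATATATATATATCGTATCGC";

# tr/// works on single characters: A -> N and T -> N in one pass.
(my $masked = $seq) =~ tr/AT/NN/;

# With an empty replacement list, tr/// leaves the string unchanged and
# just returns how many A's and T's it saw.
my $at_count = ($seq =~ tr/AT//);

print "$masked\n$at_count A/T characters\n";
```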

 
I think you misunderstand me. I only want to replace AT with NN when AT is repeated 9 or more times (hence {9,}).

For example, this string:

ATGCACAACAATATACGAATATATATATATATATATCGTATCGC

should become:

ATGCACAACAATATACGANNNNNNNNNNNNNNNNNNCGTATCGC

Do you see the difference? I only want to replace repeated AT runs, not every instance of AT.
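Requiring repeats of the pair is also why the character class in the first attempt matched the wrong things: [A,T]{9,} matches nine or more characters drawn from A, T and the comma, in any order, while repeated AT needs the pair grouped. A small sketch on made-up strings ($mixed and $commas are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical strings: a 10-base run of A/T that is NOT a repeated "AT",
# and a run of nine commas.
my $mixed  = "GC" . "AATTTAATAA" . "GC";
my $commas = "," x 9;

# A character class counts characters, so any mix of A and T qualifies...
my $class_hits  = ($mixed =~ /[AT]{9,}/)    ? 1 : 0;
# ...while a grouped "AT" must appear nine times in a row.
my $repeat_hits = ($mixed =~ /(?:AT){9,}/) ? 1 : 0;
# The comma in [A,T] is a literal member of the class, not a separator.
my $comma_hits  = ($commas =~ /[A,T]{9,}/) ? 1 : 0;

print "class: $class_hits, AT repeat: $repeat_hits, commas: $comma_hits\n";
```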


 
Much clearer. Try this:

Code:
# Replace each run of nine or more consecutive "AT" pairs with the same number of N's
$testsequence =~ s/((?:AT){9,})/'N' x length($1)/ge;
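Running that substitution on the example string from the follow-up post gives the expected result, and a run of only eight repeats is left untouched. A minimal self-contained check (the variable names are illustrative):

```perl
use strict;
use warnings;

my $seq = "ATGCACAACAATATACGAATATATATATATATATATCGTATCGC";

# Each run of 9+ "AT" pairs becomes the same number of N's;
# shorter runs (like the ATAT near the start) are kept.
(my $masked = $seq) =~ s/((?:AT){9,})/'N' x length($1)/ge;

# An 8-repeat run is below the threshold and should be left alone.
my $short = "GC" . ("AT" x 8) . "GC";
(my $short_masked = $short) =~ s/((?:AT){9,})/'N' x length($1)/ge;

print "$masked\n$short_masked\n";
```

Because /g applies the replacement to every qualifying run, one call should cover a long sequence without an explicit loop in your own code.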
 
Thanks, ishnid. What does the ?: in (?:AT) mean?
 
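It's a non-capturing group: the parentheses make {9,} apply to the pair "AT" as a whole (plain AT{9,} would mean an A followed by nine or more T's), without storing the pair in an extra capture variable. Have a look at the "Non-capturing groupings" section of perlretut.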
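To see what the non-capturing group buys over a plain capturing one, here is a small sketch on a made-up string ($s is hypothetical) comparing the captures each pattern returns:

```perl
use strict;
use warnings;

my $s = "GCATATATGC";   # hypothetical: three "AT" repeats

# In list context a match returns all capture groups.
# Capturing inner group: ("ATATAT", "AT"), where the inner group holds the last repeat.
my @capturing     = $s =~ /((AT){2,})/;
# Non-capturing inner group: ("ATATAT"), only the outer group captures.
my @non_capturing = $s =~ /((?:AT){2,})/;

print "capturing:     @capturing\n";
print "non-capturing: @non_capturing\n";
```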
 