Hi all,
I have two text files to process, "short_sequences" and "long_sequences". "short_sequences" holds
100 short sequences (8 nucleotides each) and "long_sequences" holds 100 long sequences
(200 nucleotides each).
For example, the first short sequence is "TTGACATA" and the first long sequence
is "GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".
Basically, I want to pick a random starting site in each long sequence and overwrite the substring
that begins there with a short sequence. In this example, if the starting site is the 5th nucleotide
of the long sequence, then after overwriting with "TTGACATA" the modified
long sequence is "GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".
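In Perl, the four-argument form of `substr` performs this overwrite in place. A minimal sketch using only the first 19 nucleotides of the example long sequence (truncated for brevity); because `substr` offsets are 0-based, the 5th nucleotide corresponds to offset 4:

```perl
use strict;
use warnings;

# First 19 nucleotides of the first long sequence (prefix only, for brevity)
my $long  = "GAATCATATATTAGTCTCC";
my $short = "TTGACATA";

# Starting site 5 (1-based) is offset 4 (0-based) for substr.
my $site = 5;

# 4-argument substr: replace length($short) characters starting at the
# offset with $short, modifying $long in place.
substr($long, $site - 1, length($short), $short);

print "$long\n";    # prints GAATTTGACATAAGTCTCC
```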
Then I want to do the same with the 2nd short sequence in the 2nd long sequence, and repeat this until
the last long sequence has been processed. The only constraint, I think, is that the (1-based) starting
site must not be larger than 193 (200 - 8 + 1); otherwise there are not enough nucleotides left in the
long sequence for the replacement.
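Working out the bound: a 200-nt sequence has room for an 8-nt insert starting at any 1-based site from 1 to 200 - 8 + 1 = 193, i.e. a 0-based offset from 0 to 192. A minimal sketch of a helper that draws from that range (the name `random_offset` is hypothetical, not from any library). Note that taking the length modulo 193 does not give this range: 200 % 193 is 7, so `int(rand(200 % 193))` can only produce offsets 0 to 6.

```perl
use strict;
use warnings;

# Hypothetical helper: return a random 0-based offset such that a short
# sequence of length $short_len fits entirely inside a long sequence of
# length $long_len. rand($n) returns a number in [0, $n), so int() of it
# yields an integer in 0 .. $n - 1.
sub random_offset {
    my ($long_len, $short_len) = @_;
    die "short sequence is longer than long sequence\n" if $short_len > $long_len;
    return int(rand($long_len - $short_len + 1));
}

# For 200-nt and 8-nt sequences: offsets 0 .. 192, i.e. 1-based sites 1 .. 193.
my $offset = random_offset(200, 8);
printf "start site (1-based): %d\n", $offset + 1;
```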
Furthermore, I want to keep track of the starting replacement site for each long sequence.
Here is my code:
use strict;
use warnings;
# @short holds the short sequences, @long the long sequences
my (@short, @long, $offset);
open(my $short_fh, '<', "short_sequences.txt") or die "Can't open short_sequences.txt: $!\n";
while (<$short_fh>) {
    chomp;
    push(@short, $_);
}
close $short_fh;
open(my $long_fh, '<', "long_sequences.txt") or die "Can't open long_sequences.txt: $!\n";
while (<$long_fh>) {
    chomp;
    push(@long, $_);
}
close $long_fh;
# replacement
foreach my $short (@short) {
    foreach my $long (@long) {
        # random 0-based offset in 0 .. length($long) - length($short),
        # i.e. a 1-based start site of 1 .. 193 for 200-nt and 8-nt sequences
        $offset = int(rand(length($long) - length($short) + 1));
        substr($long, $offset, length($short), $short);
        printf "%3d", $offset + 1;
        print "\n", $long, "\n";
    }
}
But I just realized that the two nested loops are wrong: each short sequence is applied to
every long sequence, rather than only to its corresponding one.
So I would appreciate suggestions on how to process the two files
in parallel, pairing them line by line.
Thank you very much; I look forward to your reply!
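One way to handle the two files simultaneously is to read them in lockstep: take one line from each filehandle per iteration, so the i-th short sequence is only ever applied to the i-th long sequence, and record each 1-based start site alongside the modified sequence. A minimal sketch, assuming one sequence per line in each file; the sub name `replace_pairwise` and the command-line usage are hypothetical.

```perl
use strict;
use warnings;

# Read one short and one long sequence at a time from two filehandles in
# lockstep, so the i-th short sequence overwrites part of the i-th long one.
# Returns a list of [site, modified_long] pairs (site is 1-based).
sub replace_pairwise {
    my ($short_fh, $long_fh) = @_;
    my @results;
    while (defined(my $short = <$short_fh>)) {
        my $long = <$long_fh>;
        die "long_sequences has fewer lines than short_sequences\n"
            unless defined $long;
        chomp($short, $long);
        die "short sequence longer than long sequence\n"
            if length($short) > length($long);
        # 0-based offset chosen so the whole short sequence fits
        my $offset = int(rand(length($long) - length($short) + 1));
        substr($long, $offset, length($short), $short);
        push @results, [$offset + 1, $long];    # keep the start site
    }
    warn "long_sequences has more lines than short_sequences\n"
        if defined(<$long_fh>);
    return @results;
}

# Hypothetical usage: perl replace.pl short_sequences.txt long_sequences.txt
if (@ARGV == 2) {
    my ($short_file, $long_file) = @ARGV;
    open(my $short_fh, '<', $short_file) or die "Can't open $short_file: $!\n";
    open(my $long_fh,  '<', $long_file)  or die "Can't open $long_file: $!\n";
    for my $r (replace_pairwise($short_fh, $long_fh)) {
        my ($site, $seq) = @$r;
        printf "%3d\n%s\n", $site, $seq;
    }
    close $short_fh;
    close $long_fh;
}
```

If both files are already loaded into @short and @long as in the code above, the same pairing can be done with a single index loop, `for my $i (0 .. $#long) { ... $short[$i] ... $long[$i] ... }`, instead of two nested loops.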
Best Regards,
Alex