Hi all!
I asked a question about "how to handle two files
simultaneously" several days ago and a couple of
people gave me very useful tips. Thanks again!
Basically, the "short_sequences.txt" file holds 100 short
sequences (8 nucleotides long) and the "long_sequences.txt"
file holds 100 long sequences (200 nucleotides long).
Each short sequence is used to replace a substring
of the same width in the corresponding long sequence.
The result is a new file which holds 100 new
long sequences.
I have been testing my code and Duncdude's code for that
job. However, what I find is that the substring in the replaced
txt file is not the same as the short sequence in "short_sequences.txt".
What are your suggestions?
Thanks!
My code is:
CODE
#!/usr/bin/perl
use strict;
use warnings;

# the 'short' array will hold the short sequences,
# while the 'long' array holds the long sequences
my (@short, @long, $offset);

open(SHORT, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
while (<SHORT>) {
    chomp $_;
    chop $_;
    push(@short, $_);
}
close SHORT;    # Close the file

open(LONG, '<', "long_sequences.txt") || die "Can't open long_sequences.txt: $!\n";
while (<LONG>) {
    chomp $_;
    chop $_;
    push(@long, $_);
}
close LONG;     # Close the file

# replacement
for (my $i = 0; $i <= $#short; $i++) {
    $offset = int(rand(193));
    print $offset."\n";
    #print length($short[$i]);
    substr($long[$i], $offset, length($short[$i]), $short[$i]);
    print "\n", $long[$i], "\n";
}
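One thing worth checking in the code above, as a guess rather than a confirmed diagnosis: chomp already removes the trailing newline, so the chop that follows removes one more character. If the files have Unix line endings, that extra character is the last nucleotide, and each short sequence ends up 7 bases long instead of 8. The sketch below shows the effect on a single line. The helper strip_eol is hypothetical (it is not in either script); it removes a "\n" or "\r\n" line ending and nothing else.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): remove a trailing
# "\n" or "\r\n" and leave the sequence itself untouched.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# What chomp followed by chop does to a line with a Unix line ending.
my $line = "CAAGACAA\n";
chomp $line;    # removes "\n"           -> CAAGACAA
chop $line;     # removes the final "A"  -> CAAGACA
print length($line), "\n";                       # 7

# The helper keeps all 8 bases for either kind of line ending.
print length(strip_eol("CAAGACAA\n")), "\n";     # 8
print length(strip_eol("CAAGACAA\r\n")), "\n";   # 8
```

If the files really do have Windows line endings, chomp plus chop happens to work, which is why this is only an assumption to test against the actual files.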
Duncdude's code is:
CODE
#!/usr/bin/perl
open (SHORT, "< short.txt");
chomp (@short = <SHORT>);
close SHORT;

open (LONG, "< long.txt");
chomp (@long = <LONG>);
close LONG;

open (OUT_HTML, "> output.html");
print OUT_HTML "<pre>";
open (OUT_NORM, "> output.txt");

for ($x=0; $x<=$#short; $x++) {
    $r=int(rand(length ($long[$x]) - length ($short[$x]) + 1));
    print "### $r ###\n";
    print "$long[$x]\n";

    # this section is for visual purposes only
    $output_norm = substr($long[$x], $r, length $short[$x]);
    print " " x $r;
    print "$output_norm\n";

    substr($long[$x], $r, length $short[$x]) = "<font color=red><b>$short[$x]</b></font>";
    print OUT_HTML "$long[$x]\n";
    $long[$x] =~ s/<[^>]+>//g;
    print OUT_NORM "$long[$x]\n";
}

close OUT_HTML;
close OUT_NORM;
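To check whether a replacement actually landed, one option is to read the window back right after writing it and compare it with the short sequence. This is a minimal sketch, not a rewrite of either script: replace_at is a hypothetical helper, and the long string is only the first 20 bases of the first long sequence, to keep the example short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: overwrite length($short) characters of $long,
# starting at $offset, and return the new string.
sub replace_at {
    my ($long, $short, $offset) = @_;
    die "offset out of range\n"
        if $offset < 0 || $offset + length($short) > length($long);
    substr($long, $offset, length($short), $short);   # 4-argument substr replaces in place
    return $long;
}

my $short = "CAAGACAA";                # first line of short_sequences.txt
my $long  = "GAATCATATATTAGTCTCCA";    # first 20 bases of the first long sequence

# The largest valid offset is length($long) - length($short),
# so rand() is given that value plus one.
my $offset = int(rand(length($long) - length($short) + 1));
my $new    = replace_at($long, $short, $offset);

# Read the window back and compare it with the short sequence.
my $window = substr($new, $offset, length($short));
print "offset $offset: ", ($window eq $short ? "match" : "MISMATCH"), "\n";
```

Computing the offset range from the two lengths, as Duncdude's code does, avoids relying on the hard-coded 193 if a line turns out shorter or longer than expected.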
The "short_sequences.txt" file is:
CAAGACAA
ACAGTTCA
CCGAATTC
GATACTAA
GTTTCACC
TCATCTCA
CCGCCGAT
TATTGTCA
ACGTAACG
TGAGGCTT
CAAATTAC
TCGACGGG
GGTCAGGT
ATAAGGCA
ACGGGCTG
CTCCTCGT
GTAATCAT
GACAAGAA
AACCGTGA
GTCCACGG
AAGCCGGG
CTTGTCCT
GTTCTTAC
AATGCCGG
TAGCAGGA
GTCTTGAG
AGATCTGA
ATTTGATG
TGTTGGCG
CACTGGTT
AGTGAAGG
CTATGATC
GCGATCTA
AATCCAAT
TGATCGCA
TTTGAGAA
TGGTATCT
TCTGGACG
AACCCTGA
TTCCACCT
CGCTCCTG
GACTAGGG
CCAAGCAT
CAACGTTA
TCCTTAAA
TCCCGCAT
TCGCCATT
AAGCTCTC
GTACCGTT
CCAATCCT
CAATCGTC
AGTCGCTA
TACACGAG
TTCGGTAT
ACCAATCG
TGGTGATA
ACCTATTG
TTCCGTAA
CGGCATCT
AAACATGA
CTTATAAT
CGAAGTGT
CCCCCATG
GACTAGAC
CAAAGTTA
AAGTAGTC
TTACGTCA
GACTGGAT
AATATTGG
CTGGGTAT
CTTCAGGT
TCTGTTGA
AAGTGCGT
ACCTTCGT
TAGAGACG
ACAGTCGA
AACGTCTT
GGCATGAT
TGGTTTAT
TTAACTAG
GTAATGGC
CTTGTCAT
TCTGCACA
ACTCAGAA
CTGACATG
ATTCAACC
TAGTCAAC
GACACGGT
AAGACTGC
ACGAGGCC
GTACGGTC
AGCCAGTG
CGTTGATC
CTTCAAAT
ACGTTAGA
CCAACGTC
TTCCGTGC
TTCGGTTC
TCCAATAC
The "long_sequences.txt" file is:
GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGAGCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCTCACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTCGAACCTTGGACTAACCACTGTCTGGATA
AGGATTACCCGCTGGACTTCAAACGCTCGTGAAGCATCGTATTGCGAGGCAACCGAGTCATAGCCCAGTCCGGGGGCCAACGCAGTGCCAGCATCTGCGTTGTTCATCGGTCCTCAGTCTCCCATCAACGTGGTCCACACCTAGCATCCTGGTTTTGCATCCGTAACAAAGGACGTTCGAAGTTTTTTGCCGGCGGGAAG
TGATAATTGGTGCAATATTCTCCATAACAGATCCTCGCCAATACGGATTTGAGGGATCCCTCTGCATTTCCACGAAGCGTGTCACCGATAGAGCAGAAATGCTTTACCGCCGCAGTGATTAGGCGGGTACAGTTGTCCAAACGCACACAACCGAAACCTCCCCATGCGTACTCGTTCGTTTAGTCGCGTACAGAGGGAAC
TCTAGCTTTCGTGAAGTCCCAGGCTCCTAGCCCTACAGCGACTTCTGGCCGGATCTGAGGGAAACTTGTCTAAACTTTTTTCCCCGCGAGACTCACTGTTATAAGAGCTGGAGTTTAGATGGGCATGGCATTCTATTGCTAGTATGTCTGAGAGGTGTTCTTTAGACCACGGGTTGATGCGGGTTCGAGTAGAGACACAC
CAGCGGGCGATCACCGTTAACGATGGTGGGTAACGGTCGGAATCGGAATAGGAGAAGAATATGAGATCGTCCATGGGACGTTTCAGACGAGACGTCCCAGCTAGCCGCTGCAATATGGTATTTGGACTCATTCTTCGAGACAGTAAGCGTGATTTTTATCAGGCCCGAGGCTGTTAAACACAGCTGCTCTAGGCCGAATG
ATGCACTTCCCCACGAACTCCTGTTAGGATGTACGTAAGAGTTCAAAATGCATTCCAATTCGTTTGACGTGCAAATTAGAACACCGATGGATTGTCCAGCTTTGCACTCTACGTGAGGGGAACTGTGGTAGGTTTTTTAATGAAGCCTGCCCCTAAGTCCTTAGCCTTTGATTCTGTTTTCCCACCTCCGGCATCCAACC
GTGAGCAAGTATAGCGTTCGTCCTCCGGAGATGGACCCACAATTCTCTGCCGCATAGTCGTATGCCAATTAGTTTAGTTAAAGCGTGACTTCGGGGGTATTAAAAGAGTGGGTAGAAGTAGCAATATTATGTTATAAGTGGATGACGCAGGTGATCATGTGCCTGTCCTCCTTCGCGATTTGCATTGATGGTGACTGTTA
CGTTTTAGGCTCGTACTTGTTAAGGTATATGTTCCAAGTTCAGCCCCGCTGGGTCCGTGGATTGATACATTAGTTTGAGACGCCTAAATATTATGGGTTACCTCGTAGCGGATATTAGTGCCACCTTGCGGTACGCGCTGTTCGACACTGGACTAAACCCTGAAGCTTCTGACAGCGTAACAAAAAAGCAGATGAATCAC
CAAACACGATTGTGCTTCGTGACGAACGGGTGTAGCACGTACCGTACCATGTGCCCCTAGTTCTGAAACGTGTCATCTTCTAACGCCTCACTCACCATAATGTTCACCTCGTGCTGTACACGCAGTGGCCTAGCGCTAGGAGCTCAATTGAGTGCGACATGCTCTGGACAGCTGTAGCAGCTATGCATTAGGCACCCGCC
AGATACGTTAAAGGACCCGTGTCGGTTCGGGATTGCTATAGGTTGTCCAGCCATCGGCTCAACCCCTGGGGCGCACGATTTCATGCATGCACATCGGCCAGTACGTTTGGCAAGGGCTCAATAGACAACCAATACTGATCCCGCGCCTACATTTTCGCTATCGCAATTATCAACATAGAATTCCAGACTCGACTTGCCAT
GTTGTAGGAGGACGCTGTGCCTGATGCCGGCAATAATTTCAGCGAGGCACTAGGAGGAAGTCCGGAAAAAGCAGCTTACCCCCAAGCCTTCCCCACATATTACTATGTGTTTATTACGGAAGACTGTGCGCGGCTTACGCATCAAGCACGTTAAACGCGTACCCTAGGTTCCATCCTGTACAGTTGTTCTTATAGTAGCG
GAAACGCGTTCGCGCACGACGGGTTATAAGTGGCCAACAGCTCTATGCTAATGGTGCACCGTTGGGGTCCTCGAAAACCTGTGAACAGGCACTCCTATCCAGACTACCTAAACGTTGAGCCCGAGGTCCGCCCGTGCTAATTAAGCCGGCTATAGGTTTCCGCAAGACGGAGTGCTTGTGCTGGCAAACAGGGGTTTAAG
GCTCGTACCAACAGGGATCTCTCCTATAAAAGGGGAATCACTTCATCGCGATCCCATTGGCCACGGCACGCTAGGCTGTCAGTGATGCAGCATGTGTATTTTTCGTCGCCCGTAACTGACACAAGGCCACATGAGTGGAGGCCCGGTATATGCAATTATCGGCGGTCGGGCTGTCTCGTATATTTCAATAGACAGGGGAC
GCTCGTACCAACAGGGATCTCTCCTATAAAAGGGGAATCACTTCATCGCGATCCCATTGGCCACGGCACGCTAGGCTGTCAGTGATGCAGCATGTGTATTTTTCGTCGCCCGTAACTGACACAAGGCCACATGAGTGGAGGCCCGGTATATGCAATTATCGGCGGTCGGGCTGTCTCGTATATTTCAATAGACAGGGGAC
TTCCCTTATCCGAAGGATTCACACAGACGTGCCAGTATTACGGACGGGGGGGCGTCTCGACCCGTTGGGGAGCGCCTGTAGCTGGGGTGCGCGACACTCAAGTTTGAGTAACGCAGGTTCGTAGCTAGTATGTTCAAAGACGTTCGCTGCACTCTCCGTCCCCATCAAGTTCTAGTTTATTAGTGGATCCGTGTATTAGC
AACTCAACCGCCAACCGAGAAGCCTAAACTTTTGATCTAAAACACCGAATACGGCGCCTGTATTCTGAGGGCAAGCGATGCTTCTGTCTAAATGCGTGTGCCAATCGCATCTCCGTTCTACGGCCACCGATCGCGCAACTTCCAATTAGTTGGAATGGCGTATGGACAGAACGTATGGGTCCGTGGTAAAGAAATTGTTG
ATTTGTCCCGGTTACGTACTCCACTGCCCCGTCGTGGTGATTAGAGTCATCTGACGTCCCCCACTTCCTTATCATCATACACATACAACGAAAACACTCCCACCGGCAAGTAATTCGGCTCCTCTTGTTAAATAGCCTTGGGCCTAGGCATTTACTTCACGTGAGTACAGGTTTCCATCATGAGTAGGAGTAACCCCGTA
CTGCTGTGTTTTTGCTCTAGAAACGCTCAGGGAGAATTTAAGTCGTGATCCTAATTAAAGATAGTTGCACTCGAAGATAGTATAATCACTCGCTGTATTGCGCCTTTAGGCGCGCGTCCCTGTCCAACTCCGAAAGTTAGTCCTTCTCGGGAGTGGTTCCAAAAATAAGCGAACCGTCAAGTTACCTTGGTATATAACCG
ACGCGCTTGAAGGACCCTGTCAATGATGGTGTAAATAAAGTCGGTATAGCTTGCACTTTAAGGCGGTAGCCCTAAGGAAGGTTTAGGACAGTTGTAGGCAGGTGATACGACTTGTTGTTGCTCTCTTAACGGGCTTTGAGGGTAAACCCGCATCGATAAACCGACGAGGATCATCGCGCATAAGGCAGGCATACCTTGTG
TTCGACGAGTAAAGACGGACGCGAGGTTCGCGCCAGTTCGTCCAGATTACCATGCGATGGTTTATCGTTCATCTAAGCAGCTACCCTTGCCGATTATGGAAAGATGTCATAAGTAGTCGCCAAGTGTTTAATGCTGCGAATCGAATAAACTACGCTGATTCCGGTCTAGAGGCTAGGCATCAGGTATGAAGCACTTAGGA
GATGCTTATACTTTAATGCCTTTGTGGTTTCACAGGGTCGACCTGGCGTAACGCTATCGGTTATCGCGGTTTCTTGGTAGCTCAGACGTTCGCTTTCGATGCTTATCCTGTCTCATCCCCATTCGCTCGCGGCCGCTACGCCGCGATAATCCATCCTTAATAGGGGTTCATACCTATTCCTCATTCAGGTTCTGGTAACG
ATACAGAATAATTTACCTGGGATTTGGCCGCGTATAATTTGACGGTTAGATAGGAACTCCCTTGCGCGCTACACGGGTTTACTACCAATTAGCTTAAATAGGATAAGTCCTGGGCCATGAAATGACGTTTAGCTTTCGAGTCTACGTGGGTATCCCTATGCTTGTTCAGTGATACGCGATACCCAGCCCGCCGTCTGACT
GGTCCTATAATGGCATCGTCAACCGAGCGTTCAGACTGATGGCGTCGATCAGTGTTAGATATAACTCGCGGCCATGCCTCCCCTCACAGTCGATTAGAGTACGGAGCGTATAGTCGCGTATTACCCCACCCACCTCACTTATAGAAAATACTGTCCCACGATAGCAAAGAAGGAGGCGTAAGGAGTCGCGCTTTCCTAGG
AACCTTACACCCGTACTCGGAGCGCACTATTCTCCCGCCCTGACCTGCATAGCTAGCCCTACTAATTCTACATAATGTGCAATCCCTAGGCATCGCTTCGGAGACCAATTGCGATATCTTGCGACCGTGTGGTTTCATAGATGATGATCCCACTTCTGGCCCATGATTGTGGGTCTGGGTGGAACGTCCACCTCAAAACC
CACGGTACGGCTCAACTGACCATGCTCGGCTGTTCCTCTATAAATCACGTGAGTGCGGGTCCAGGATGGGCTAAGAACATACATGATTCAACTGGCAACAGAAAGCGATTCTAGGCTTCATCTCATACTCGTGGCATCCTACAGTTGGGCGCCGCGTCAGTGGTGTCCGAGAACACCATTGCTAGACGCACTGAAAGATC
CATAGGTTGAGTTTTTATGTTCTGGGAACCGATAGAACGACTTGACGCATTCTATCCGACCTCAGCATTCGTTCTAATAAGAATGTGACACTTCCGGCGCCTGGATTTCCTTAAAAACACATCATTCCCTCGGATACAATGATCTGGCCAGTGCGGCAAACCACCCCTGGATCGTTCCAGACTACCGCTGCATTCACCTG
TTAATTCCTTGAGCTCATATCTACACCCAAAGTTTCGCAAGCGTCTGGTCCGGAGTAAGAACAGAGTGTCATTAACACTAACTACGTTAGTTCTGGGCAACCATCGGAACTCGCCCGTGCGACTGTGTTGTCCTCCGGCTGACCAGCCACCCTCTACTTTCCTAAACTCTTAGGGACGGTGGTTGAAACCAGCTTATCAT
GAGCCCACGCGGGATGGAGTATAGTGTCGCGTTTAAACACCGCGCCACCTGATTTAACAAAGATAAGGCTCTTTCGTATGGATAATCCTTACTTTTCTAGCGTATTCTTTGTAGCTGACCTGGATCTAGAGAGACTACAAGTGTTATGGTCGTCGGACATCCGGACGACTCACTGGAAGAGAAAGAGTTCTACAGACTAC
TAGTAACGGTGGTACGTCATGCGACGCGCTATGGGACCCTATTGTGAACCTGGCTCTTTCATGCCTGCAAACTATGTGCCCAACGGCACTTACGCTTTAGAGGGTCGCGATATAATGAATTGCTAAATGGGCTAGAAACCGACTGCCAAACCCTGCTGATCGAAGCTAGGTCATACGTTAAAGGGAGTGTGTATCGGAGT
ATTCTAGAGCGTTTCAGTGCTTTTCTCTCGTATTCCTGAAAACTTATCCGGTTGGTAAGTTACCAATTTGTTGGTCCCAAACCGTTGCCTCTATATCCGACCGTGATCGCCTAGCGCGGATTCAACTCTGTTGAAACGCTGATCACCCACATAGCACCTTCTAGCTCGGTGTTTTTGGCTTGACACAACGGTGGTTACCC
TTAACAGGGTTCGCGGCTGTATCCATCGTATCCCGTGAAATAGGATTCAGTCACGCGGTTCAAGGGCTATGTCGATCCGACGGAGTAACATATGTCTAAAGGATCCAACAATGATGATGGACAATCGTGTTGCTCACTAAATCTGTTGGCCGAAGCCTGGGTCTAGCAAAATAGACCGGCAGAACTTCCTAAATGCATTA
ATGTTTCACTTAGAATTAATAATTTTGATTCGCAAAGGGGAGAGATCGTGGTTACTTCACGTACTTTTTATGGAGTCCCTTCAGTACTCCCTACTCCGGTGATGCAGTCTATCCAAGTCATGCGCGATGCCCTATGCCGAGTTCATCCTACCACTCATTTCGTCTAATCCCTTAAATATACACCGGTATTCTCTCTGGTC
ATCAGTGCTACAATACGTGGGGGGCTCGATTACCGGTCGCAGTACTATCGGGGCTGGGTTTTTGATTTACCTGTTCAAATCTAGCGTTTTGGCATTGAGCCATCCCCGGCGATCTTACGAAAACCTGGGTCCCTCGGCCACTCGTCCCGTGGGCACGATCAATAACGATGAAAAACTTCAATCGACATTACTAAAAGGTG
ATATACTAGACTGGGCCCTTGTACCAAGTGGCATGTGGATTAGCGGAGTAATTCGTCAAGTTGAAGTATCCAGTTACCGTTAATTGCCCTCATACGAACTCCTAGTCACCTCCGAGGTAGGGGGTCCAAATTTCCAGCTGCAAACGCGTCCGTCGCGGTGATAGTCTTCTACACCTGGTATTTACTCATAAGAGCCACTA
TCCGCGGTACTGGCCTAGCCGGCCAGTTACGCCTCTCGCCGTCACACGTCAAGCTGTTATAGACCAGAACAGCAGCTTTCGAATAACTAGTAGATCAACACGGGTCATCACAATTACACTGCCAAGTGAGAGCACTCTGAACATATGCCTTCTGTTGCAGATAAGCCGTTATCGTTGAAAACCTTCTGTGGGCTTTGGAG
TTGCAGACTCAAGGACTACCCGCTCATCCAGGCTCGGCCGAGTTCGGACCCCGCTATGTGAGTTCTGGAGTTAAACAAAGGATTGAAGGGTGTCCTTGTTAATAGCAAAGCACCGATTTAGTAAAAGCCATCTTCGCGGACGATCAGACGAGCTAGCGCTATTGTTCGAGAGACCACCAGATCCGCTGAAGTATCAAATC
TGCAGTACACCAAAGAGTACCAGCTACGATGAGGGTCTGCCCAAAAGATCGATGTACACATCAATGCCCAACGGTGTTTCCTTCTGGCTTATACATAGGTATAAATATAGGTCATAGTCAAGTGCAGATGAACACCTGTGTCAAATGGTGAGTTAACGGAGCTTCTACTTGCTAGGCGCCATTTCCAAGAAGCATCGGTG
TTGACTGAGTTGCGCGCTTTCTTCCATGTTTTCTTTGTTACTTCGTCCTCCGTCGTTCACACTCCCAGATGAAACTACCCTATTCAAACTTATATAGGAATCCGGCAGATGTGTAAAAGTGGTCCAAGGTATCTGTGGAACTTGTACTGCCGGGTTTAGTGAAGACTTCGAATCCAGGCAATTAGATACCGTGCGGAATT
ACAAGCAGAAATTCCGTGGAGCCGCAATAATGCTTCTCAACGACTACTTGCCTAACCGCGATGATCAGTTTATCATAATCAACTCGATGTATCCGTTAACATAGGAGGATTCGATATTTGAACGCGGAAGTCTTATCATGCTCACCACCTCGTCCCGGCAGTGTCCTTGAGAGGGGGCCGTCCTCCTACCTTCTCTAGTA
AATAACCAGTAAAGACGTCGTCGGCAGGTAGGGCCCCAGGGCACTCGCTCTCGTGAGTCCACTAGGTTCCGTGAGAAGGTTAGGTTCTGCAAAGCCCTAACAACCCCCGGAATACTTTATGGGCTTGGTTCATAGACGAATTGGACGCCCGCAGGCCTTGCGGGCCTACGCTGCTGTTTTGGGGTCCGAATGCGAGAGTA
TCTGGGGAGACGCCCTATAATCCAAGCGTATCAATGTTGCTACTGGGTAGCCATAAAACGTTGGTAGACGCAAACTAGTTCAACTTACAGTTCATAGAAAATACTGAGACGCTTGTTTTGTTCAGCAGTTGCGCGAATTAGACGCAATCATTCGTTCAAGGTACGGTCCAGGACGCTGCGGGCGAGGTGGAACTAGTGTC
GTAGCACCAGGAATATAAATAATATAGTTTAGCTACCCACCTTAGGTACCAACATCCCTTCCTCTGCGACTCGTTCGCACTCTTTCTAAGACGAAAATCCATAATGGCCAGGGAACATACATGATGACCAGCCTCATTTACGTAGGTCGCTTGGGGAGAAATGAGGGTGGTGCCCGCTCTCTCCGAGTAGCCTCGTTCTC
ATATCATGAAACCTAGCGCTAAGCGGACGAAAAGAAGTTGCCCATCCGACTCGGAATAGGCTCATCACTGCTTCCTAGGCTCCAGCCGACTGACGATTCCCTCAGGAAACCTCCCTCGCCCTTGAGGTAGGATCGTAGACTTTACCCTCACGAAACAGTCGCTCATATAGCATCCATCGGCGCGGCCCTGCAGAGGTACT
GCTGGGGCAGCTCTCGTGAACAGTTATTGAAGCGCGGATGTAAACAGCGATGCTGAGCAAAATACTAACAGGTAGGCCGAGCCTGATCCTGTTAACTTTCGATCACATTCGGGCAACACGTACGTAGCCCTCCATGGCAAAGTCAGTTTTGCAAGCACGCCATTCACTCTGATTCGATTATTACAGTGGCAAGACGAGGC
CGTACGAGGTGAAGGTATAGCCTTTTGTCAGCCCCCTTAACATAGTGGCCTAATGCATGCAAACTTGCAGGCCCGCACCCAAAGAGATCAAAAGTTACAACCCAAAATTGAGTTGGTTATGGATCAGGGCCATAAGGTAGCTATAGGTTACCAACAGCAGGACCTGGGTAAGTCTGCAGCATTTTAACTGATAAGACTAA
AGCCGACTTGCGTTCCCATAGTAGTCTATTATTAATTCCGATAATGGTTGACGAAATGCTGCGCCGGGTGGATTAGCGGAACGCACTCTCAAGCGTTAAGCCTTGTGGCGGTGCTCACGGCACATATTATGCCTGCCATTAAACGTCCTGTAGGCAGAGGATGGGAGATTACACGGATGAGGTGAACATCGGGAGTATGA
TAAGTCCCAGCTTCCCTCTCCTTGTAAAATCAGGGAGCCTCCTTACGCTCCCGTTCTCTTTTTAAAGGATGAATCCCTTGTTTTCGTTGTTTTGTCAGAAACACTCGCGAAGCTGGAATTTCAGGGTAATTCTCGCGCCATACTCAACCCACGGATGCCTTCGCTATCGGAAAGCGTTTCTACTACGCCACGTACGGCCG
GCTAATTTCCCACTATCGTCCCCGCCCCTTGGATGGCTGATCAACTTAGCTCGTCCGGGCAGTGTAATTTTCACGCTGCCAAGGCACACTTATAGTTCTTAAAACCCGGAGAGTGCTACAAACTCCGCTGACACGGCTTTTTACTATAACCTAAGACCGAGGTTCCGACAAGTTCCGTTTCCGGCACCAAGCCCTTCAAA
TGAGTGCAAAGGGATTACGATATGTTAGTGGACTTGGTCACTCACTGTCCATGGCCCAATATACTAATTCAATAAGAAGACGTCACGGCATAAGCGACGCATCTCGTTCCACTCGCGGCATGCCAACACAAGTATCATGAACTCATCAGGCAAAGTAGAGATCTAAACTCGGTGATAGAGAGAAGTCATAATTCCGCAAT
GAGCGTGGAATTGCGCAGGAGGGTTATCTATGTTATCCGGACACCACAGAATTCGGCGCCCTCGCAGACCTTTTCATATTGCTCCCTTCCCTAGATACCTACCCCTCCTACCTCTAGACTATGAAGTGGCGACACCTCAAACGGGTTACTCGGGAACCGCGCCAGCCCCTTGTTGCCCTGTAGGCTAGCTCCGATCATAT
GCGTCAGGACAGACGGCGGGGACACCACAGACGTGCACTAGACGTGAGGCGCGGCCTCCCCAATAATTCTGAGACTATAAGTAGCTCCCTTTTAGAATGGAAGCTGGCCTACCGTATAAATATCGAAGATCGAACGCTCGTACCGTGTGTAACCTACTACGTCTAAGCTGTGAGGACAACTAGTGATTAGCCCAGCGCGT
GTCTTACGGCTTCGACTCGTACCAATTGGCATTCACCCGTATCAGTAGACGCTAGAGCATGGCTTAGGTATGCAAGTGGTACAACAGTCCTTAATGTCGAGTGGGCCTTAAACTGCTCGCACTACATCGGGGGGTCTGTGTAGCGTACACACACGGTCATTGGCAGAGCAGATAAGCATTTGGTTGCCTGCCTGAATTGT
CAGAGGAGTGACGGAGCTGACCCGTATCGTTATAAAGTAGATTCAAACGACGCCTTTTAGAATCCACAAATTGGTAACCTTTGTTGCACCGAATTGAGAGCGCTATCGTCATCAGACTTCTTCTTAAGGATTTTAGCGAACCTGACCCGATGGGGTCCCCAGCGAGAGCAGCGGCGGCTTCGCTGCAGACCACCCACTTA
GGTCCCTGGGCAGCAGTTCGGGGTAGAGGCTCGGTGAGCGGACTTGGCCACCGTAGGTCAGGACTTGGGTCACTCTCACCCGCACACAGGGCTGATGCATCGTCGCTTCTGGCGTAAAGAATACTCGAAAAGGCACGTTACTCCCTGCTCTACTCACTCAAAAGGGATCTAGTGGAGTCGTGAGCAGCCGGGACGGGACA
TCCGGTCAGCCCGAGGGGTGGATGAGGGGTTAGATACGGATAACTACGTTGTGCGGAAAATTAGCGTGATCCCCAGCACCGTTAGTACGTATGTCCGCCTTTTGAAACCAATGTCCTACCAACTGAGCGCCCTGATGGCCATGCCAGCTCTAAGACCCGCAGTAGTTAGGTGCAAAATGAGTCTTCTGCCTACGTGGATG
TAAACCACGCTCATTCCTAGTATCCTTCAAGTACGTGTCAGTGAGACAAAAAACTACTATAAATGACCACGCGCGGCAAAATTCAGGCAGTCGGTAGTAGCCCCACACAATCGCTCCACCATATCACGCCTATCGGTAGGTAAACAGTCTAGCACATGTTATAGTTAGTTAACCTAATTTAAGATGGATACTAGTGGTGC
ACGTTGCTAAGGTACAAGGGGGTTACACACGAGCAAATCTGGATTGGGTCATAACAGTGGGTACTGCATGGAAATTGTACGCACCCCCAGCCAATGGAGGAGGCGCCGGATGAGTCGACGGGGGCGGGCTCATTTACTTCAATATCAATTGACCTCAGTTAGCCCCTTCCCTCCTACCCAGGCGTATCAGCAGGACCGTA
CGGGAAGTATATCACTTGCCTCACGAGTTGAAAAGGATATCTTCCCCCAACGCACTCGGTAAGCGGATGTATTAATCTATCTTCGCTTTTGGGACTTATCCGTAAAATAGCTCGAACGGAGCGTTTGACCTCGTACTCACCATATGCCTAACGTGAATTACAACCTACAGGGCACTTACAACAAGCCTGGCCGATCTCAT
GAGTCAGTCCGGGGAGTTCCATTCACGTTGACTACGCAGATGAAATCATAAGTCAAGCGTGAGTTGTTTCTCCCGCTCCATGCATGTACGTCCGGTGCGTCAACAACTAATGTAGTTCTGTTTTCTCGACGTGTATGTACGGTAATAATTTATAGAAGGGACAGGAGGGTGTTAGCGCCGGCGCGAAACTAGAACAAAAG
TCTGCGTATTTAGGACTTGAGCTCTCATCGCGTTCCGGCCTCATGATGAATATTCGGCCGGCCAACCCGATCGGGCTCTGATACAGCGGCCCGTGAGGTTTGGCTGGGTGAGGTGGCCTAATGATTGACGATGAGGTCATAGCCCTCTGTGGGAATGGTCTCCTCAGCAAACAGGATGCTGTAATCGGAGGACTAGGAGG
TACAACCGTGGCGTCGTATTCATGTTACATACAAGTGGGCCTTAACGCCAGGCTGTAGCACACTCGTTTGACGTGCGTTGCGGTCTAGGATGGCTGAGCTGTCGAGGACCAGAGTCGACGCGGCGTGACTTGATTGCACTCTCAACGATATTCCCAGGCCTCTGGGGCGAAAGCGCATCTGTTCAGGAAAGACACCATAG
CCGGGCGGTGAGCCTGCGCACCAAAGGCTTCGCCCGCGTGGGTGGACCCATCGCAGAGTCCCTGCGTTATAAGCAATGTGTGTAACGTTTCCCATAATTAAGTCAGTGACTGGGTTTACGACCAATGGAACCCTGAAGATAATACAACTGGTGCAAGCATATATACGGGATGTCACGCGAACCTCCTATCGAGTACGGTA
GCAGCCGCCAGAGAAGGATTACAGTGTCGGGTACCTGCGGAGTACATCTCAACTAGTAACTCCGGACATAGCAGTTCACCGAATATCATCCTGAAGTCGGCAGCTTCGAAGCCCTCCGGTGGCTGCAGTGGGGAGGTTCACTGGGGACACGCGACTGGGCAACTCGCATGTCACATGCATCCTCGGACCACGCTCTCCCG
GGCATCCGCTTCTGCGTCCAGAGTTTGTTCTGAAATTGACAGCCAAGGGCTCAGGAGAGTCCGCATACACCGTATTGGTACCGCGAAAGATTATGATGCCAGAAGACGTTCACCAAAAGAATGCTTACTGCTACGGACCACGGGATGAGGCGGGTGACACGTTTCTCGAGAACTGGAACATGCTGGGCCTACATAGTATC
GCCGATTAAGCAGTGAATGCCATCTAACTGTTATTGGCCGACCTTCCTTTGTAATAATTTCATACCAGCGACAAGGGTAGGCGGCCGTCTCTTCGCGCCCAGTAGGTCTAATTCGCGGGCCCAGCACAACAAGAGTAAACGTCGAAATAACACGTAACTGTCGGCACGTGCGCGGCGTGCAGTAGGAGAATCGCTCAGAT
GCCTGGGCAGGCGGATCGATGGCACGCACTCAGTTACACTAATAATTTATCTACCTAACTCGCCGGCATTCATTGTGCCCTTTTAGCGGTACACATGCGAAGGTATTACAAACACAGTACCACTTCGGGAACGGTGTACCTAAAAACGCGGGCCACGGCCCCTTGTATCATAAACTCCACTCTTTGTTTCAGGTCTCCCT
AACGATCTTTCCCTATGAGTCTTACAGCAGACCGGCCTGTCCGTTTAGACCGCATGATAATTTTACGAAAGCGGCGCCAGGAGCACAACTACCCACCGATCGGAGATTGAGTCTTACTGCATGCCGGGTTGCATCTCTGGCAGTCTACTATTGTCGCAGGTCCGTTTCCACGTAATTATATACCATATCGGTTAGAGCCA
CTGTGCTCCAACCCTTGAAGTCCTACTTCCGTGCGACCACTCACGTCTTGGTGAGTACATTGAAGCATAGCCATATCCGTTGGCGGGACGTCCCTCGTCACGAGCGGCGATCGTCGACTCACCTTGACCTCTTGTACTGCTCGCGATCGCACTCCTCGTGCCATTCCAACGGGGTCCTTACCTAGCTAACAAGTAGATTT
TCTATCCCACACGTACTGGTAGCATGTGAGCATAGTCTGATAATAAGAGTCGGGCACTGATTCAGGCCAAGAGGAATCATATTGGTAGGGGAGGTCATCATTTCCTTTCCTGCGCATAAGCCGGCGTACTACTTCCCTTTCCGGGTTCGTTATAGGCACAATAACAGAATGTTCAAAATTGTTGTCAGGCATTTGGATGT
TCTCATTTGCTGGTTGTACAAAAGACCACACAAACCAGTACGATAACACCACCCCCCGTAATCACGCCGCTCGGCGAAGTAGCATCTGTTACATCGGGTCAGGAGTCGATGCACTTCCTGGACGGTTATGATTGCGGATATGGCATGTGAGCTATAACTCTTACGTTCAACTTGGAGGTCTCGTACGCGTGAGTTGCGTG
TGACAGTTATCCTTGGCAACGATCTCTGTCTTGTAATTGGGTGTGGCCAAACTTACAATCACTTACTTACGAACATCCTAGCACGACGTCGGGAAGCATTGGTTGGGCTACGTTCAGGGTTGCCTGACATTTATAGTTGTTAAGACCTGTCGCTAAGTGGTATAGTTGGCGTCGTATCTGAGATCGAAATCGGGCACCGC
CACCAATGTCGACCATGACGGAGCACGTCCGCTGGACCCAACCTTGATTGCAAGCTTTACGCATGGACTCTCCAAGTCCAGAAACGTACAACACGACATCTGAGTCGCAATATATCTAGTAAATAAAATGTCACGCGTACAATATGTTGTCTCGCACGAACCAAGCGTAGCCAAGACCAAGTGCTATCGATTCAATCCTC
AATTATTCAATGATGATTCACACAAGCTCTAGCAGTGATCCAGACGGAGCAACACCGATTAAGTGGGACTAATGCCCGCGGCATTCGCCCACATCGGTAGTGGACTTTCGGGTGCCGGTTATGTACGCCCCTCCGACGTCAACCCGGCAACCACTCGTATTCATGGTGTGGCGATACAACCTTGGGTTTTTCCGATTTAT
GGTCCTCCGTGCGTACACATCTATCGACCGCCCGCTCTAAGTTAAACCACCGCTCGCCAACAGGTTGTAATTATATTCGGTATGGTGTCAGCAAAATAGGTATTACGATTCGCATGATTATCGTCATTGTGGGCCTCACCCAGAGCTCCTAAGTTCTGCAGCAGAATTCCTCCCCAATAGTTGTTCCACACCGGGGAGGT
GACACTTTTGTCGGAATGAATATGCTTGATGATGCCGGGTAGGTTTTAGAACCGACAACATGATTGAGTAAGAAAGTGGACGGCGGGGGCGCGACCCCAGGGCCGCAAAGCATGTGGCACAGCTAGCTAGCTTACCGATAAGTACTTTCCGGGGGCCCTACAATGTAGAAGAGAGGGAGCGAACCCCCTCAACACCCTCC
CGTGACAATGATGTAACCGTCACGGACCCCTACTCTGCACATCGTTGCTCGCGTCTAACTGTATTCGGAGAACAGTATTGACGGAGTGGACCACTGGGTAAAGTCTTGTAGCATGAATGGTCAAGCGCGAGGGAATGAGTGTCCAGCAACCAAGCACCAACTACAGACAGGATTGAGTAACGAGCATGAAAGGAAAAATG
TAGCGTTGAACTGGGCCCGTAACACCCTCTAGCGCAGGTAGTAAACGCGGGAAATATCTGACCAAGATGTCATCGACGGAAGACTTATGAAGTATTGATCGGCAGAGTGGGAATGAGAGAATACAGCACTGTAGGTAAACCAGCAGGTAATTTCCATGACTTGCGTTAACGAACCGCGTGTGAGTAACAAACTGCTACTA
AAGAAGGAATCTAAGGCTTTACTTTCGGTATTCATACGCGATCCCGTCCTCCGGGTTGGTAGGGCAGAGAATGCCGGGATCAAGTGGATAATTATGTTTCTTTGGTCTATACCTCCTACCTTGGCGAGCTATGTGCGCACATTTCCATTCATCAGGGTGATGTAGCCAGCAACCACCGCGTTAAATATTGTGATCCCTGC
AATCAGGCAATAGCATTCCAAAACACATCTCAGGACCAGTTCTCCTCGCGTGCGCAATCTAGTCCGCCCGCGCGGGTTGCGCAGCTCTTCAATGCTGAAAGGCAGTGATCAGACCGACGGAGAGCGAGGGTACTTGGGTGCAGATCGTTATGGATTCCTAGATTAGTCCCGCGCCTAGTGTGGAAGTCCTAGCGATCGAC
GTCCCTGCCCTCGGCGGCCCTGCCGTCCGACTATTTCTCACACAACTTCCAGATACCCGTATTCTCATCGAACGGTATATTTGAAAGTTTCTGTATGCAATCCATTAAATGAGCCCTAAGGGCAATGCCGCCAACTACATACCAGGACAACATTCTATTTTGACTTGTTTTGAATACTCTCGACGCCCCGTATGGAGGTT
GTTCTGTTAGTCACCAGTACCCTTTCCTCAGAGCCTCGAAGAAGTTTATGGGAGGCGTAACTAAATATGCTTCGATAAAGATCGTCAGTGACTTTCCTTACTATTATACGAATTTTGCATCCTAACATTACCGCGACACGACTAATACCGTCTCGTATCGCGAGCGCCAATACCATCTTAAGGAGGCTGTTCAAAAGGAC
CGAAAGGGCCTCAATCCTAACCCAGCATATAATAGGCTATCCATAAACAGAAAACTGTCCCCGACGCACTACAAGTTCGGTTATCAAGATAATGCTGCCACGAGGTCGATGGTCTTAAAGCTCGTTGGGCTCTGTTTGCCGCAGGCTCCTTTTACCAAAGTAATTGTCTAATCTGTCGGCGTACTTGAACGTATTATCTC
CCCTAACACATTCGTGGTAGAGCCTGAAATAATAACCATAATACGACTTTTACATATTATTAATTTGCCACCCGTCAATATTCCTCCTAGGCCGGAACGTTGAAACTCTAAATTAGCAACCCTGCTAAAGGACGTACATTGTAAATCCTCCACACCCCGTTAACAATTGTGACTAACCACCGCTTAGGAGACTTCACCCA
GTCGAAGGTCCAATGAGTAAGATCTGATATGTATAACTCGCATCTCAAGCCGCCAAGTTATCTGCTCGAAAGCGAAAGTCCGACGTAAAAATCAACGCACAATATTTTCTGATAATGTAGTCTATTGTCTCATCGATAGCGCAACATCTTCCACAACTGGGCTCATGAAACAACCATTTGCGCAGTAAATGAGCTAGACG
TAGCCAGCTTTTGCCCGCGCGGTCGGGCGAAGTGATATGGGTCATGTTTGGGCAACCCAGCGGGGTAAACGTGGACCAATGTTACTTATTATGACGCCCTGCTCAAAGGTACCCCCATCCACTGTTGGGTGCTTTGTTGGATTGGATTACCTACAGCTTATTATAGCTCTAGCTGAACGGACACGTAAAACACCTTGGTA
TAACACCGAAGGGGCTCATCGAACATCAGGGGGGAAACGCCATCTTCTGGATAATTGCGTATCGGTACACCGACGTCCCATCGCCATCCAGCCGCAGACCTCAATTGTGAGACGAGTGAGCATAATTTGTACGAGGCGCCTTCCGACTGGTGCTTGTACCGTCAACTTTTGAAAGTTACAGTTGCTTTAATCCCGCGTAT
AGCCCCCTCGGCGCTTCGAAGAAATAGGTGTGGCTGTCCTGTACACCTGGGTTTAACCGGGACGACGAATACTGCCGTTACGAGTTCGATATGGAAGTAGCATCTCAGGAAAGTAAAGACGTATCAGTATGGTGATCGTGCACCTACCACAATCCCTGACCCCGACTTTTAGCAGGGTAAGCGAGAAGCTACACAACGAA
GCACTACAAAAGTGCAACAGACCGCATCTAGCATGCGCAAGTTACTCTGCTCACGTATATTCGCCAGAAAAGGGTGTTGAGGATTGCCACTATTGATAGGCTGCGTTAGGAGAAGCGTTTTCCATGCCTGAGTCGCATGGCTGTCCTCCGCAGTTGAAAAACACCGTCATCAATCATCCTCGTGACGTAGCTGGGGGTTT
AAGAGTGAATGCTGCTGGCCCGTCAACGAATATTCCTTCCGTCTGGTTGCCGTTGCGTATAGTGGGGTCACGACTCCGAATATGATGAGTGAGCAACAGTCATATCGTCAAGTATCGCCCCACCTATGTTACCCAGCGGTATACCGACTGCGTTTTCGATATGTCATATAAATTATTCAGTGAGCTAACCCTCAATACGG
GGCCCGTTTGTATATACAACGTATGGCAACTCTACTGGCAATCGAATGGTTACACTATTAGTTCAGTACTGTCTTCCCGATGGTCATACAAGAACGTGCCTTCTAGCGGATGATTGACATTACGCTTACGGACTTTCTCTCCCGATCGCGGGCTAGTCGGGCCATGGCTTATTTGGGCGGATTTCTTTCCAACAGTACTA
GTGGTCTTTCCTAGTTCAAGGAGTACCAAACCGAAGGGCTGTCATACAGGAGATGTAATTTACTTTTACGAAAACCTCAGCACGAGCGATGACCCTCATTGACTTAATAACCTCCACTGAGGTGATGGTTCTGGGTCCTGACGTTTACCAAACAACTCCTGAACTTAAGATACTTGAACTGTTACTTAAAATTTGTCCTA
TCAATGTACCCGGGCAAACAACTATGTGAACAAGTATCCCCACCGCTGTCGCCACAAACGAGTGTACTGCTGGCAATTCCGCTGTCGTTATAATAGCTCGTGAGCCATAATCGCTCAGTGCTCCTTACAGTTAGTTTGCGTCTACTGGTTGGAGGGCTTCCGTTGCTATCATCTGCACTTACCAGGCGGTACATTGCTGC
TGTTTAGCGAGCCTCACCGCGTGCCTGCGAATGCCCTAGAGAACAACCCCGCATCTCGCCGGCCTGTTGCACGCACATCTCACTCCTGTCCAATCGGGGTAATCCTTGCGGGCTCAGAGCACCACTCTGGTAAGATTTAAGCCGTTACTGGAGGAGAGTTAACTTGCGTCTAGGGTTAACGCCCGGTCCGGTCATCCATA
GGGCCCTTGGCCAAAACGAAAGTGAGATAGGGAAAATCCAGCACTGTACTAGACTTCAACGCTTTGTAGACCAGTCTTAGTCGCCAAATTTACGGAGGAATTGACCAGGGTTAAATGTAGTTGTGTGGACCTAGCCCAGATGAGGGGTCGGAAGCGTAGTACCGCATCGCAATCTTTGGCGGTTCAGATACTCCGTAACA
GGTCGGCTGAGAAGTAGGCCTCGCGGTTGTACGGCTAGATCGGGTCGTAGCCCGCCACGCTCGTGCCTAGCGCTGCGATGGACCACAGTGAGCGTAATCAATCGAGGGTGAAACAAGCGGTCTTAATCCACAGATATTGCACCTCTCTGGAGACCTCTATGTGTTAGAACGTGTGTACTTAGGAAAACGAAAAAACATAA
CCTATGGTATTACACGCTAAATCGAGCAAGGGACACCGAAGTGAGACTGGGTACAGTCTTAGGGTAGAGGTAAGCAGATGGAATCCGCTTCCAGGCGCACACCGACCTCAGAGTCCGACGAAAAGCGTGGTTGGAATAGGTCGATGTGGGATCTACGATGGGGTAGGAACTGGACCGCCAAAAACGTGATGCACGTCGTA
TCACGGATTAGCTTATGATATGTGGCCACCAAGGTAGGATCATGATGCTGAGAAGGGAGGGAGCCGATAAAAATTCCCTGGGCCGATTAGGGCTAGCTCCTCGTGGCGTGTAAATATGTACATAGGCAAGCCCCCGGTATGGGCGAGGCTACGGGTTTAGTTTGGCGAAGCCTATTGTGACCGTTCCTATGATGCAGACC
GGGCGTTAGGGAGTTCGGTGGAAAGGGGGTTTAACACTGCTGCACAGGTGTGGCCGACCTCATGATGATATCGTATCCGCAACGATTAGGATCATGCTGCGAACGAGCCACAAAGGTTTTTAAAGTAAGTTGGAGTAGTGTGGTCTAATACCATACACGGGGGTCGTTCAAGCACCGGTGGGATACCGATTTCTAGATAG
TTTAAGAATTTCTCGGCGGATCGTGGCAACAGTGATACTGCGTCACAGCGATTAACACACATGACACTTACAGCGTCCAAATGTCACCCGGAGTTCGTAAACCTTGGAGAGCGGTTGTCTGAAGGGGTCAAAACGTCAAACCCAATGTTCCGTATGATAAGGACGGAGCGAGACCCAGGGATCCTGTCCTTCCAGAAATA
GCAGGTTAATATCTATATTTAGCATTCCCGATCCTATATCTGGACGGCAGCGTCGACTCATCTAGCCATATCCGTGTCATAGAGATTGCCTTGTTGTTCTCCTTGCTAGGGGAAAGTGTCGAACTTCACGGCCTGGATTACATCCGAAGTGTGGAGATAAATATCGAGTTCTGCTGACTCTCAAATGAAACAACTTAACT
ACAAGCAGAAATTCCGTGGAGCCGCAATAATGCTTCTCAACGACTACTTGCCTAACCGCGATGATCAGTTTATCATAATCAACTCGATGTATCCGTTAACATAGGAGGATTCGATATTTGAACGCGGAAGTCTTATCATGCTCACCACCTCGTCCCGGCAGTGTCCTTGAGAGGGGGCCGTCCTCCTACCTTCTCTAGTA
AATAACCAGTAAAGACGTCGTCGGCAGGTAGGGCCCCAGGGCACTCGCTCTCGTGAGTCCACTAGGTTCCGTGAGAAGGTTAGGTTCTGCAAAGCCCTAACAACCCCCGGAATACTTTATGGGCTTGGTTCATAGACGAATTGGACGCCCGCAGGCCTTGCGGGCCTACGCTGCTGTTTTGGGGTCCGAATGCGAGAGTA
TCTGGGGAGACGCCCTATAATCCAAGCGTATCAATGTTGCTACTGGGTAGCCATAAAACGTTGGTAGACGCAAACTAGTTCAACTTACAGTTCATAGAAAATACTGAGACGCTTGTTTTGTTCAGCAGTTGCGCGAATTAGACGCAATCATTCGTTCAAGGTACGGTCCAGGACGCTGCGGGCGAGGTGGAACTAGTGTC
GTAGCACCAGGAATATAAATAATATAGTTTAGCTACCCACCTTAGGTACCAACATCCCTTCCTCTGCGACTCGTTCGCACTCTTTCTAAGACGAAAATCCATAATGGCCAGGGAACATACATGATGACCAGCCTCATTTACGTAGGTCGCTTGGGGAGAAATGAGGGTGGTGCCCGCTCTCTCCGAGTAGCCTCGTTCTC
ATATCATGAAACCTAGCGCTAAGCGGACGAAAAGAAGTTGCCCATCCGACTCGGAATAGGCTCATCACTGCTTCCTAGGCTCCAGCCGACTGACGATTCCCTCAGGAAACCTCCCTCGCCCTTGAGGTAGGATCGTAGACTTTACCCTCACGAAACAGTCGCTCATATAGCATCCATCGGCGCGGCCCTGCAGAGGTACT
GCTGGGGCAGCTCTCGTGAACAGTTATTGAAGCGCGGATGTAAACAGCGATGCTGAGCAAAATACTAACAGGTAGGCCGAGCCTGATCCTGTTAACTTTCGATCACATTCGGGCAACACGTACGTAGCCCTCCATGGCAAAGTCAGTTTTGCAAGCACGCCATTCACTCTGATTCGATTATTACAGTGGCAAGACGAGGC
CGTACGAGGTGAAGGTATAGCCTTTTGTCAGCCCCCTTAACATAGTGGCCTAATGCATGCAAACTTGCAGGCCCGCACCCAAAGAGATCAAAAGTTACAACCCAAAATTGAGTTGGTTATGGATCAGGGCCATAAGGTAGCTATAGGTTACCAACAGCAGGACCTGGGTAAGTCTGCAGCATTTTAACTGATAAGACTAA
AGCCGACTTGCGTTCCCATAGTAGTCTATTATTAATTCCGATAATGGTTGACGAAATGCTGCGCCGGGTGGATTAGCGGAACGCACTCTCAAGCGTTAAGCCTTGTGGCGGTGCTCACGGCACATATTATGCCTGCCATTAAACGTCCTGTAGGCAGAGGATGGGAGATTACACGGATGAGGTGAACATCGGGAGTATGA
TAAGTCCCAGCTTCCCTCTCCTTGTAAAATCAGGGAGCCTCCTTACGCTCCCGTTCTCTTTTTAAAGGATGAATCCCTTGTTTTCGTTGTTTTGTCAGAAACACTCGCGAAGCTGGAATTTCAGGGTAATTCTCGCGCCATACTCAACCCACGGATGCCTTCGCTATCGGAAAGCGTTTCTACTACGCCACGTACGGCCG
GCTAATTTCCCACTATCGTCCCCGCCCCTTGGATGGCTGATCAACTTAGCTCGTCCGGGCAGTGTAATTTTCACGCTGCCAAGGCACACTTATAGTTCTTAAAACCCGGAGAGTGCTACAAACTCCGCTGACACGGCTTTTTACTATAACCTAAGACCGAGGTTCCGACAAGTTCCGTTTCCGGCACCAAGCCCTTCAAA
TGAGTGCAAAGGGATTACGATATGTTAGTGGACTTGGTCACTCACTGTCCATGGCCCAATATACTAATTCAATAAGAAGACGTCACGGCATAAGCGACGCATCTCGTTCCACTCGCGGCATGCCAACACAAGTATCATGAACTCATCAGGCAAAGTAGAGATCTAAACTCGGTGATAGAGAGAAGTCATAATTCCGCAAT
GAGCGTGGAATTGCGCAGGAGGGTTATCTATGTTATCCGGACACCACAGAATTCGGCGCCCTCGCAGACCTTTTCATATTGCTCCCTTCCCTAGATACCTACCCCTCCTACCTCTAGACTATGAAGTGGCGACACCTCAAACGGGTTACTCGGGAACCGCGCCAGCCCCTTGTTGCCCTGTAGGCTAGCTCCGATCATAT
GCGTCAGGACAGACGGCGGGGACACCACAGACGTGCACTAGACGTGAGGCGCGGCCTCCCCAATAATTCTGAGACTATAAGTAGCTCCCTTTTAGAATGGAAGCTGGCCTACCGTATAAATATCGAAGATCGAACGCTCGTACCGTGTGTAACCTACTACGTCTAAGCTGTGAGGACAACTAGTGATTAGCCCAGCGCGT
GTCTTACGGCTTCGACTCGTACCAATTGGCATTCACCCGTATCAGTAGACGCTAGAGCATGGCTTAGGTATGCAAGTGGTACAACAGTCCTTAATGTCGAGTGGGCCTTAAACTGCTCGCACTACATCGGGGGGTCTGTGTAGCGTACACACACGGTCATTGGCAGAGCAGATAAGCATTTGGTTGCCTGCCTGAATTGT
CAGAGGAGTGACGGAGCTGACCCGTATCGTTATAAAGTAGATTCAAACGACGCCTTTTAGAATCCACAAATTGGTAACCTTTGTTGCACCGAATTGAGAGCGCTATCGTCATCAGACTTCTTCTTAAGGATTTTAGCGAACCTGACCCGATGGGGTCCCCAGCGAGAGCAGCGGCGGCTTCGCTGCAGACCACCCACTTA
GGTCCCTGGGCAGCAGTTCGGGGTAGAGGCTCGGTGAGCGGACTTGGCCACCGTAGGTCAGGACTTGGGTCACTCTCACCCGCACACAGGGCTGATGCATCGTCGCTTCTGGCGTAAAGAATACTCGAAAAGGCACGTTACTCCCTGCTCTACTCACTCAAAAGGGATCTAGTGGAGTCGTGAGCAGCCGGGACGGGACA
TCCGGTCAGCCCGAGGGGTGGATGAGGGGTTAGATACGGATAACTACGTTGTGCGGAAAATTAGCGTGATCCCCAGCACCGTTAGTACGTATGTCCGCCTTTTGAAACCAATGTCCTACCAACTGAGCGCCCTGATGGCCATGCCAGCTCTAAGACCCGCAGTAGTTAGGTGCAAAATGAGTCTTCTGCCTACGTGGATG
TAAACCACGCTCATTCCTAGTATCCTTCAAGTACGTGTCAGTGAGACAAAAAACTACTATAAATGACCACGCGCGGCAAAATTCAGGCAGTCGGTAGTAGCCCCACACAATCGCTCCACCATATCACGCCTATCGGTAGGTAAACAGTCTAGCACATGTTATAGTTAGTTAACCTAATTTAAGATGGATACTAGTGGTGC
ACGTTGCTAAGGTACAAGGGGGTTACACACGAGCAAATCTGGATTGGGTCATAACAGTGGGTACTGCATGGAAATTGTACGCACCCCCAGCCAATGGAGGAGGCGCCGGATGAGTCGACGGGGGCGGGCTCATTTACTTCAATATCAATTGACCTCAGTTAGCCCCTTCCCTCCTACCCAGGCGTATCAGCAGGACCGTA
CGGGAAGTATATCACTTGCCTCACGAGTTGAAAAGGATATCTTCCCCCAACGCACTCGGTAAGCGGATGTATTAATCTATCTTCGCTTTTGGGACTTATCCGTAAAATAGCTCGAACGGAGCGTTTGACCTCGTACTCACCATATGCCTAACGTGAATTACAACCTACAGGGCACTTACAACAAGCCTGGCCGATCTCAT
GAGTCAGTCCGGGGAGTTCCATTCACGTTGACTACGCAGATGAAATCATAAGTCAAGCGTGAGTTGTTTCTCCCGCTCCATGCATGTACGTCCGGTGCGTCAACAACTAATGTAGTTCTGTTTTCTCGACGTGTATGTACGGTAATAATTTATAGAAGGGACAGGAGGGTGTTAGCGCCGGCGCGAAACTAGAACAAAAG
TCTGCGTATTTAGGACTTGAGCTCTCATCGCGTTCCGGCCTCATGATGAATATTCGGCCGGCCAACCCGATCGGGCTCTGATACAGCGGCCCGTGAGGTTTGGCTGGGTGAGGTGGCCTAATGATTGACGATGAGGTCATAGCCCTCTGTGGGAATGGTCTCCTCAGCAAACAGGATGCTGTAATCGGAGGACTAGGAGG
TACAACCGTGGCGTCGTATTCATGTTACATACAAGTGGGCCTTAACGCCAGGCTGTAGCACACTCGTTTGACGTGCGTTGCGGTCTAGGATGGCTGAGCTGTCGAGGACCAGAGTCGACGCGGCGTGACTTGATTGCACTCTCAACGATATTCCCAGGCCTCTGGGGCGAAAGCGCATCTGTTCAGGAAAGACACCATAG
CCGGGCGGTGAGCCTGCGCACCAAAGGCTTCGCCCGCGTGGGTGGACCCATCGCAGAGTCCCTGCGTTATAAGCAATGTGTGTAACGTTTCCCATAATTAAGTCAGTGACTGGGTTTACGACCAATGGAACCCTGAAGATAATACAACTGGTGCAAGCATATATACGGGATGTCACGCGAACCTCCTATCGAGTACGGTA
GCAGCCGCCAGAGAAGGATTACAGTGTCGGGTACCTGCGGAGTACATCTCAACTAGTAACTCCGGACATAGCAGTTCACCGAATATCATCCTGAAGTCGGCAGCTTCGAAGCCCTCCGGTGGCTGCAGTGGGGAGGTTCACTGGGGACACGCGACTGGGCAACTCGCATGTCACATGCATCCTCGGACCACGCTCTCCCG
GGCATCCGCTTCTGCGTCCAGAGTTTGTTCTGAAATTGACAGCCAAGGGCTCAGGAGAGTCCGCATACACCGTATTGGTACCGCGAAAGATTATGATGCCAGAAGACGTTCACCAAAAGAATGCTTACTGCTACGGACCACGGGATGAGGCGGGTGACACGTTTCTCGAGAACTGGAACATGCTGGGCCTACATAGTATC
GCCGATTAAGCAGTGAATGCCATCTAACTGTTATTGGCCGACCTTCCTTTGTAATAATTTCATACCAGCGACAAGGGTAGGCGGCCGTCTCTTCGCGCCCAGTAGGTCTAATTCGCGGGCCCAGCACAACAAGAGTAAACGTCGAAATAACACGTAACTGTCGGCACGTGCGCGGCGTGCAGTAGGAGAATCGCTCAGAT
GCCTGGGCAGGCGGATCGATGGCACGCACTCAGTTACACTAATAATTTATCTACCTAACTCGCCGGCATTCATTGTGCCCTTTTAGCGGTACACATGCGAAGGTATTACAAACACAGTACCACTTCGGGAACGGTGTACCTAAAAACGCGGGCCACGGCCCCTTGTATCATAAACTCCACTCTTTGTTTCAGGTCTCCCT
AACGATCTTTCCCTATGAGTCTTACAGCAGACCGGCCTGTCCGTTTAGACCGCATGATAATTTTACGAAAGCGGCGCCAGGAGCACAACTACCCACCGATCGGAGATTGAGTCTTACTGCATGCCGGGTTGCATCTCTGGCAGTCTACTATTGTCGCAGGTCCGTTTCCACGTAATTATATACCATATCGGTTAGAGCCA
CTGTGCTCCAACCCTTGAAGTCCTACTTCCGTGCGACCACTCACGTCTTGGTGAGTACATTGAAGCATAGCCATATCCGTTGGCGGGACGTCCCTCGTCACGAGCGGCGATCGTCGACTCACCTTGACCTCTTGTACTGCTCGCGATCGCACTCCTCGTGCCATTCCAACGGGGTCCTTACCTAGCTAACAAGTAGATTT
TCTATCCCACACGTACTGGTAGCATGTGAGCATAGTCTGATAATAAGAGTCGGGCACTGATTCAGGCCAAGAGGAATCATATTGGTAGGGGAGGTCATCATTTCCTTTCCTGCGCATAAGCCGGCGTACTACTTCCCTTTCCGGGTTCGTTATAGGCACAATAACAGAATGTTCAAAATTGTTGTCAGGCATTTGGATGT
TCTCATTTGCTGGTTGTACAAAAGACCACACAAACCAGTACGATAACACCACCCCCCGTAATCACGCCGCTCGGCGAAGTAGCATCTGTTACATCGGGTCAGGAGTCGATGCACTTCCTGGACGGTTATGATTGCGGATATGGCATGTGAGCTATAACTCTTACGTTCAACTTGGAGGTCTCGTACGCGTGAGTTGCGTG
TGACAGTTATCCTTGGCAACGATCTCTGTCTTGTAATTGGGTGTGGCCAAACTTACAATCACTTACTTACGAACATCCTAGCACGACGTCGGGAAGCATTGGTTGGGCTACGTTCAGGGTTGCCTGACATTTATAGTTGTTAAGACCTGTCGCTAAGTGGTATAGTTGGCGTCGTATCTGAGATCGAAATCGGGCACCGC
CACCAATGTCGACCATGACGGAGCACGTCCGCTGGACCCAACCTTGATTGCAAGCTTTACGCATGGACTCTCCAAGTCCAGAAACGTACAACACGACATCTGAGTCGCAATATATCTAGTAAATAAAATGTCACGCGTACAATATGTTGTCTCGCACGAACCAAGCGTAGCCAAGACCAAGTGCTATCGATTCAATCCTC
AATTATTCAATGATGATTCACACAAGCTCTAGCAGTGATCCAGACGGAGCAACACCGATTAAGTGGGACTAATGCCCGCGGCATTCGCCCACATCGGTAGTGGACTTTCGGGTGCCGGTTATGTACGCCCCTCCGACGTCAACCCGGCAACCACTCGTATTCATGGTGTGGCGATACAACCTTGGGTTTTTCCGATTTAT
GGTCCTCCGTGCGTACACATCTATCGACCGCCCGCTCTAAGTTAAACCACCGCTCGCCAACAGGTTGTAATTATATTCGGTATGGTGTCAGCAAAATAGGTATTACGATTCGCATGATTATCGTCATTGTGGGCCTCACCCAGAGCTCCTAAGTTCTGCAGCAGAATTCCTCCCCAATAGTTGTTCCACACCGGGGAGGT
GACACTTTTGTCGGAATGAATATGCTTGATGATGCCGGGTAGGTTTTAGAACCGACAACATGATTGAGTAAGAAAGTGGACGGCGGGGGCGCGACCCCAGGGCCGCAAAGCATGTGGCACAGCTAGCTAGCTTACCGATAAGTACTTTCCGGGGGCCCTACAATGTAGAAGAGAGGGAGCGAACCCCCTCAACACCCTCC
CGTGACAATGATGTAACCGTCACGGACCCCTACTCTGCACATCGTTGCTCGCGTCTAACTGTATTCGGAGAACAGTATTGACGGAGTGGACCACTGGGTAAAGTCTTGTAGCATGAATGGTCAAGCGCGAGGGAATGAGTGTCCAGCAACCAAGCACCAACTACAGACAGGATTGAGTAACGAGCATGAAAGGAAAAATG
TAGCGTTGAACTGGGCCCGTAACACCCTCTAGCGCAGGTAGTAAACGCGGGAAATATCTGACCAAGATGTCATCGACGGAAGACTTATGAAGTATTGATCGGCAGAGTGGGAATGAGAGAATACAGCACTGTAGGTAAACCAGCAGGTAATTTCCATGACTTGCGTTAACGAACCGCGTGTGAGTAACAAACTGCTACTA
AAGAAGGAATCTAAGGCTTTACTTTCGGTATTCATACGCGATCCCGTCCTCCGGGTTGGTAGGGCAGAGAATGCCGGGATCAAGTGGATAATTATGTTTCTTTGGTCTATACCTCCTACCTTGGCGAGCTATGTGCGCACATTTCCATTCATCAGGGTGATGTAGCCAGCAACCACCGCGTTAAATATTGTGATCCCTGC
AATCAGGCAATAGCATTCCAAAACACATCTCAGGACCAGTTCTCCTCGCGTGCGCAATCTAGTCCGCCCGCGCGGGTTGCGCAGCTCTTCAATGCTGAAAGGCAGTGATCAGACCGACGGAGAGCGAGGGTACTTGGGTGCAGATCGTTATGGATTCCTAGATTAGTCCCGCGCCTAGTGTGGAAGTCCTAGCGATCGAC
GTCCCTGCCCTCGGCGGCCCTGCCGTCCGACTATTTCTCACACAACTTCCAGATACCCGTATTCTCATCGAACGGTATATTTGAAAGTTTCTGTATGCAATCCATTAAATGAGCCCTAAGGGCAATGCCGCCAACTACATACCAGGACAACATTCTATTTTGACTTGTTTTGAATACTCTCGACGCCCCGTATGGAGGTT
GTTCTGTTAGTCACCAGTACCCTTTCCTCAGAGCCTCGAAGAAGTTTATGGGAGGCGTAACTAAATATGCTTCGATAAAGATCGTCAGTGACTTTCCTTACTATTATACGAATTTTGCATCCTAACATTACCGCGACACGACTAATACCGTCTCGTATCGCGAGCGCCAATACCATCTTAAGGAGGCTGTTCAAAAGGAC
CGAAAGGGCCTCAATCCTAACCCAGCATATAATAGGCTATCCATAAACAGAAAACTGTCCCCGACGCACTACAAGTTCGGTTATCAAGATAATGCTGCCACGAGGTCGATGGTCTTAAAGCTCGTTGGGCTCTGTTTGCCGCAGGCTCCTTTTACCAAAGTAATTGTCTAATCTGTCGGCGTACTTGAACGTATTATCTC
CCCTAACACATTCGTGGTAGAGCCTGAAATAATAACCATAATACGACTTTTACATATTATTAATTTGCCACCCGTCAATATTCCTCCTAGGCCGGAACGTTGAAACTCTAAATTAGCAACCCTGCTAAAGGACGTACATTGTAAATCCTCCACACCCCGTTAACAATTGTGACTAACCACCGCTTAGGAGACTTCACCCA
GTCGAAGGTCCAATGAGTAAGATCTGATATGTATAACTCGCATCTCAAGCCGCCAAGTTATCTGCTCGAAAGCGAAAGTCCGACGTAAAAATCAACGCACAATATTTTCTGATAATGTAGTCTATTGTCTCATCGATAGCGCAACATCTTCCACAACTGGGCTCATGAAACAACCATTTGCGCAGTAAATGAGCTAGACG
TAGCCAGCTTTTGCCCGCGCGGTCGGGCGAAGTGATATGGGTCATGTTTGGGCAACCCAGCGGGGTAAACGTGGACCAATGTTACTTATTATGACGCCCTGCTCAAAGGTACCCCCATCCACTGTTGGGTGCTTTGTTGGATTGGATTACCTACAGCTTATTATAGCTCTAGCTGAACGGACACGTAAAACACCTTGGTA
TAACACCGAAGGGGCTCATCGAACATCAGGGGGGAAACGCCATCTTCTGGATAATTGCGTATCGGTACACCGACGTCCCATCGCCATCCAGCCGCAGACCTCAATTGTGAGACGAGTGAGCATAATTTGTACGAGGCGCCTTCCGACTGGTGCTTGTACCGTCAACTTTTGAAAGTTACAGTTGCTTTAATCCCGCGTAT
AGCCCCCTCGGCGCTTCGAAGAAATAGGTGTGGCTGTCCTGTACACCTGGGTTTAACCGGGACGACGAATACTGCCGTTACGAGTTCGATATGGAAGTAGCATCTCAGGAAAGTAAAGACGTATCAGTATGGTGATCGTGCACCTACCACAATCCCTGACCCCGACTTTTAGCAGGGTAAGCGAGAAGCTACACAACGAA
GCACTACAAAAGTGCAACAGACCGCATCTAGCATGCGCAAGTTACTCTGCTCACGTATATTCGCCAGAAAAGGGTGTTGAGGATTGCCACTATTGATAGGCTGCGTTAGGAGAAGCGTTTTCCATGCCTGAGTCGCATGGCTGTCCTCCGCAGTTGAAAAACACCGTCATCAATCATCCTCGTGACGTAGCTGGGGGTTT
AAGAGTGAATGCTGCTGGCCCGTCAACGAATATTCCTTCCGTCTGGTTGCCGTTGCGTATAGTGGGGTCACGACTCCGAATATGATGAGTGAGCAACAGTCATATCGTCAAGTATCGCCCCACCTATGTTACCCAGCGGTATACCGACTGCGTTTTCGATATGTCATATAAATTATTCAGTGAGCTAACCCTCAATACGG
GGCCCGTTTGTATATACAACGTATGGCAACTCTACTGGCAATCGAATGGTTACACTATTAGTTCAGTACTGTCTTCCCGATGGTCATACAAGAACGTGCCTTCTAGCGGATGATTGACATTACGCTTACGGACTTTCTCTCCCGATCGCGGGCTAGTCGGGCCATGGCTTATTTGGGCGGATTTCTTTCCAACAGTACTA
GTGGTCTTTCCTAGTTCAAGGAGTACCAAACCGAAGGGCTGTCATACAGGAGATGTAATTTACTTTTACGAAAACCTCAGCACGAGCGATGACCCTCATTGACTTAATAACCTCCACTGAGGTGATGGTTCTGGGTCCTGACGTTTACCAAACAACTCCTGAACTTAAGATACTTGAACTGTTACTTAAAATTTGTCCTA
TCAATGTACCCGGGCAAACAACTATGTGAACAAGTATCCCCACCGCTGTCGCCACAAACGAGTGTACTGCTGGCAATTCCGCTGTCGTTATAATAGCTCGTGAGCCATAATCGCTCAGTGCTCCTTACAGTTAGTTTGCGTCTACTGGTTGGAGGGCTTCCGTTGCTATCATCTGCACTTACCAGGCGGTACATTGCTGC
TGTTTAGCGAGCCTCACCGCGTGCCTGCGAATGCCCTAGAGAACAACCCCGCATCTCGCCGGCCTGTTGCACGCACATCTCACTCCTGTCCAATCGGGGTAATCCTTGCGGGCTCAGAGCACCACTCTGGTAAGATTTAAGCCGTTACTGGAGGAGAGTTAACTTGCGTCTAGGGTTAACGCCCGGTCCGGTCATCCATA
GGGCCCTTGGCCAAAACGAAAGTGAGATAGGGAAAATCCAGCACTGTACTAGACTTCAACGCTTTGTAGACCAGTCTTAGTCGCCAAATTTACGGAGGAATTGACCAGGGTTAAATGTAGTTGTGTGGACCTAGCCCAGATGAGGGGTCGGAAGCGTAGTACCGCATCGCAATCTTTGGCGGTTCAGATACTCCGTAACA
GGTCGGCTGAGAAGTAGGCCTCGCGGTTGTACGGCTAGATCGGGTCGTAGCCCGCCACGCTCGTGCCTAGCGCTGCGATGGACCACAGTGAGCGTAATCAATCGAGGGTGAAACAAGCGGTCTTAATCCACAGATATTGCACCTCTCTGGAGACCTCTATGTGTTAGAACGTGTGTACTTAGGAAAACGAAAAAACATAA
CCTATGGTATTACACGCTAAATCGAGCAAGGGACACCGAAGTGAGACTGGGTACAGTCTTAGGGTAGAGGTAAGCAGATGGAATCCGCTTCCAGGCGCACACCGACCTCAGAGTCCGACGAAAAGCGTGGTTGGAATAGGTCGATGTGGGATCTACGATGGGGTAGGAACTGGACCGCCAAAAACGTGATGCACGTCGTA
TCACGGATTAGCTTATGATATGTGGCCACCAAGGTAGGATCATGATGCTGAGAAGGGAGGGAGCCGATAAAAATTCCCTGGGCCGATTAGGGCTAGCTCCTCGTGGCGTGTAAATATGTACATAGGCAAGCCCCCGGTATGGGCGAGGCTACGGGTTTAGTTTGGCGAAGCCTATTGTGACCGTTCCTATGATGCAGACC
GGGCGTTAGGGAGTTCGGTGGAAAGGGGGTTTAACACTGCTGCACAGGTGTGGCCGACCTCATGATGATATCGTATCCGCAACGATTAGGATCATGCTGCGAACGAGCCACAAAGGTTTTTAAAGTAAGTTGGAGTAGTGTGGTCTAATACCATACACGGGGGTCGTTCAAGCACCGGTGGGATACCGATTTCTAGATAG
TTTAAGAATTTCTCGGCGGATCGTGGCAACAGTGATACTGCGTCACAGCGATTAACACACATGACACTTACAGCGTCCAAATGTCACCCGGAGTTCGTAAACCTTGGAGAGCGGTTGTCTGAAGGGGTCAAAACGTCAAACCCAATGTTCCGTATGATAAGGACGGAGCGAGACCCAGGGATCCTGTCCTTCCAGAAATA
GCAGGTTAATATCTATATTTAGCATTCCCGATCCTATATCTGGACGGCAGCGTCGACTCATCTAGCCATATCCGTGTCATAGAGATTGCCTTGTTGTTCTCCTTGCTAGGGGAAAGTGTCGAACTTCACGGCCTGGATTACATCCGAAGTGTGGAGATAAATATCGAGTTCTGCTGACTCTCAAATGAAACAACTTAACT