Hi all!
I use the code below to generate a set of sequences like these:
************************************************
46
8
GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTGACATAGAGCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCTCACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTCGAACCTTGGACTAACCACTGTCTGGATA
79
8
AGGATTACCCGCTGGACTTCAAACGCTCGTGAAGCATCGTATTGCGAGGCAACCGAGTCATAGCCCAGTCCGGGGGCCATCGCCATCCCAGCATCTGCGTTGTTCATCGGTCCTCAGTCTCCCATCAACGTGGTCCACACCTAGCATCCTGGTTTTGCATCCGTAACAAAGGACGTTCGAAGTTTTTTGCCGGCGGGAAG
70
8
TGATAATTGGTGCAATATTCTCCATAACAGATCCTCGCCAATACGGATTTGAGGGATCCCTCTGCATTTCTTGACTTAGTGTCACCGATAGAGCAGAAATGCTTTACCGCCGCAGTGATTAGGCGGGTACAGTTGTCCAAACGCACACAACCGAAACCTCCCCATGCGTACTCGTTCGTTTAGTCGCGTACAGAGGGAAC
...................
**************************************************
Please ignore the numbers above each sequence. I saved this output
in a txt file and transferred it between my laptop and a server
using FileZilla.
When I open the file on the server, there is a "$" sign at
the end of each sequence, and the software I want to run on the file fails to read it.
The software actually requires its input in FASTA format. Is there a way to write a FASTA file directly from the script?
Thank you very much for your help!
Regards,
Alex
The code is:
*********************************
#!/usr/bin/perl
use strict;
use warnings;

# @short holds the short sequences, @long the long sequences
my (@short, @long);

open(my $short_fh, '<', "short_sequences.txt") or die "Can't open short_sequences.txt: $!\n";
while (my $line = <$short_fh>) {
    $line =~ s/[\r\n]+\z//;    # strip the line ending, whether LF or CRLF
    push(@short, $line);
}
close $short_fh;

open(my $long_fh, '<', "long_sequences.txt") or die "Can't open long_sequences.txt: $!\n";
while (my $line = <$long_fh>) {
    $line =~ s/[\r\n]+\z//;    # strip the line ending, whether LF or CRLF
    push(@long, $line);
}
close $long_fh;

# Replacement: overwrite part of each long sequence with the matching short one
for my $i (0 .. $#short) {
    my $offset = int(rand(193));    # random start position in 0..192
    print $offset, "\n";
    #print length($short[$i]);
    substr($long[$i], $offset, length($short[$i]), $short[$i]);
    print "\n", $long[$i], "\n";
}
********************************
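To make clear what I mean by getting FASTA directly: each sequence would need a ">" header line followed by the sequence itself. Below is a minimal sketch of that, not tested against the real data. The helper name fasta_record, the IDs seq_1, seq_2, ... and the line width are all made up for illustration, and the two short sequences only stand in for the modified @long array from the script above. It also drops any stray whitespace (including a trailing "\r"), on the assumption that leftover line-ending characters are part of the problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format one sequence as a FASTA record: a ">" header line followed by
# the sequence wrapped at a fixed width (hypothetical helper, default 60).
sub fasta_record {
    my ($id, $seq, $width) = @_;
    $width ||= 60;
    $seq =~ s/\s+//g;                          # drop stray whitespace, including "\r"
    my @lines = $seq =~ /(.{1,$width})/g;      # split into chunks of at most $width
    return ">$id\n" . join("\n", @lines) . "\n";
}

# Stand-ins for the modified long sequences (assumption: real data comes from @long).
my @long = ('GAATCATATATTAGTC', 'AGGATTACCCGCTGGACTTC');

# Print one record per sequence with made-up IDs seq_1, seq_2, ...
for my $i (0 .. $#long) {
    print fasta_record('seq_' . ($i + 1), $long[$i], 10);
}
```

Redirecting the output to a file (for example "perl script.pl > sequences.fasta", with both names being placeholders) would then give a file to upload. The offset lines printed by the original loop would have to be left out or sent to STDERR so they do not end up in the FASTA file.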