about how to handle two files simultaneously 2

Status
Not open for further replies.

Everwood

Technical User
Jul 18, 2005
78
US
Hi all,


I have two text files to handle: "short_sequences" and "long_sequences".
The "short_sequences" file holds 100 short sequences (8 nucleotides long)
and the "long_sequences" file holds 100 long sequences (200 nucleotides
long).

For example, the first short sequence is "TTGACATA" and the first long sequence
is "GAATCATATATTAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".

Basically, we want to generate a random starting position and replace the substring
of the long sequence that begins there with the short sequence. In this example, if we
choose the 5th nucleotide of the long sequence as the starting site, then after
replacement with "TTGACATA" the long sequence becomes
"GAATTTGACATAAGTCTCCACATACTCCGTTCGTGACCCATTACCCTTTCGGGAGA
GCCACAGCAACTGTAGATCTCGAAGTTGACAGGGGCAACTAGAGGCCTCAGAATTCT
CACTCTTGAGGAGAGAAGTCTAAGACCTACAGTATGGTCGGGTTAGTTTTTGTTCCGTC
GAACCTTGGACTAACCACTGTCTGGATA".

Then I want to do the same for the 2nd long sequence using the 2nd short sequence, and so on,
until the last long sequence has been processed. I think the only constraint is that the
starting site must not be larger than 193. Otherwise, there are not enough nucleotides
left in the long sequence for the replacement.

Furthermore, I want to keep track of the starting replacement site for each long sequence.


I am copying my code below.

use strict;
use warnings;

# @short will hold the short sequences, @long the long sequences
my (@short, @long, $offset);

open(FILE1, '<', "short_sequences.txt") || die "Can't open short_sequences.txt: $!\n";
while (<FILE1>) {
    chomp;
    push(@short, $_);
}
close FILE1; # Close the file

open(FILE2, '<', "long_sequences.txt") || die "Can't open long_sequences.txt: $!\n";
while (<FILE2>) {
    chomp;
    push(@long, $_);
}
close FILE2; # Close the file


# replacement
foreach my $short (@short) {
    foreach my $long (@long) {
        $offset = int(rand(length($long) % 193));
        substr($long, $offset, length($short), $short);
        printf "%3d", $offset + 1;
        print "\n", $long, "\n";
    }
}


But I just realized that there is a problem with the two
loops: each short sequence is applied to every long sequence, not just to the corresponding one.

So I am looking for suggestions on how to handle the two files
simultaneously in my case.

Thank you very much and look forward to your reply!

Best Regards,
Alex
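
For readers following along, here is a minimal sketch of one way to pair the two lists, assuming both files have already been read into arrays as in the code above. The helper name replace_pairs and the demo data are made up for illustration. It walks both arrays with a single index instead of nesting the loops, and it draws the offset from 0 to length(long) - length(short). For 200 and 8 that gives 1-based start sites 1 to 193, which matches the limit described in the question. The expression length($long) % 193 in the posted code evaluates to 7 for a 200-nucleotide sequence, so it would only ever produce offsets 0 to 6.

```perl
use strict;
use warnings;

# Hypothetical helper: pairs $short->[$i] with $long->[$i] and returns
# one [start_site, replaced_sequence] record per pair.
sub replace_pairs {
    my ($short, $long) = @_;
    my @records;
    # Stop at the shorter list so every index exists in both arrays.
    my $last = $#$short < $#$long ? $#$short : $#$long;
    for my $i (0 .. $last) {
        my $room = length($long->[$i]) - length($short->[$i]);
        next if $room < 0;    # short sequence does not fit; skip this pair
        # rand($room + 1) is >= 0 and < $room + 1, so int() gives 0 .. $room
        my $offset   = int(rand($room + 1));
        my $replaced = $long->[$i];    # work on a copy; input stays untouched
        substr($replaced, $offset, length($short->[$i]), $short->[$i]);
        push @records, [ $offset + 1, $replaced ];    # 1-based start site
    }
    return @records;
}

# Made-up demo data, much shorter than the real sequences
my @short = ('TTGACATA', 'ACGTACGT');
my @long  = ('GAATCATATATTAGTCTCCACATACTCC', 'CCCCCCCCCCCCCCCCCCCCCCCC');

for my $rec (replace_pairs(\@short, \@long)) {
    printf "%3d\n%s\n", @$rec;
}
```

Each record keeps the start site next to the sequence it belongs to, so the sites can be printed or saved later without a second bookkeeping array.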

 
Hi Duncan,

I want to write the starting site of the replacement and
the replaced sequences to a file instead of printing them
on the terminal screen. Can you give me some suggestions?

Thanks,
 
The print statement:-

print OUT_NORM ">SeqName$x\n$long[$x]\n"; should do it

This will all end up in output16_1.txt


Kind Regards
Duncan
 
Hi Duncan,

I don't think so. That statement
will print the ID name of a sequence
followed by the replaced sequence.

I want to generate another file which
holds the starting site, say "r", and the
replaced sequence.

Thanks,
 
O.K. - think i understand. sorry if i am being a numpty!

Code:
#!/usr/bin/perl

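open (SHORT, "< short.txt") or die "Can't open short.txt: $!\n";
chomp (@short = <SHORT>);
close SHORT;

open (LONG, "< long.txt") or die "Can't open long.txt: $!\n";
chomp (@long = <LONG>);
close LONG;

open (OUT_HTML, "> output.html") or die "Can't open output.html: $!\n";
print OUT_HTML "<pre>";
open (OUT_NORM, "> output.txt") or die "Can't open output.txt: $!\n";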

for ($x=0; $x<=$#short; $x++) {
  $r=int(rand(length ($long[$x]) - length ($short[$x]) + 1));
  print "### $r ###\n";
  
  print "$long[$x]\n"; # this is the string in its unadulterated state ... soon to be altered
  
  # this section is for visual purposes only
  $output_norm = substr($long[$x], $r, length $short[$x]);
  print " " x $r;
  print "$output_norm\n";
  
  substr($long[$x], $r, length $short[$x]) = "<font color=red><b>$short[$x]</b></font>";
  print OUT_HTML "$long[$x]\n";
  
  $long[$x] =~ s/<[^>]+>//g;
  print OUT_NORM "$long[$x]\n";
}

close OUT_HTML;
close OUT_NORM;


Kind Regards
Duncan
 
I'm not really on the ball today... sorry!

Code:
#!/usr/bin/perl

use strict;
use warnings;

my (@short, @long,$x,$r, $output_norm);

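open (SHORT, "< short_sequences16_1.txt") or die "Can't open short_sequences16_1.txt: $!\n";
chomp (@short = <SHORT>);
close SHORT;

open (LONG, "< long_sequences.txt") or die "Can't open long_sequences.txt: $!\n";
chomp (@long = <LONG>);
close LONG;

open (OUT_INITIAL,  "> output_1.txt") or die "Can't open output_1.txt: $!\n";
open (OUT_REPLACED, "> output_2.txt") or die "Can't open output_2.txt: $!\n";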

for ($x=0; $x<=$#short; $x++) {
  $r=int(rand(length ($long[$x]) - length ($short[$x]) + 1));
  print OUT_INITIAL ">SeqName$x\n$long[$x]\n";
  print OUT_REPLACED "SeqName$x\n" . substr($long[$x], $r, length $short[$x]) . "\n";
}

close OUT_INITIAL;
close OUT_REPLACED;

... is this more like it?


Kind Regards
Duncan
 
I changed
print OUT_REPLACED "SeqName$x\n" . substr($long[$x], $r,length $short[$x]) . "\n";

to
print OUT_REPLACED "SeqName$x\n" . $r . "\n";

That is what I really want.

thanks
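
For anyone landing on this thread later: the loop posted above picks $r but does not itself perform the replacement. The following is a rough sketch, not the code either poster ended up with, that replaces the substring and writes the sequence name, the start site $r, and the replaced sequence to one output handle. The helper name write_replaced and the demo data are hypothetical. It assumes @long has at least as many entries as @short and that every long sequence is at least as long as its short partner.

```perl
use strict;
use warnings;

# Hypothetical helper: for each pair, replaces part of $long->[$x] in place
# and prints the name, the 0-based start site $r, and the replaced sequence.
sub write_replaced {
    my ($out, $short, $long) = @_;
    for my $x (0 .. $#$short) {
        my $len = length($short->[$x]);
        my $r   = int(rand(length($long->[$x]) - $len + 1));
        substr($long->[$x], $r, $len) = $short->[$x];    # modifies caller's array
        print {$out} "SeqName$x\n$r\n$long->[$x]\n";
    }
}

# Made-up demo data; an in-memory handle stands in for a real output file
my @short = ('TTGACATA');
my @long  = ('GAATCATATATTAGTCTCCACATACTCC');

open(my $out, '>', \my $buffer) or die "Can't open in-memory handle: $!\n";
write_replaced($out, \@short, \@long);
close $out;
print $buffer;
```

To write to disk instead, open the handle on a file name, for example open(my $out, '>', 'output_2.txt') or die "Can't open output_2.txt: $!\n"; and pass that handle in. Print $r + 1 instead of $r if a 1-based start site is preferred, as in the original question.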
 
no problem... sorry i was being a bit of a muppet!


Kind Regards
Duncan
 