
Issue with program efficiency

Status
Not open for further replies.

3inen

Technical User
May 26, 2005
51
US
Hi! I have a program that works fine, except that the data sets I work with are huge and the script I wrote is slow. Is there a way to make it run faster?
Thanks

while (<FILE1>) {
    my ($a, $s, $p, $c) = $_ =~ /^(.+)\t(\d+)\t(\d+)\t(.+)/;
    $n = $a . "\t" . $s . "\t" . $c;
    $hash1{$n} = $p;
}
close(FILE1);

while (<FILE2>) {
    @raw_data = <FILE2>;
}
close(FILE2);

foreach $n (keys %hash1) {
    foreach $raw (@raw_data) {
        my ($a, $s, $c) = split(/\t/, $n);
        my @parts = split(/\t/, $raw);
        if ($a eq $parts[0] && $parts[2] >= $s) {
            if ($parts[3] <= $hash1{$n}) {
                open(FILE4, ">>sorted33_mRNA/$c.txt") or die "$!";
                $string = join("\t", @parts);
                $string =~ s/\n//;
                print FILE3 "$c\t$s\t$hash1{$n}\t$string\n";
                print FILE4 "$c\t$s\t$hash1{$n}\t$string\n";
            }
        }
    }
}

Is there a way to use 'if exists' when we are comparing values in the first file that are greater than or less than the values in the second file?
 
A first rough pass:
Code:
while (<FILE1>) {
  chomp;
  my ($a, $s, $p, $c) = split /\t/;
  $n = join "\t", $a, $s, $c;
  $hash1{$n} = $p;
}
close(FILE1);
@raw_data = <FILE2>;   # slurp in one go; a while(<FILE2>) wrapper would lose the first line
chomp @raw_data;       # strip newlines once here, not with a regex later
close(FILE2);
for $n (keys %hash1) {
  my ($a, $s, $c) = split /\t/, $n;
  open(FILE4, ">>sorted33_mRNA/$c.txt") or die "$!";
  for (@raw_data) {
    my @parts = split /\t/;   # the data is tab-delimited, and fields may contain spaces
    if ($a eq $parts[0] && $parts[2] >= $s) {
      if ($parts[3] <= $hash1{$n}) {
        print FILE4 join("\t", $c, $s, $hash1{$n}, $_), "\n";
      }
    }
  }
  close FILE4;
}
Depending on what the two files contain, on the number of keys in %hash1, and on what exactly you want to do with your data, it may be possible to avoid those repeated scans of FILE2's content altogether: that would improve the speed very much.
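One way the repeated scans could be avoided, sketched with in-memory sample data: since the match on the first field is an exact string comparison ($a eq $parts[0]), FILE2's lines can be indexed by that field once, so each key in %hash1 only scans its own bucket instead of every line. The sample records, field layout, and field meanings below are assumptions based on the code above, not the real data.

```perl
use strict;
use warnings;

# Sample data standing in for FILE1/FILE2 (layout assumed from the
# question: key is name\tstart\tchrom, value is an end position).
my %hash1    = ( "geneA\t10\tchr1" => 50 );
my @raw_data = ( "geneA\tx\t15\t40", "geneA\tx\t15\t60", "geneB\tx\t5\t20" );

# Pass 1: index FILE2's lines by their first tab-separated field.
my %by_first;
for my $line (@raw_data) {
    my ($first) = split /\t/, $line;
    push @{ $by_first{$first} }, $line;
}

# Pass 2: each key scans only the lines sharing its first field, and
# 'exists' skips keys that have no candidate lines at all -- this is
# where the 'if exists' idea fits, even though the >=/<= range checks
# themselves still have to be done per line.
my @matches;
for my $n (keys %hash1) {
    my ($name, $s, $c) = split /\t/, $n;
    next unless exists $by_first{$name};
    for my $line ( @{ $by_first{$name} } ) {
        my @parts = split /\t/, $line;
        push @matches, $line
            if $parts[2] >= $s && $parts[3] <= $hash1{$n};
    }
}
print "$_\n" for @matches;
```

With this shape the work drops from (keys x all lines) to (keys x matching lines), which is where most of the time goes on large files.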

Franco
: Online engineering calculations
: Magnetic brakes for fun rides
: Air bearing pads
 
3inen

I appreciate that your script does what you want, but it would be good to know what you are trying to achieve. As prex1 has noted, there may be more efficient ways of doing this, but we need to know what it is you want to do.

For example, you split the content of file 1, join parts of it back together again to form the hash key, and then every time you go through the hash keys you split it out again. And I can't see why you need to join @parts into $string and use a regex to remove the \n when you could simply chomp($raw), or better still chomp it when you read it in.
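A short sketch of that point: instead of joining the fields into a string key and splitting that key apart again on every pass, the already-split fields can be kept together, for example as array references. The sample lines and field meanings are assumptions based on the regex in the question.

```perl
use strict;
use warnings;

# Keep the fields split once, rather than join-then-resplit.
my @records;    # each entry: [ name, start, end, chromosome ]
for my $raw ( "geneA\t10\t50\tchr1\n", "geneB\t5\t20\tchr2\n" ) {
    my $line = $raw;
    chomp $line;                       # chomp once, at read time
    my ($name, $start, $end, $chrom) = split /\t/, $line;
    push @records, [ $name, $start, $end, $chrom ];
}

# Later passes use the fields directly -- no join, no second split,
# and no regex needed to strip a trailing newline.
for my $rec (@records) {
    my ($name, $start, $end, $chrom) = @$rec;
    print "$chrom\t$start\t$end\n";
}
```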

You really ought to consider choosing meaningful names for your variables - in particular, don't use $a or $b, as these are 'special' to Perl and are used in sort routines. Meaningless names like $s, %hash, or $string just make life hard for yourself - when you come back to look at this program in six months' time, will you really remember what $a stands for? Wouldn't it be easier if you used names like %mRNA or $match_candidate? You can tell whether it's a scalar, hash, or array by the $, %, or @ on the front; you don't need to spell that out in the name.
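To make that concrete, here is the key-building step from the question rewritten with descriptive names; the meanings chosen (gene name, start, end, chromosome) are guesses at what the original $a, $s, $p, and $c fields represent.

```perl
use strict;
use warnings;

# Hypothetical renaming of the original $a/$s/$p/$c and %hash1.
my %end_position_for;                      # was %hash1
my $line = "BRCA1\t100\t500\tchr17\n";     # sample record, layout assumed
chomp $line;
my ($gene_name, $start, $end, $chromosome) = split /\t/, $line;
my $key = join "\t", $gene_name, $start, $chromosome;
$end_position_for{$key} = $end;

print "$end_position_for{$key}\n";         # prints 500
```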

Please don't think I'm giving you a hard time unnecessarily - I appreciate that you are in the business of mRNA analysis, and programming is just a tool to help you with it. But by making a few simple changes to your code, you can make things easier for yourself now and in the future [smile]

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 

Hi prex1, it's much faster now. I can live with that. Thanks so much.

Steve, point well taken. I need to put more effort into programming.

 
