Issue with program efficiency

3inen · May 5, 2009

Hi! I have a program that works fine, except that I have huge data sets to work with and the script I wrote is slow. Is there a way to make it run faster?
Thanks

while (<FILE1>) {
    my ($a,$s,$p,$c) = $_ =~ /^(.+)\t(\d+)\t(\d+)\t(.+)/;
    $n=$a."\t".$s."\t".$c;
    $hash1{$n} = $p;
}
close(FILE1);

while (<FILE2>) {
    @raw_data=<FILE2>;
}
close(FILE2);

foreach $n (keys %hash1){
    foreach $raw (@raw_data){
        my ($a, $s, $c) = split(/\t/, $n);
        my @parts = split(/\t/,$raw);
        if ($a eq $parts[0] && $parts[2] >= $s){
            if ($parts[3] <=$hash1{$n}){
                open(FILE4, ">>sorted33_mRNA/$c.txt") or die "$!";
                $string = join("\t",@parts);
                $string =~ s/\n//;
                print FILE3 "$c\t$s\t$hash1{$n}\t$string\n";
                print FILE4 "$c\t$s\t$hash1{$n}\t$string\n";
            }
        }
    }
}

Is there a way to use 'if exists' when the comparison checks whether values in the first file are greater than or less than values in the second file?

prex1 · May 6, 2009

A first rough pass:

Code:

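while(<FILE1>){
  my($a,$s,$p,$c)=split;
  $n=join"\t",$a,$s,$c;
  $hash1{$n}=$p;
}
close(FILE1);
# Read FILE2 in one go; wrapping this in while(<FILE2>) would discard the first line.
@raw_data=<FILE2>;
# Strip the newlines once here so the output lines are not double-spaced.
chomp @raw_data;
close(FILE2);
for $n (keys %hash1){
  my($a,$s,$c)=split(/\t/,$n);
  # Open the output file once per key rather than once per matching row.
  open(FILE4, ">>sorted33_mRNA/$c.txt") or die "$!";
  for(@raw_data){
    my @parts=split;
    if($a eq $parts[0]&&$parts[2]>=$s){
      if($parts[3]<=$hash1{$n}){
        print FILE4 join("\t",$c,$s,$hash1{$n},$_),"\n";
      }
    }
  }
  close FILE4;
}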

Depending on what the two files contain, how many keys %hash1 holds, and what exactly you want to do with your data, it may be possible to avoid scanning all of FILE2's content again for every key. That would improve the speed considerably.
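
One way that idea might look, as a rough sketch only: group the FILE2 rows by their first column, so that each FILE1 record is compared only against rows with the same name. The exact-match part can then use exists, while the greater-than and less-than tests are still done row by row. The column positions (0 for the name, 2 and 3 for the range) are taken from the indices in the code above; the sub names, the variable names and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Group tab-separated rows by their first column.
# Assumed layout: column 0 = name, column 2 = start, column 3 = end.
sub index_rows_by_name {
    my (@lines) = @_;
    my %rows_for;
    for my $line (@lines) {
        chomp $line;
        my @fields = split /\t/, $line;
        push @{ $rows_for{ $fields[0] } }, \@fields;
    }
    return \%rows_for;
}

# Return the rows for $name whose range lies inside [$start, $end].
sub rows_in_range {
    my ($rows_for, $name, $start, $end) = @_;
    return () unless exists $rows_for->{$name};    # exact-match lookup
    return grep { $_->[2] >= $start && $_->[3] <= $end }
           @{ $rows_for->{$name} };
}

# Small in-memory demonstration with made-up data.
my $rows_for = index_rows_by_name(
    "chr1\tx\t150\t180\n",
    "chr1\tx\t50\t400\n",
    "chr2\tx\t150\t180\n",
);
my @hits = rows_in_range($rows_for, 'chr1', 100, 200);
print join("\t", @$_), "\n" for @hits;
```

With this layout the inner loop only visits rows that share the name, so the saving depends on how many distinct names the data contains.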

Franco

http://www.xcalcs.com : Online engineering calculations
http://www.megamag.it : Magnetic brakes for fun rides
http://www.levitans.com : Air bearing pads

stevexff · May 6, 2009

3inen

I appreciate that your script does what you want, but it would be good to know what you are trying to achieve. As prex1 has noted, there may be more efficient ways of doing this, but we need to know what it is you want to do.

For example, you split the content of file 1, join parts of it back together again to form the hash key, and then every time you go through the hash keys you split it out again. And I can't see why you need to join @parts into $string and use a regex to remove the \n when you could simply chomp($raw), or better still chomp it when you read it in.
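
A minimal sketch of that suggestion, under some assumptions: each line is chomped once when it is read, and the fields are kept in a small record instead of being joined into a hash key and split apart again later. The field order follows the regex in the original script; the field names (name, start, end, label) and the sample line are invented for illustration. Note that a list of records keeps duplicate lines, whereas the original hash silently keeps only the last value for a repeated key.

```perl
use strict;
use warnings;

# Parse one tab-separated FILE1 line into a record with named fields.
# The field names are illustrative guesses, not taken from the thread.
sub parse_region {
    my ($line) = @_;
    chomp $line;    # strip the newline once, at read time
    my ($name, $start, $end, $label) = split /\t/, $line;
    return { name => $name, start => $start, end => $end, label => $label };
}

# In the real script the lines would come from the file handle.
my @regions = map { parse_region($_) } ("chr1\t100\t200\tgeneA\n");

for my $region (@regions) {
    # Fields are available directly; no join/split round trip is needed.
    print "$region->{label}: $region->{start}-$region->{end}\n";
}
```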

You really ought to consider choosing meaningful names for your variables. In particular, don't use $a or $b, as these are 'special' to Perl and are used in sort routines. Using meaningless names like $s, %hash, or $string just makes life hard for yourself: when you come back to look at this program in six months' time, will you really remember what $a stands for? Wouldn't it be easier if you used names like %mRNA or $match_candidate? You can tell whether it's a string, hash, or array by the $, %, or @ on the front, so you don't need to spell it out in the name.
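
To illustrate both points, here is a small sketch: sort sets $a and $b itself inside its comparison block, and descriptive names carry the meaning that a name like %hash does not. All of the names and values below are invented for the example.

```perl
use strict;
use warnings;

# sort sets the package variables $a and $b for its comparison block,
# so they are best left alone elsewhere in the program.
my @start_positions = (300, 20, 1500);
my @ascending = sort { $a <=> $b } @start_positions;
print "@ascending\n";

# Descriptive names say what a value holds; the sigil already gives the type.
my %end_for_region  = ( geneA => 200 );
my $match_candidate = 'geneA';
print "$match_candidate ends at $end_for_region{$match_candidate}\n";
```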

Please don't think I'm giving you a hard time unnecessarily. I appreciate that you are in the business of mRNA analysis, and programming is just a tool to help you in this. But by making a few simple changes to your code, you can make things easier for yourself now and in the future :)

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)

3inen · May 6, 2009

Hi prex1, it is much faster now. I can live with that. Thanks so much.

Steve, point well taken. I need to put more effort into programming.

MikeLacey · May 7, 2009

No.

Be lazier, 3inen. Remember that if you call your variables something understandable, it will be the same amount of work now and (much) less work later.

Mike

http://www.myspace.com/micahhowzat

http://mikelacey.fuzz.com/
