
Rewriting a Perl file

Status
Not open for further replies.

biobrain

MIS
Jun 21, 2007
Dear All,

I have some code written in Perl. It works for me, but I want to rewrite it to make it shorter and tidier.

The same task is repeated again and again: as you can see in the if statements, the matching patterns are almost identical except for the number at the end of the regexp, i.e. 10, 11, 12, 13, 14, 18, 31, etc.

I also want to make it more flexible, so that these numbers can be changed at any time by user input, either from a text file or from the command prompt, rather than staying fixed as they are in this code.


Code:
use strict;

#open the directory and then read all the files with *.txt
use Cwd;
my $dir = cwd;
opendir(DIR,"$dir") or die "$!";

my @all_txt_files  = grep {/\.txt$/} readdir DIR;
closedir DIR;

#retrieve desired data from the txt file 
foreach (@all_txt_files){

open (SP, "$_");
my $test;
my $test2;
my $test3;
my $test4;
my $test5;
my $test6;
my $test7;
my $test8;
my $test9;
my $test10;
my $test11;
my $test12;
my $test13;
my $test14;
my $test15;
my $test16;
my $test17;
my $test18;
my $test19;
my $test20;
my $test21;
my $test22;
my $test23;
my $test24;
my $test25;
my $test26;
my $test27;
my $test28;
my $test29;
my $test30;


while (<SP>){
   #match txt Id in the file header
   if ($_=~/^HEADER[\s\S]+(....)..............$/)
   {
   $test=$1;
   #chomp $test;
   #print "$test";
   $test= "$test" . ".newtxt";
   open (OUTPUT,">result/$test");   
   }
   # open new out put files

  if (/^HEADER[\s\S]+/) 
   {
   $test21=$_;
   print(OUTPUT "$test21");
   }
  if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+10\s/) 
   {
   $test2=$_;
   print(OUTPUT "$test2");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+11\s/) 
   {
   $test3=$_;
   print(OUTPUT "$test3");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+12\s/) 
   {
   $test4=$_;
   print(OUTPUT "$test4");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+13\s/) 
   {
   $test5=$_;
   print(OUTPUT "$test5");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+14\s/) 
   {
   $test6=$_;
   print(OUTPUT "$test6");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+18\s/) 
   {
   $test7=$_;
   print(OUTPUT "$test7");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+31\s/) 
   {
   $test8=$_;
   print(OUTPUT "$test8");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+33\s/) 
   {
   $test9=$_;
   print(OUTPUT "$test9");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+64\s/) 
   {
   $test10=$_;
   print(OUTPUT "$test10");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+80\s/) 
   {
   $test11=$_;
   print(OUTPUT "$test11");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+81\s/) 
   {
   $test12=$_;
   print(OUTPUT "$test12");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+82\s/) 
   {
   $test13=$_;
   print(OUTPUT "$test13");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+83\s/) 
   {
   $test14=$_;
   print(OUTPUT "$test14");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+84\s/) 
   {
   $test15=$_;
   print(OUTPUT "$test15");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+85\s/) 
   {
   $test16=$_;
   print(OUTPUT "$test16");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+86\s/) 
   {
   $test17=$_;
   print(OUTPUT "$test17");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+89\s/) 
   {
   $test18=$_;
   print(OUTPUT "$test18");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+129\s/) 
   {
   $test19=$_;
   print(OUTPUT "$test19");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+131\s/) 
   {
   $test20=$_;
   print(OUTPUT "$test20");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+132\s/) 
   {
   $test22=$_;
   print(OUTPUT "$test22");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+134\s/) 
   {
   $test23=$_;
   print(OUTPUT "$test23");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+144\s/) 
   {
   $test25=$_;
   print(OUTPUT "$test25");
   }
if (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+145\s/) 
   {
   $test26=$_;
   print(OUTPUT "$test26");
   }
}

}
 
Why do you need separate variables ($test, $test2, $test3, etc.)? Why can't you use a single variable there?

Also, you can combine all the regex if conditions into a single statement by using the OR operator (| inside the regexp).

--------------------------------------------------------------------------
I never set a goal because u never know whats going to happen tommorow.
 
Any time you find yourself writing scalar names like $test1, $test2 etc that should immediately start the alarm bells ringing in your head to tell you that you need to be using an array instead.
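
A minimal sketch of that idea, assuming the numbers from the original script: keep them in one list, turn the list into a lookup hash, and capture the number from the ATOM line with a single regexp. The names @numbers, %wanted and keep_line are made up for illustration and are not from the posted code.

```perl
use strict;
use warnings;

# The numbers that were hard-coded in the separate if blocks, kept in one place
my @numbers = (10, 11, 12, 13, 14, 18, 31, 33, 64, 80, 81, 82, 83, 84, 85, 86,
               89, 129, 131, 132, 134, 144, 145);

# Lookup hash: one key per wanted number
my %wanted = map { $_ => 1 } @numbers;

# Hypothetical helper: returns 1 if the line is an ATOM line whose
# sixth field is one of the wanted numbers, 0 otherwise
sub keep_line {
    my ($line) = @_;
    if ($line =~ /^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+(\d+)\s/) {
        return $wanted{$1} ? 1 : 0;
    }
    return 0;
}
```

Inside the while loop, the whole chain of if blocks could then shrink to something like: print OUTPUT $_ if keep_line($_);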
 
Yes,

but I don't see how to do it. What should I do now?

I have some ideas in mind, but I am stuck.
 
If this is a fixed length record file, then substr is going to be much faster than firing up the regex engine for each line.
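
As a rough sketch of the substr approach: the code below ASSUMES the files follow the standard PDB fixed-column layout (record name in columns 1-6, residue number in columns 23-26). That has not been confirmed for the poster's data, so check it against a few real lines first. The names %wanted_fixed, residue_number and keep_fixed are hypothetical.

```perl
use strict;
use warnings;

# Wanted numbers (a short sample of the list from the original script)
my %wanted_fixed = map { $_ => 1 } (10, 11, 12, 13, 14, 18, 31);

# Hypothetical helper: return the residue number of an ATOM line,
# or nothing (undef in scalar context) if the line is not a usable ATOM record.
# ASSUMPTION: record name in columns 1-6, residue number in columns 23-26.
sub residue_number {
    my ($line) = @_;
    return unless length($line) >= 26;
    return unless substr($line, 0, 6) eq 'ATOM  ';
    my $field = substr($line, 22, 4);   # columns 23-26, zero-based offset 22
    $field =~ s/\s+//g;                 # strip the padding spaces
    return $field =~ /^-?\d+$/ ? $field : ();
}

# Returns 1 if the line should be written to the output file, 0 otherwise
sub keep_fixed {
    my ($line) = @_;
    my $number = residue_number($line);
    return (defined $number && $wanted_fixed{$number}) ? 1 : 0;
}
```

Unlike the regexp, this never scans the line; it only looks at two fixed slices, which is where the speed gain would come from.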

Also, using Switch.pm would make your code a little more sensible.

Can we see 4 or 5 lines of test data please?
Code:
open FH, "<file.txt";
my $index=??;
my $length=??;
use Switch;
while (<FH>) {
   my $test_value=substr($_, $index, $length);
   switch ($test_value) {
        case 1          { print "number 1" }
        case "a"        { print "string a" }
        case [1..10,42] { print "number in list" }
        case (@array)   { print "number in list" }
        case /\w+/      { print "pattern" }
        case qr/\w+/    { print "pattern" }
        case (%hash)    { print "entry in hash" }
        case (\%hash)   { print "entry in hash" }
        case (\&sub)    { print "arg to subroutine" }
        else            { print "previous case not true" }
    }
}
Regards

Paul
------------------------------------
Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
something like this:

Code:
#!/usr/bin/perl
use strict;
use warnings;

#open the directory and then read all the files with *.txt
use Cwd;
my $dir = cwd;
opendir(DIR,"$dir") or die "$!";
my @all_txt_files  = grep {/\.txt$/} readdir DIR;
closedir DIR;

#retrieve desired data from the txt file
foreach my $file (@all_txt_files){
   open (SP, $file) or die "Unable to open $file: $!\n";
   while (<SP>){
      #match txt Id in the file header
      if (/^HEADER[\s\S]+(....)..............$/){
         open (OUTPUT,">result/$1.newtxt") or die "$!";
         print OUTPUT;   #keep the HEADER line itself, as the original script did
         while (<SP>) {
            if (/^HEADER[\s\S]+/){
               print OUTPUT;
            }
            elsif (/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+(10|11|12|13|14|18|31|33|64|80|81|82|83|84|85|86|89|129|131|132|134|144|145)\s/){
               print OUTPUT;
            }
         }
      }
   }
}
------------------------------------------------------------
Pragmas (perl 5.8.8) used :
  • strict - Perl pragma to restrict unsafe constructs
  • warnings - Perl pragma to control optional warnings
Core (perl 5.8.8) Modules used :
  • Cwd - get pathname of current working directory

This regexp:

Code:
/^HEADER[\s\S]+(....)..............$/

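could maybe be written better as (this only matches if nothing but whitespace sits between HEADER and the last 18 characters of the line; if other text can appear there, use .+ in place of \s*):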

Code:
/^HEADER\s*(....)..............$/

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Thanks,

Code:
(/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+(10|11|12|13|14|18|31|33|64|80|81|82|83|84|85|86|89|129|131|132|134|144|145)\s/)

In the line above, can we pick up the values 10, 11, 12, 13, 18, etc. from an external file or from an array?

For example, the external file could be a simple file containing these values:
10
11
12
13
18
etc.

Alternatively, can they be given on the command line?

I will give it a try myself today and will post any problems I run into.

Regards
 
If I understand the question the answer is yes. For example:

perl scriptname.pl 10 11 12 13

In the script, get the arguments and join them into a string:

Code:
my $pattern = join '|', @ARGV;

$pattern would now equal: 10|11|12|13

Then use $pattern in the regexp:

Code:
(/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+($pattern)\s/)
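
Putting the two options together, here is a hedged sketch that takes the numbers either from the command line or from a text file with one number per line. The helper names read_numbers and build_atom_regex, and the file names filter.pl and numbers.txt mentioned in the usage note, are invented for the example. The pattern is compiled once with qr// and uses a non-capturing group, since the number itself is not needed afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: read one number per line from a text file
sub read_numbers {
    my ($path) = @_;
    open my $fh, '<', $path or die "Unable to open $path: $!\n";
    my @numbers;
    while (my $line = <$fh>) {
        push @numbers, $1 if $line =~ /^\s*(\d+)\s*$/;   # skip blank or non-numeric lines
    }
    close $fh;
    return @numbers;
}

# Hypothetical helper: build the compiled ATOM regexp from a list of numbers
sub build_atom_regex {
    my @numbers = grep { /^\d+$/ } @_;   # drop anything that is not a plain number
    die "No numbers given\n" unless @numbers;
    my $pattern = join '|', @numbers;
    return qr/^ATOM\s+\S+\s+\S+\s+\S+\s+\S\s+(?:$pattern)\s/;
}

# If the first argument is an existing file, read the numbers from it;
# otherwise treat all arguments as the numbers themselves.
if (@ARGV) {
    my @numbers = -f $ARGV[0] ? read_numbers($ARGV[0]) : @ARGV;
    my $atom_re = build_atom_regex(@numbers);
    while (my $line = <STDIN>) {
        print $line if $line =~ $atom_re;
    }
}
```

It could be run as perl filter.pl 10 11 12 13 < input.txt, or as perl filter.pl numbers.txt < input.txt. Dropping non-numeric arguments also keeps stray regexp characters out of the pattern.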







------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 