
RegExp reading only first unique occurrence

Status
Not open for further replies.

biobrain

MIS
Jun 21, 2007
90
GB
I have data like this:

SEQRES 1 A 1522 U U U G U U G G A G A G U
SEQRES 2 A 1522 U U G A U C C U G G C U C
SEQRES 3 B 1522 A G G G U G A A C G C U G
SEQRES 4 B 1522 G C G G C G U G C C U A A
SEQRES 5 B 1522 G A C A U G C A A G U C G
SEQRES 6 B 1522 U G C G G G C C G C G G G
SEQRES 7 C 1522 G U U U U A C U C C G U G
SEQRES 8 D 1522 G U C A G C G G C G G A C
SEQRES 9 F 1522 G G G U G A G U A A C G C
SEQRES 10 F 1522 G U G G G U G A C C U A C
SEQRES 11 F 1522 C C G G A A G A G G G G G

I only want to print A, B, C, D and F from the third column, each only once, irrespective of how many times they appear in that column.

Here is my code

Code:
open (SP, "myfile");
my $flag = 1;
my $chainid;
my $chainid2;
my $chainidB;
while (<SP>) {
    if ($flag == 1) {
        if ($_ =~ /SEQRES\s+\S+\s(\S)/) {
            $flag++;
            #print $1;
            $chainid = $1;
            print "this is chain 1 $chainid\n";
        }
    } else {
        if ($_ =~ /SEQRES\s+\S+\s(\S)/) {
            #print $1;
            $chainid2 = $1;
            print " this is chain 2 $chainid2\n";
        }
    }
}

Please help me get the desired results.
 
Can you give a sample of the expected output?

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
The expected result should look like this:

this is chain A

this is chain B

this is chain C

this is chain D

this is Chain F

But I am getting output like this:

this is chain A
this is chain A

this is chain B
this is chain B
this is chain B
this is chain B

this is chain C

this is chain D

this is Chain F
this is Chain F
this is Chain F

I do not want these repetitions in my results.
 
Code:
my %duplicate;
while (<SP>) {
    if ($_ =~ /SEQRES\s+\S+\s(\S)/) {
        # skip chain IDs that have already been printed
        next if exists $duplicate{$1};
        print "this is chain $1\n";
        $duplicate{$1} = 1;
    }
}

not tested..
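
For anyone who wants something they can run as-is, here is a minimal sketch of the same "remember what you have already seen" idea. The helper name `unique_chains` and the in-line sample lines are made up for illustration; it assumes the chain ID is always the third whitespace-separated field of a SEQRES line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the chain IDs
# in order of first appearance, each one only once.
sub unique_chains {
    my @lines = @_;
    my (%seen, @chains);
    for my $line (@lines) {
        # Assumption: the third whitespace-separated field is the chain ID.
        next unless $line =~ /^SEQRES\s+\S+\s+(\S)/;
        # $seen{$1}++ is false only the first time an ID turns up.
        push @chains, $1 unless $seen{$1}++;
    }
    return @chains;
}

# A few lines taken from the sample data in the question.
my @sample = (
    "SEQRES 1 A 1522 U U U G U U G G A G A G U",
    "SEQRES 2 A 1522 U U G A U C C U G G C U C",
    "SEQRES 3 B 1522 A G G G U G A A C G C U G",
    "SEQRES 7 C 1522 G U U U U A C U C C G U G",
    "SEQRES 9 F 1522 G G G U G A G U A A C G C",
);

print "this is chain $_\n" for unique_chains(@sample);
```

With a real file you would pass the lines read from the filehandle instead of `@sample`.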

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[noevil]
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
Perl:
use strict;
use warnings;

my %uniq;

while (<DATA>) {
   my $id = (split)[2];
   print unless exists $uniq{$id};
   $uniq{$id}++;
}


__DATA__
SEQRES   1 A 1522    U   U   U   G   U   U   G   G   A   G   A   G   U          
SEQRES   2 A 1522    U   U   G   A   U   C   C   U   G   G   C   U   C          
SEQRES   3 B 1522    A   G   G   G   U   G   A   A   C   G   C   U   G          
SEQRES   4 B 1522    G   C   G   G   C   G   U   G   C   C   U   A   A          
SEQRES   5 B 1522    G   A   C   A   U   G   C   A   A   G   U   C   G          
SEQRES   6 B 1522    U   G   C   G   G   G   C   C   G   C   G   G   G          
SEQRES   7 C 1522    G   U   U   U   U   A   C   U   C   C   G   U   G          
SEQRES   8 D 1522    G   U   C   A   G   C   G   G   C   G   G   A   C          
SEQRES   9 F 1522    G   G   G   U   G   A   G   U   A   A   C   G   C          
SEQRES  10 F 1522    G   U   G   G   G   U   G   A   C   C   U   A   C          
SEQRES  11 F 1522    C   C   G   G   A   A   G   A   G   G   G   G   G

This is pretty much the same as Travis's, but without the regex...

Steve

 
split() is a regexp. If those are fixed-width fields, unpack() will be the most efficient way to get the third column.
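
A rough sketch of the unpack() route, assuming the file really is fixed-width and laid out like the `__DATA__` block above, where the chain ID sits in column 12. The helper name `chain_id` is made up for illustration, and the column position is an assumption taken from that sample, so check it against your own file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): pulls the chain ID out of
# a fixed-width SEQRES line.
# Assumption: the chain ID is always the 12th character of the line.
sub chain_id {
    my ($line) = @_;
    # unpack dies if asked to skip past the end of the string.
    return unless length($line) >= 12;
    # Template: skip 11 bytes, then read one character.
    my ($id) = unpack 'x11 A1', $line;
    return $id;
}

my %seen;
for my $line (
    'SEQRES   1 A 1522    U   U   U',
    'SEQRES   2 A 1522    U   U   G',
    'SEQRES  10 F 1522    G   U   G',
) {
    my $id = chain_id($line);
    # Skip short lines and chain IDs already printed.
    next if !defined $id || $seen{$id}++;
    print "this is chain $id\n";
}
```

This will not work on data whose spacing has been collapsed to single spaces; in that case split or the regex is the safer choice.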

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
KevinADC said:
split() is a regexp
All right officer, I'll come quietly, you've got me bang to rights.


What I should have said was "without the unnecessarily complicated regex" [smile]

Steve

 