RegExp reading only first unique occurrence

biobrain · Jul 30, 2008

I have data like this:

SEQRES 1 A 1522 U U U G U U G G A G A G U
SEQRES 2 A 1522 U U G A U C C U G G C U C
SEQRES 3 B 1522 A G G G U G A A C G C U G
SEQRES 4 B 1522 G C G G C G U G C C U A A
SEQRES 5 B 1522 G A C A U G C A A G U C G
SEQRES 6 B 1522 U G C G G G C C G C G G G
SEQRES 7 C 1522 G U U U U A C U C C G U G
SEQRES 8 D 1522 G U C A G C G G C G G A C
SEQRES 9 F 1522 G G G U G A G U A A C G C
SEQRES 10 F 1522 G U G G G U G A C C U A C
SEQRES 11 F 1522 C C G G A A G A G G G G G

I only want to print A, B, C, D and F from the third column, each one only once, no matter how many times it appears in that column.

Here is the code I have written:

Code:

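use strict;
use warnings;

open(SP, '<', 'myfile') or die "Cannot open myfile: $!";
my $flag = 1;
my $chainid;
my $chainid2;
my $chainidB;
while (<SP>) {
    if ($flag == 1) {
        # First SEQRES line: capture the chain ID (third column)
        if ($_ =~ /SEQRES\s+\S+\s(\S)/) {
            $flag++;
            #print $1;
            $chainid = $1;
            print "this is chain 1 $chainid\n";
        }
    } else {
        # Every later SEQRES line: nothing records which IDs were
        # already printed, so repeated IDs are printed again
        if ($_ =~ /SEQRES\s+\S+\s(\S)/) {
            #print $1;
            $chainid2 = $1;
            print " this is chain 2 $chainid2\n";
        }
    }
}
close(SP);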

Please help me get the desired result.

stevexff · Jul 30, 2008

Can you give a sample of the expected output?

Steve

"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::PerlDesignPatterns)

biobrain · Jul 30, 2008

The expected result should look like this:

this is chain A
this is chain B
this is chain C
this is chain D
this is chain F

but I am getting output like this:

this is chain A
this is chain A

this is chain B
this is chain B
this is chain B
this is chain B

this is chain C

this is chain D

this is chain F
this is chain F
this is chain F

Actually, I do not want these repetitions in my results.

travs69 · Jul 30, 2008

Code:

my %duplicate;    # chain IDs already printed
while (<SP>) {
    if ($_ =~ /SEQRES\s+\S+\s(\S)/) {
        next if exists $duplicate{$1};   # seen before: skip this line
        print "this is chain $1\n";
        $duplicate{$1} = 1;              # remember this ID
    }
}

Not tested.
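travs69 marked the snippet above as untested. Below is a self-contained sketch of the same hash-lookup idea that can be run as-is. It assumes the data is read from an in-memory string standing in for `myfile`, and that the wanted output is the format biobrain gave; `$sample` and `@chains` are illustrative names, and the regex uses `\s+` so it also tolerates the multi-space PDB layout.

```perl
use strict;
use warnings;

# Sample input; stands in for the real file opened with open(SP, '<', 'myfile').
my $sample = <<'END';
SEQRES 1 A 1522 U U U G U U G G A G A G U
SEQRES 2 A 1522 U U G A U C C U G G C U C
SEQRES 3 B 1522 A G G G U G A A C G C U G
SEQRES 4 B 1522 G C G G C G U G C C U A A
SEQRES 5 B 1522 G A C A U G C A A G U C G
SEQRES 6 B 1522 U G C G G G C C G C G G G
SEQRES 7 C 1522 G U U U U A C U C C G U G
SEQRES 8 D 1522 G U C A G C G G C G G A C
SEQRES 9 F 1522 G G G U G A G U A A C G C
SEQRES 10 F 1522 G U G G G U G A C C U A C
SEQRES 11 F 1522 C C G G A A G A G G G G G
END

# Read the string as if it were a file (in-memory filehandle).
open(my $fh, '<', \$sample) or die "Cannot open in-memory data: $!";

my %duplicate;   # chain IDs already printed
my @chains;      # chain IDs in the order they were first seen
while (my $line = <$fh>) {
    # Capture the third column (chain ID) of each SEQRES record.
    next unless $line =~ /^SEQRES\s+\S+\s+(\S)/;
    my $id = $1;
    next if exists $duplicate{$id};   # already reported: skip
    $duplicate{$id} = 1;
    push @chains, $id;
    print "this is chain $id\n";
}
close($fh);
```

Run against the sample, this prints one "this is chain X" line each for A, B, C, D and F, in the order they first appear.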

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;

stevexff · Jul 30, 2008

Perl:

use strict;
use warnings;

my %uniq;

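while (<DATA>) {
   my $id = (split)[2];              # third whitespace-separated field: the chain ID
   print unless exists $uniq{$id};   # print the line only the first time this ID appears
   $uniq{$id}++;
}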


__DATA__
SEQRES   1 A 1522    U   U   U   G   U   U   G   G   A   G   A   G   U          
SEQRES   2 A 1522    U   U   G   A   U   C   C   U   G   G   C   U   C          
SEQRES   3 B 1522    A   G   G   G   U   G   A   A   C   G   C   U   G          
SEQRES   4 B 1522    G   C   G   G   C   G   U   G   C   C   U   A   A          
SEQRES   5 B 1522    G   A   C   A   U   G   C   A   A   G   U   C   G          
SEQRES   6 B 1522    U   G   C   G   G   G   C   C   G   C   G   G   G          
SEQRES   7 C 1522    G   U   U   U   U   A   C   U   C   C   G   U   G          
SEQRES   8 D 1522    G   U   C   A   G   C   G   G   C   G   G   A   C          
SEQRES   9 F 1522    G   G   G   U   G   A   G   U   A   A   C   G   C          
SEQRES  10 F 1522    G   U   G   G   G   U   G   A   C   C   U   A   C          
SEQRES  11 F 1522    C   C   G   G   A   A   G   A   G   G   G   G   G

which is pretty much the same as Travis' but without the regex...
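The same "have I seen this key?" test is often written with the Perl idiom `!$seen{$key}++`, which is true only the first time a key turns up. A sketch, assuming the whitespace-separated layout above and a short in-memory sample in place of the real file; `@records`, `%seen` and `@chains` are illustrative names.

```perl
use strict;
use warnings;

# A few SEQRES records (trimmed sample of the data above).
my @records = (
    'SEQRES   1 A 1522    U   U   U   G',
    'SEQRES   2 A 1522    U   U   G   A',
    'SEQRES   3 B 1522    A   G   G   G',
    'SEQRES   7 C 1522    G   U   U   U',
    'SEQRES   8 D 1522    G   U   C   A',
    'SEQRES   9 F 1522    G   G   G   U',
    'SEQRES  10 F 1522    G   U   G   G',
);

my %seen;
# (split ' ', $_)[2] is the third whitespace-separated field, the chain ID.
# $seen{$_}++ returns a false value the first time a key is seen, so
# !$seen{$_}++ keeps only the first occurrence and preserves input order.
my @chains = grep { !$seen{$_}++ } map { (split ' ', $_)[2] } @records;

print "this is chain $_\n" for @chains;
```

Collecting the IDs into a list first makes it easy to reuse them later (for example, to count chains) instead of only printing them.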

Steve


KevinADC · Jul 30, 2008

split() is a regexp. If those are fixed-width fields, unpack() will be the most efficient way to get the third column.
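A minimal sketch of the unpack() approach. It assumes the lines follow the fixed-width PDB SEQRES layout seen in stevexff's __DATA__ block (chain ID in column 12, so the first 11 bytes are skipped); it will not work on the copy in the original question, where the spacing was collapsed. The sample records and the `@chains` name are illustrative.

```perl
use strict;
use warnings;

# Fixed-width SEQRES records: "SEQRES" in columns 1-6, serial number
# right-justified in columns 8-10, chain ID in column 12.
my @records = (
    'SEQRES   1 A 1522    U   U   U   G',
    'SEQRES   2 A 1522    U   U   G   A',
    'SEQRES   3 B 1522    A   G   G   G',
    'SEQRES   7 C 1522    G   U   U   U',
    'SEQRES   8 D 1522    G   U   C   A',
    'SEQRES   9 F 1522    G   G   G   U',
    'SEQRES  10 F 1522    G   U   G   G',
);

my %seen;
my @chains;
for my $line (@records) {
    # 'x11' skips the first 11 bytes; 'A1' takes the next byte
    # (trailing spaces stripped), which is the chain ID.
    my $id = unpack('x11 A1', $line);
    push @chains, $id unless $seen{$id}++;
}

print "this is chain $_\n" for @chains;
```

`substr($line, 11, 1)` is an equivalent way to pull out a single fixed column.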

------------------------------------------
- Kevin, perl coder unexceptional!

stevexff · Jul 31, 2008

KevinADC said:
split() is a regexp

All right officer, I'll come quietly, you've got me bang to rights.

What I should have said was "without the unnecessarily complicated regex".

Steve


Tek-Tips is the largest IT community on the Internet today!
