Calculating where a string starts, where it ends, and its offset based on a regex?

dmazzini · Jun 24, 2008

Hi guys

I have many different kinds of reports in which each data value is located at the same position as its column header.
For example, as you can see here, WBTS-1695 is directly under the NE-ID header, L6070407474 is directly under the "Target Id" column, and so on.

Code:

               NE-ID      Target Id In Topology                             Feature Name            
          ------------------------------------------------------------------------------
           WBTS-1695    L6070407474         Yes                                IMA (FTM)                                             
           WBTS-1695    L6070407474          No                 Antenna Line Supervision                                 
           WBTS-1695    L6070407474         Yes                     BTS channel capacity

I want to write a generic subroutine that gets the data based on column positions, using a regex on the headers as a reference together with the substr function.
The following code works fine with, for example, "HEADERFIVE", but not with "HEADER ONE".

Code:

#!/usr/bin/perl

my $headers=  qq(HEADER ONE           HEADERTWO   HEADER3  HEADER FOUR HEADERFIVE);
my $data=     qq(     data1     belong to data2     data3    my data 4   datafive);


if ($headers=~ /(HEADER ONE)/){

    print "Headers:$headers\n";
    my $length_headers=length($headers);
    print "Length_Headers =>$length_headers\n";    

    print "Matched:$1\n";
    my $length_matched=length($1);
    print "Length_Matched =>$length_matched\n"; 
      
    print "Before_Matched =>$`\n";
    my $length_before_matched=length($`);
    print "Length_Before_Matched =>$length_before_matched\n";     
    
    print "After_Matched =>$'\n";
    my $length_after_matched=length($');
    print "Length_After_Matched =>$length_after_matched\n";    
    
    my $VALUETOGET = substr( $data, $length_before_matched, $length_matched );
    print "VALUE:$VALUETOGET\n"; # with /(HEADERTWO)/ this gives me: " to data2" instead of "belong to data2"
    
}

I believe I still have to account for the spaces between the headers, for example between the headers "NE-ID" and "Target Id".
Suggestions are welcome. As I said before, there are many reports with different information, but the data and headers are always aligned
from left to right.

dmazzini
GSM/UMTS System and Telecomm Consultant

KevinADC · Jun 24, 2008

Look into the pos() function.
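As a rough sketch of that idea, assuming the header text is known in advance: after a successful match with the /g flag in scalar context, pos() returns the offset just past the end of the match, so the start offset can be derived from it. The variable names below are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $headers = 'HEADER ONE           HEADERTWO   HEADER3';
my $wanted  = 'HEADERTWO';    # illustrative: any header known in advance

my ( $start, $end );
if ( $headers =~ /\Q$wanted\E/g ) {
    $end   = pos($headers);             # offset just past the end of the match
    $start = $end - length($wanted);    # offset where the match begins
    print "$wanted occupies offsets $start to $end\n";
}
```

Note that this only gives the position of the header text itself; the padding between headers still has to be assigned to one column or the other.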

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]

KevinADC · Jun 24, 2008

You could probably use index() as well.


ishnid · Jun 24, 2008

Do you know in advance what the headers actually are? If not, it'll be trickier: e.g. how do you determine that the space between 'Id' and 'In' is a boundary between headers and the one between 'Target' and 'Id' is not?

I'm assuming you don't know the lengths of the fields beforehand so you have to work them out.

Here's something that might work for you if you have the headers beforehand:

Code:

#!/usr/bin/perl -w
use strict;

my @field_lengths;
my @headers = ( 'NE-ID', 'Target Id', 'In Topology', 'Feature Name' );

my $header_line = <DATA>;

for ( @headers ) {
   if ( $header_line =~ /(\s+\Q$_\E)/ ) {
      push @field_lengths, $+[0] - $-[0];
   }
   else {
      die "Header $_ not found\n";
   }
}

<DATA>; # skip the ----- line
my $pack_string = join '', map "A$_", @field_lengths;

while(<DATA>) {
   my @fields = unpack $pack_string, $_;
   print join( '|', @fields ), "\n";   # parenthesize join so "\n" is not joined as an extra field
}

__DATA__
    NE-ID      Target Id In Topology                             Feature Name
------------------------------------------------------------------------------
WBTS-1695    L6070407474         Yes                                IMA (FTM)
WBTS-1695    L6070407474          No                 Antenna Line Supervision
WBTS-1695    L6070407474         Yes                     BTS channel capacity

dmazzini · Jun 24, 2008

Hi Ishnid

I know the header names in advance, but I did not want to count the header position and offset for every single parameter. I did that in the past using substr, and I was not happy with that solution.

I have tested your script and it seems to work well! Thanks for that. Now I have to analyze what you did, hehe!

What do these do?

$+[0] - $-[0];
my $pack_string = join '', map "A$_", @field_lengths;

Cheers, and many thanks again


ishnid · Jun 24, 2008

After a successful match, Perl fills the special @- and @+ arrays: element 0 describes the whole match, and elements 1, 2, and so on describe the capturing groups. $-[0] is the offset of the start of the whole match, and $+[0] is the offset just past its end. Here the capturing parentheses wrap the entire pattern, so the first group has the same offsets as the whole match. By subtracting one from the other, I get the length of the field in question.
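To see the difference between the whole match and a capturing group, here is a small sketch. It uses a shortened copy of the header line and puts the parentheses around the header text only (unlike the script above), so that index 0 and index 1 give different offsets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $header_line = '    NE-ID      Target Id In Topology';

my ( $start, $end, $group_start, $group_end );
if ( $header_line =~ /\s+(\QTarget Id\E)/ ) {
    ( $start, $end )             = ( $-[0], $+[0] );    # whole match, leading spaces included
    ( $group_start, $group_end ) = ( $-[1], $+[1] );    # capture group 1: the header text only
    print "whole match: $start to $end, length ", $end - $start, "\n";
    print "group 1:     $group_start to $group_end, length ", $group_end - $group_start, "\n";
}
```

The whole match is 15 characters long (6 spaces plus the 9 characters of "Target Id"), which is the field width the script pushes onto @field_lengths.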

The $pack_string is the first argument to pass to the 'unpack' function. (If you don't know how that works, have a look at perlpacktut). Basically, @field_lengths holds the length of each field (in this case: 9, 15, 12, 41). I convert that into the string 'A9A15A12A41' so as to pass it to 'unpack'.
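A minimal illustration of the unpack step on its own, assuming the same four widths. The sample line is built with sprintf here only so that the column widths are guaranteed to be 9, 15, 12 and 41; in the real script it comes from the report. The 'A' format strips trailing spaces but not leading ones, so the sketch trims those separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @field_lengths = ( 9, 15, 12, 41 );
my $pack_string   = join '', map "A$_", @field_lengths;    # 'A9A15A12A41'

# Right-aligned sample row with the same widths as the report
my $line = sprintf '%9s%15s%12s%41s',
    'WBTS-1695', 'L6070407474', 'Yes', 'IMA (FTM)';

my @fields = unpack $pack_string, $line;
s/^\s+// for @fields;    # 'A' leaves leading spaces in place, so remove them here

print join( '|', @fields ), "\n";    # WBTS-1695|L6070407474|Yes|IMA (FTM)
```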

dmazzini · Jun 24, 2008

I've got it! Thanks so much!


KevinADC · Jun 24, 2008

Good solution; you don't see the @- and @+ variables used much. I would think the same could be accomplished using index().
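A possible index()-based version might look like the sketch below. It is untested against the real reports and assumes the headers are listed in left-to-right order and that each column is right-aligned, so a field runs from the end of the previous header to the end of the current one. The header line is built with sprintf only to make the widths explicit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @headers = ( 'NE-ID', 'Target Id', 'In Topology', 'Feature Name' );

# Stand-in for the header line read from the report (widths 9, 15, 12, 41)
my $header_line = sprintf '%9s%15s%12s%41s', @headers;

my @field_lengths;
my $prev_end = 0;    # offset where the current field begins
for my $header (@headers) {
    # Search from the end of the previous header onwards
    my $start = index( $header_line, $header, $prev_end );
    die "Header $header not found\n" if $start < 0;

    my $end = $start + length($header);
    push @field_lengths, $end - $prev_end;    # padding plus header text
    $prev_end = $end;
}

print "@field_lengths\n";    # 9 15 12 41
```

The resulting widths can be turned into an unpack template in the same way as in ishnid's script.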

