need regex help again.

StickyBit · Oct 6, 2004

Folks,

I'm trying to write an expression to pick up the following line in a data file:

0 0 0 360 0.2 12 9.1

The following expression works but I’m not sure why:

(/^\s([0-9]{1,}\s{5})/)

I thought this expression would work, but it doesn't:

(/^\s([0-9]{1,}\s){4}[0-9]{1,}\.([0-9]{1,}\s){2}[0-9]{1,}\.[0-9]{1,}/)

Need help,

Stickybit.

toolkit · Oct 6, 2004

It's quite useful to use the 'x' modifier to explain complex regular expressions:

Code:

while (<DATA>) {
    if (/^         # start of line
        \s         # a single whitespace
        (          # start grouping
        [0-9]{1,}  # 1 or more digits
        \s{5}      # exactly 5 whitespace characters
        )          # end grouping
        /x) {
        print "matched 1\n";
    }
    if (/^         # start of line
        \s         # a single whitespace
        (          # start grouping
        [0-9]{1,}  # 1 or more digits
        \s         # a single whitespace
        )          # end grouping
        {4}        # grouping occurs exactly 4 times
        [0-9]{1,}  # 1 or more digits
        \.         # a decimal point
        (          # start grouping
        [0-9]{1,}  # one or more digits
        \s         # a single whitespace
        )          # end grouping
        {2}        # grouping occurs exactly 2 times
        [0-9]{1,}  # 1 or more digits
        \.         # a decimal point
        [0-9]{1,}  # 1 or more digits
        /x) {
        print "matched 2\n";
    }
}
__DATA__
 0     0     0      360     0.2   12       9.1

This reports "matched 1" only. I think there are a few errors in your second regexp.
Cheers, Neil
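
To make the difference concrete, here is a minimal sketch comparing a single \s separator with \s+ on the sample line. The helper name which_match is made up for illustration; only the sample line and the two separator styles come from the thread.

```perl
use strict;
use warnings;

# Sample line from the thread: one leading space, then columns
# separated by runs of several spaces.
my $line = " 0     0     0      360     0.2   12       9.1";

# Hypothetical helper: reports which separator style lets the
# "four integers, then the start of a decimal" prefix match.
sub which_match {
    my ($text) = @_;
    my @hits;
    push @hits, 'single' if $text =~ /^\s([0-9]+\s){4}[0-9]+\./;
    push @hits, 'plus'   if $text =~ /^\s([0-9]+\s+){4}[0-9]+\./;
    return join ',', @hits;
}

print which_match($line), "\n";                    # plus
print which_match(" 0 0 0 360 0.2 12 9.1"), "\n";  # single,plus
```

With single-spaced data both styles match, so the single \s version only fails once the columns are padded with more than one space.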

toolkit · Oct 6, 2004

Here's a modified version of the regexp which should work:

Code:

#!/usr/bin/perl -w

while (<DATA>) {
    if (/^         # start of line
        \s         # a single whitespace
        (          # start grouping
        [0-9]+     # 1 or more digits
        \s+        # 1 or more whitespace
        )          # end grouping
        {4}        # grouping occurs exactly 4 times
        [0-9]+     # 1 or more digits
        \.         # a decimal point
        (          # start grouping
        [0-9]+     # 1 or more digits
        \s+        # 1 or more whitespace
        )          # end grouping
        {2}        # grouping occurs exactly 2 times
        [0-9]+     # 1 or more digits
        \.         # a decimal point
        [0-9]+     # 1 or more digits
        /x) {
        print "matched\n";
    }

    # same line as above without 'x' modifier
    if (/^\s([0-9]+\s+){4}[0-9]+\.([0-9]+\s+){2}[0-9]+\.[0-9]+/) {
        print "matched\n";
    }
}
__DATA__
 0     0     0      360     0.2   12       9.1

Cheers, Neil

rharsh · Oct 6, 2004

This regex will match the line you gave, but to make it more specific, you'll need to post a little more of the info from your file.

Code:

$_ = '0     0     0      360     0.2   12       9.1';
print if (/^([\d.]+\s*){7}$/);

rharsh · Oct 6, 2004

I didn't notice the white space at the beginning of the line - this should work better:

Code:

/^\s([\d.]+\s*){7}$/
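
One thing to watch with this pattern: because the separator is \s* (zero or more), the seven groups can split a single run of digits between them. The sketch below compares it with a stricter variant. The helper names loose_ok and strict_ok and the stricter pattern itself are assumptions for illustration, not something posted in the thread.

```perl
use strict;
use warnings;

# Pattern from the post above: fields may be separated by zero whitespace.
sub loose_ok {
    my ($text) = @_;
    return $text =~ /^\s([\d.]+\s*){7}$/ ? 1 : 0;
}

# Assumed tightening (not from the thread): each field is an integer or
# a decimal, and fields must be separated by at least one whitespace.
sub strict_ok {
    my ($text) = @_;
    return $text =~ /^\s+\d+(?:\.\d+)?(?:\s+\d+(?:\.\d+)?){6}\s*$/ ? 1 : 0;
}

for my $text (" 0     0     0      360     0.2   12       9.1", " 1234567") {
    printf "%-50s loose=%d strict=%d\n", "'$text'", loose_ok($text), strict_ok($text);
}
```

The sample line passes both checks. The string " 1234567" passes the loose pattern, since each of the seven groups can take one digit, but fails the stricter one.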

Nebbish · Oct 6, 2004

Code:

$_ = '0     0     0      360     0.2   12       9.1';
my @arrayOfValues = split(/\s+/);

# Resulting elements:
# $arrayOfValues[0] is 0
# $arrayOfValues[1] is 0
# $arrayOfValues[2] is 0
# $arrayOfValues[3] is 360
# etc.

-Nick
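
A small caveat when the line starts with whitespace, as the line in the data file does: split /\s+/ then returns an empty first element, while the special form split ' ' skips leading whitespace. A minimal sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

# Sample line with its leading space, as it appears in the data file.
my $line = " 0     0     0      360     0.2   12       9.1";

my @by_regex = split /\s+/, $line;   # leading empty field is kept
my @by_space = split ' ',  $line;    # leading whitespace is skipped

print scalar(@by_regex), "\n";   # 8
print scalar(@by_space), "\n";   # 7
```

With split ' ' the index positions line up with the columns, so $by_space[3] is 360.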

mikevh · Oct 6, 2004

Assuming you only want lines containing 7 numeric elements:

Code:

#!perl
use strict;
use warnings;

while (<DATA>) {
    chomp;
    if ((my @line = m|(-?\d+(?:\.\d+)?)|g) == 7) {
        print join("\t", @line), "\n";
    }
}

__DATA__
 0     0     0      360     0.2   12       9.1
1 2 3 4 5
7 8.2 5 14.75 -1 0 8.9

Output:

Code:

0       0       0       360     0.2     12      9.1
7       8.2     5       14.75   -1      0       8.9

The regex says:
1. Optional leading -
2. One or more digits
3. Optional decimal point and one or more digits. The ?: means don't capture the decimal and following digits as a separate field.
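
The "== 7" comparison works because a list assignment evaluated in scalar context yields the number of elements on its right-hand side. A minimal sketch of that idiom, using a hypothetical helper named numeric_fields built around the same pattern:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the numeric fields found in a line,
# using the same pattern as the script above.
sub numeric_fields {
    my ($text) = @_;
    my @fields = $text =~ /(-?\d+(?:\.\d+)?)/g;
    return @fields;
}

# A list assignment in scalar context yields the number of elements
# on its right-hand side, which is what the "== 7" test relies on.
my $count = () = numeric_fields("7 8.2 5 14.75 -1 0 8.9");
print "$count\n";   # 7

my @short = numeric_fields("1 2 3 4 5");
print scalar(@short), "\n";   # 5
```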

PaulTEG · Oct 6, 2004

Stickybit,

it would be nice to come back and recognise these people's efforts, so they don't get pi$$ed off and dance on other people. We're here to help, not get 91553d on.

If you had an account with me, I'd charge you 4 times, plus a percentage/margin/whatever/they/call/it/now

B nice
--Paul

StickyBit · Oct 6, 2004

Paul,

I always make it a point to thank/recognise those who have helped me. I got tied up with a couple of emergencies yesterday; otherwise I would have responded sooner. I apologize.

Anyways,

Thank you to everyone who helped me yesterday; my understanding of regex is a little better now. Special thanks to Toolkit for breaking things down into a simpler form and pointing out my mistakes.

I’m working on an inventory project where I have to collect the CPU information for several Sun servers that display their CPU information as follows:

Data file 1 (4 CPUs)

___DATA___

Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
SYS    0     0      400     4.0   US-II    10.0
SYS    1     1      400     4.0   US-II    10.0
SYS    2     2      400     4.0   US-II    10.0
SYS    3     3      400     4.0   US-II    9.0

___DATA___

or as in my original example:

Data file 2 (1 CPU)

___DATA___

Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      360     0.2   12       9.1

___DATA___

My immediate requirement was the CPU count, so in the end I used the following regular expressions to count the matching lines in each data file, which worked!

For data file 1

(/^[A-Z]{3}\s{1,}([0-9]{1,}\s{1,}){3}[0-9]{1,}\.[0-9]{1,}\s{1,}[A-Z]{1,}-[A-Z]{1,}\s{1,}[0-9]{1,}\.[0-9]{1,}/)

For data file 2

(/^\s([0-9]{1,}\s{1,}){4}[0-9]{1,}\.[0-9]{1,}\s{1,}[0-9]{1,}\s{1,}[0-9]{1,}\.[0-9]{1,}/)

I’m sure the expressions could be more efficient, but I thought I would come up with my own work first (based on the examples provided), then look at the answers provided and tweak later.
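
As one possible tweak, the two layouts could be covered by a single pattern. The sketch below is only an illustration: the helper name count_cpus is made up, and it assumes the Brd and Impl. columns never contain whitespace and that the MB and Mask columns always carry a decimal point, as in the two samples above.

```perl
use strict;
use warnings;

# Hypothetical helper: counts lines that look like a CPU row in
# either layout. Header and separator lines do not match.
sub count_cpus {
    my (@lines) = @_;
    my $row = qr/^\s*          # optional leading whitespace
                 \S+\s+        # Brd: "SYS" or a number
                 (?:\d+\s+){3} # CPU, Module, MHz
                 \d+\.\d+\s+   # MB
                 \S+\s+        # Impl.: "US-II" or a number
                 \d+\.\d+      # Mask
                 \s*$/x;
    return scalar(grep { /$row/ } @lines);
}

my @header = ("Brd CPU Module MHz MB Impl. Mask",
              "--- --- ------- ----- ------ ------ ----");
my @file1  = (@header,
              "SYS 0 0 400 4.0 US-II 10.0",
              "SYS 1 1 400 4.0 US-II 10.0",
              "SYS 2 2 400 4.0 US-II 10.0",
              "SYS 3 3 400 4.0 US-II 9.0");
my @file2  = (@header, " 0 0 0 360 0.2 12 9.1");

print count_cpus(@file1), "\n";   # 4
print count_cpus(@file2), "\n";   # 1
```

Using \s+ between columns means the same pattern works whether the columns are single-spaced or padded.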

Thanks again folks!

Stickybit

StickyBit · Oct 7, 2004

Hey guys,

What does the $ at the end of the expression mean?

$_ = '0 0 0 360 0.2 12 9.1';
print if (/^\s([\d.]+\s*){7}$/)

thanks,

stickybit

mikevh · Oct 8, 2004

From perlretut:

The anchor ^ means match at the beginning of the string and the anchor $ means match at the end of the string ...
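
One detail beyond the quoted sentence: in Perl, $ also matches just before a single trailing newline, which is why patterns ending in $ still work on lines read with <DATA> that have not been chomped. The \z anchor matches only at the very end of the string. A minimal sketch, with illustrative helper names:

```perl
use strict;
use warnings;

# Hypothetical helpers: does the string end in a digit according to
# each anchor?
sub ends_dollar { return $_[0] =~ /\d$/  ? 1 : 0 }
sub ends_z      { return $_[0] =~ /\d\z/ ? 1 : 0 }

# A line as read from a file, still carrying its newline.
my $with_newline = "9.1\n";

print ends_dollar($with_newline), "\n";   # 1
print ends_z($with_newline), "\n";        # 0
print ends_z("9.1"), "\n";                # 1
```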

StickyBit · Oct 8, 2004

thanks mikevh,

Stickybit.
