Non-programmer needs help deciphering a RegEx

zoecutting · Feb 12, 2003

Hi,

I've been left with a Perl script to maintain and I know nothing about the language. Something has gone wrong somewhere and I'm pretty sure I've tracked it down to this Regular Expression:

# get bold & plain parts
next unless $line =~ /^\s*(.*(?:\007f\-b\007))?(.*?)\s*$/s;
my ($bold, $plain) = ($1, $2);

Is there any kind soul out there willing to help me decipher it?

toolkit · Feb 12, 2003

You can use the 'x' modifier to make regular expressions easier to read:

Code:

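next unless $line =~ /
    ^\s*        # zero or more whitespace characters at the start
    (           # begin capture group 1 (the bold part)
      .*        # zero or more characters, as many as possible
      (?:       # begin non-capturing group
        \007    # the character with octal value 007 (BEL)
        f       # a literal f
        \-      # a literal - (the backslash is not needed)
        b       # a literal b
        \007    # another BEL
      )         # end non-capturing group
    )?          # end group 1; the whole group is optional
    (.*?)       # capture group 2 (the plain part), as few characters as possible
    \s*$        # zero or more whitespace characters, then end of string
    /sx;        # s: . also matches newlines; x: allow spaces and comments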
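To see what the two capture groups actually hold, here is a minimal runnable sketch that applies the original (compact) pattern to two made-up lines. The samples are assumptions: BEL characters are invisible in a forum post, so their positions are guessed. Only the first sample contains the `\007f-b\007` marker.

```perl
use strict;
use warnings;

# Hypothetical input lines; "\007" in a double-quoted string is a BEL character.
my @samples = (
    "\007f+b\007BAKER. - \007f-b\007John.",   # contains the f-b marker
    "\007f+b\007 SMITH \007o\007",            # no f-b marker at all
);

for my $line (@samples) {
    next unless $line =~ /^\s*(.*(?:\007f\-b\007))?(.*?)\s*$/s;
    my ($bold, $plain) = ($1, $2);
    $bold = '(undef)' unless defined $bold;   # group 1 did not take part in the match
    s/\007/<BEL>/g for $bold, $plain;         # make the invisible BELs printable
    print "bold:  $bold\nplain: $plain\n\n";
}
```

For the first line, `$1` holds everything up to and including the `f-b` marker and `$2` holds `John.`; for the second, the optional group cannot match, so `$1` is undefined and the whole line ends up in `$2`.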

Have you got an example of what it is trying to parse? Note that in this pattern \007f is the BEL character (octal 007) followed by a literal 'f'; it is not 0x7F, which is the DEL character.
Hope some of this helps.
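The difference between the octal escape used in the pattern and a hex escape can be checked directly. This short sketch prints the length and character codes of each string; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $bel_f = "\007f";   # octal escape \007 (BEL, chr 7) followed by a literal "f"
my $del   = "\x7f";    # hex escape 0x7F (DEL, chr 127), a single character

# unpack 'C*' lists the numeric code of each character
printf "\\007f -> %d chars: %s\n", length $bel_f, join ' ', unpack('C*', $bel_f);
printf "\\x7f  -> %d char:  %s\n", length $del,   join ' ', unpack('C*', $del);
```

The first string is two characters (codes 7 and 102, the latter being "f"); the second is the single character 127.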

zoecutting · Feb 12, 2003

Thanks toolkit,

I'm afraid a lot of what you said makes no sense to me. Here is the full code and the text that is being parsed.

============================
Code
============================

# $classes, logline() and quotechars() are defined elsewhere in the script (not shown)
sub read_ads {
    my $rlines = shift;
    my %ads;    # store 'em all in here

    while ($#$rlines > 0) {
        # try to recover if file is corrupt (i.e. it gets out of sequence)
        unless ($rlines->[0] =~ /^\s*\|\s*\d+\s*$/) {
            shift @$rlines;
            next;
        }

        # read in two lines: the "| nnnnn" code line, then the ad text
        my ($code, $line) = splice(@$rlines, 0, 2);

        # get code
        next unless $code =~ /^\s*\|\s*(\d+)\s*$/;
        $code = $1;

        unless (exists $classes->{$code}) {
            logline("Warning: Code \"$code\" not found; skipping line", 1);
            next;
        }

        # get bold & plain parts
        next unless $line =~ /^\s*(.*(?:\007f\-b\007))?(.*?)\s*$/s;
        my ($bold, $plain) = ($1, $2);

        # strip those control codes
        $bold = '' unless defined $bold;
        for ($bold, $plain) {
            s/\007.*?\007//g;    # remove each BEL...BEL control code
            s/^\s+|\s+$//g;      # trim leading and trailing whitespace
            $_ = quotechars($_);
        }

        # join together as needed
        if ($bold) {
            $line = "<b>$bold</b> $plain<hr>";
        }
        else {
            $line = "$plain<hr>";
        }

        # add into %ads
        $ads{$code} = [] unless exists $ads{$code};
        push @{$ads{$code}}, $line;
    }

    return (\%ads);
}

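The control-code stripping step inside `read_ads` can be tried on its own. This is a minimal sketch with made-up values for `$bold` and `$plain` (where the BEL characters sit is a guess, since they were lost when the text was posted). It leaves out `quotechars()`, which is not shown in the thread, and writes the trim as `s/^\s+|\s+$//g`, which does the same job as the original `s/(^\s*)|(\s*$)//g`.

```perl
use strict;
use warnings;

# Made-up values standing in for what the regex captured; the positions of
# the BEL (\007) characters are a guess, since they were lost in the post.
my $bold  = "\007f+b\007\007u=BAC0B39D\007BAKER. - \007f-b\007";
my $plain = "\007o\007John. Peacefully in Dulwich Hospital.  ";

for ($bold, $plain) {       # $_ aliases each variable in turn
    s/\007.*?\007//g;       # delete every BEL...BEL control code (shortest match)
    s/^\s+|\s+$//g;         # trim leading and trailing whitespace
}

# quotechars() is not shown in the thread, so it is skipped here.
my $html = $bold ? "<b>$bold</b> $plain<hr>" : "$plain<hr>";
print "$html\n";
```

Because `.*?` is non-greedy, each BEL pairs with the next BEL rather than the last one on the line, so the text between separate codes survives.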
===========================
Text
===========================

05000
u=BA7A717Ff+b SMITH o
f+b Carol o
Happy 50th. Lots of Love Pete, Nikki, Steven and Family XXX

=============================
Result
=============================

SMITH (not bolded)

The correct result should be:

<b>SMITH.</b> - Carol. Happy 50th. Lots of Love Pete, Nikki, Steven and Family XXX

Here is an example that outputs correctly:

09000
f+bu=BAC0B39DBAKER. - f-boJohn. Peacefully in Dulwich Hospital, on 7th February 2003, aged 81 years.

Could it be that a full stop/period is missing in the first example and that f and b have a plus sign instead of a minus between them? How can I alter the RegEx to take account of this?
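One way to check the plus-versus-minus idea is to look at the raw lines with the invisible BEL characters made visible. The sketch below is a hypothetical helper (`show_codes` is a made-up name, and the sample strings are guesses, since the BELs were lost in the post). It also reports whether the `\007f-b\007` marker is present; the pattern spells out a literal minus, so a marker written as `f+b` does not count, and with no `f-b` marker the script outputs the line without `<b>`.

```perl
use strict;
use warnings;

# Hypothetical helper: return a printable copy of $line with each BEL shown
# as <BEL>, plus a note saying whether the "\007f-b\007" marker is present.
sub show_codes {
    my ($line) = @_;
    (my $visible = $line) =~ s/\007/<BEL>/g;
    my $has_marker = $line =~ /\007f-b\007/ ? 'yes' : 'no';
    return "$visible  [f-b marker: $has_marker]";
}

# Made-up sample lines; in practice, loop over the lines of the real file.
for my $line ("\007f+b\007 SMITH \007o\007",
              "\007f+b\007BAKER. - \007f-b\007\007o\007John.") {
    my $report = show_codes($line);
    print "$report\n";
}
```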

Tek-Tips is the largest IT community on the Internet today!



zoecutting

MIS

toolkit

Programmer
