Thanks toolkit,
I'm afraid a lot of what you said makes no sense to me. Here is the full code and the text that is being parsed.
============================
Code
============================
sub read_ads {
    my $rlines = shift;
    my %ads;    # store 'em all in here
    # $classes, logline() and quotechars() are defined elsewhere in the script
    while ($#$rlines > 0) {    # at least two lines (code + ad text) left
        # try to recover if file is corrupt (i.e. it gets out of sequence)
        unless ($rlines->[0] =~ /^\s*\|\s*\d+\s*$/) {
            shift @$rlines;
            next;
        }
        # read in two lines
        my ($code, $line) = splice(@$rlines, 0, 2);
        # get code
        next unless $code =~ /^\s*\|\s*(\d+)\s*$/;
        $code = $1;
        unless (exists $classes->{$code}) {
            logline("Warning: Code \"$code\" not found; skipping line", 1);
            next;
        }
        # get bold & plain parts
        next unless $line =~ /^\s*(.*(?:\007f\-b\007))?(.*?)\s*$/s;
        my ($bold, $plain) = ($1, $2);
        # strip those control codes
        $bold = '' unless defined $bold;
        for ($bold, $plain) {
            s/\007.*?\007//g;
            s/(^\s*)|(\s*$)//g;
            $_ = quotechars($_);
        }
        # join together as needed
        if ($bold) {
            $line = "<b>$bold</b> $plain<hr>";
        }
        else {
            $line = "$plain<hr>";
        }
        # add into %ads
        $ads{$code} = [] unless exists $ads{$code};
        push @{$ads{$code}}, $line;
    }
    return (\%ads);
}
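A minimal sketch to check how the bold/plain regex from read_ads() behaves on its own. The helper name split_bold_plain and the two sample strings are hypothetical: the strings assume each control code is wrapped in BEL characters (\007), so "f+b" in the posted text is really "\007f+b\007", which is what the s/\007.*?\007//g line implies.

```perl
use strict;
use warnings;

# Hypothetical helper: split an ad line into (bold, plain) using the same
# regex as read_ads(). The bold group only matches when the line contains
# "\007f-b\007"; otherwise everything ends up in $plain.
sub split_bold_plain {
    my ($line) = @_;
    return unless $line =~ /^\s*(.*(?:\007f\-b\007))?(.*?)\s*$/s;
    my ($bold, $plain) = ($1, $2);
    $bold = '' unless defined $bold;
    for ($bold, $plain) {
        s/\007.*?\007//g;   # strip BEL-wrapped control codes
        s/^\s+|\s+$//g;     # trim leading/trailing whitespace
    }
    return ($bold, $plain);
}

# Assumed byte-level reconstructions of the two posted lines (BEL = \007).
my $works = "\007f+b\007\007u=BAC0B39D\007BAKER. - \007f-b\007\007o\007John. Peacefully";
my $fails = "\007u=BA7A717F\007\007f+b\007 SMITH \007o\007";

for my $line ($works, $fails) {
    my ($bold, $plain) = split_bold_plain($line);
    print "bold=[$bold] plain=[$plain]\n";
}
```

Under those assumptions the first string gives bold "BAKER. -" and plain "John. Peacefully", while the second gives an empty bold part and plain "SMITH": the optional bold group can only match when the line contains "\007f-b\007", and the failing line has none.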
===========================
Text
===========================
05000
u=BA7A717Ff+b SMITH o
f+b Carol o
Happy 50th. Lots of Love Pete, Nikki, Steven and Family XXX
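This ad also spans three lines after the code line, while read_ads() splices exactly two (splice(@$rlines, 0, 2)), so the "Carol" and "Happy 50th" lines would be skipped by the out-of-sequence recovery branch. A minimal sketch of grouping every line up to the next code line instead, assuming code lines match the same /^\s*\|\s*(\d+)\s*$/ pattern that read_ads() uses; group_ads is a hypothetical helper, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: group raw lines into [code, text] pairs, where the ad
# text is every line up to the next code line, joined with single spaces.
# ASSUMPTION: code lines look like "| 05000", as read_ads() expects.
sub group_ads {
    my @lines = @_;    # copy, so trimming below does not touch the caller's data
    my @ads;
    for my $line (@lines) {
        if ($line =~ /^\s*\|\s*(\d+)\s*$/) {
            push @ads, [$1, ''];          # start a new ad
        }
        elsif (@ads) {
            $line =~ s/^\s+|\s+$//g;      # trim each continuation line
            next if $line eq '';          # skip blank lines
            $ads[-1][1] .= ($ads[-1][1] eq '' ? '' : ' ') . $line;
        }
        # lines before the first code line are ignored
    }
    return @ads;
}

my @raw = ("| 05000",
           "\007u=BA7A717F\007\007f+b\007 SMITH \007o\007",
           "\007f+b\007 Carol \007o\007",
           "Happy 50th. Lots of Love");
for my $ad (group_ads(@raw)) {
    my ($code, $text) = @$ad;
    (my $shown = $text) =~ s/\007/^G/g;   # make BEL visible when printing
    print "$code: $shown\n";
}
```

Each returned pair could then go through the existing bold/plain logic in place of the single $line. One caveat: an ad line consisting only of "|" and digits would be mistaken for a new code.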
=============================
Result
=============================
SMITH (not bolded)
The correct result should be:
<b>SMITH.</b> - Carol. Happy 50th. Lots of Love Pete, Nikki, Steven and Family XXX
Here is an example that outputs correctly:
09000
f+bu=BAC0B39DBAKER. - f-boJohn. Peacefully in Dulwich Hospital, on 7th February 2003, aged 81 years.
Could the problem be that the first example is missing a full stop (period), and that its codes have a plus sign between f and b (f+b) where the regex expects a minus (f-b)? How can I alter the regex to take account of this?
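One possible adjustment, sketched under loudly stated assumptions: codes are wrapped in \007, "\007f+b\007" switches bold on, and bold ends at the next "\007f-b\007" or "\007o\007" (o appears right after the bold text in both samples, so it is only guessed to be a reset code). split_bold_plain_v2 is a hypothetical name, and it is meant to run on the joined text of a whole ad rather than a single line.

```perl
use strict;
use warnings;

# Hypothetical variant of the bold/plain split. ASSUMPTIONS: control codes
# are wrapped in BEL (\007), "\007f+b\007" switches bold on, and bold ends at
# the next "\007f-b\007" or "\007o\007". Anything before the first f+b is
# assumed to be control codes only and is dropped.
sub split_bold_plain_v2 {
    my ($line) = @_;
    my ($bold, $plain) = ('', $line);   # no f+b: whole line is plain
    if ($line =~ /^(.*?\007f\+b\007)(.*?)(\007(?:f-b|o)\007.*)$/s) {
        ($bold, $plain) = ($2, $3);
    }
    for ($bold, $plain) {
        s/\007.*?\007//g;   # strip remaining control codes
        s/^\s+|\s+$//g;     # trim leading/trailing whitespace
        s/\s+/ /g;          # collapse runs of whitespace
    }
    return ($bold, $plain);
}

my $ad = "\007u=BA7A717F\007\007f+b\007 SMITH \007o\007 "
       . "\007f+b\007 Carol \007o\007 Happy 50th.";
my ($bold_part, $plain_part) = split_bold_plain_v2($ad);
print "<b>$bold_part</b> $plain_part<hr>\n";
```

On the joined three-line ad this gives "<b>SMITH</b> Carol Happy 50th..." and on the working example it still gives "BAKER. -" as the bold part. The full stop after SMITH and the " - " before Carol in the expected output do not appear in the posted text, so no regex change can produce them; they would have to be added when building the HTML.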