
A bit of a regex problem

Status
Not open for further replies.

audiopro

Programmer
Apr 1, 2004
3,165
GB
I am using a regex to convert dates appearing in strings to superscript.
The standard format of the strings is a name followed by a period and a date,
i.e.:
John Smith.1967
Joe Doe.1945
Alan Evans.1943

This works, although I am sure some of you can achieve the same result with a one-liner.

Code:
1>	if(($Valid == 0) && ($results[$x] =~ m/\.1/)){
2>		for($k=1800; $k<2030; ++$k){
3>			$DotDate=".$k";
4>			if($results[$x] =~ m/\.$k/){
5>				$results[$x] =~ s/$DotDate/<sup>($k)<\/sup>/g;
6>			}
7>		}
8>	}

The problem I have is that when line 5 converts a date, any occurrence of the same date in the string without the period is converted too.
I may be missing something, but the substitution seems to match just 1950 rather than $DotDate (i.e. .1950).
I wish I had used a colon instead of a period, but I am sure there is a solution to this.
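The symptom above is consistent with how Perl interpolates variables into patterns: the contents of `$DotDate` are treated as regex syntax, so the `.` in `.1950` means "any character", and ` 1950` (space included) matches as well. Line 4 escapes the period (`\.$k`), but line 5 does not. A minimal sketch, using a hypothetical sample string, comparing the unescaped pattern with one protected by `\Q...\E` (quotemeta):

```perl
use strict;
use warnings;

my $k       = 1950;
my $DotDate = ".$k";    # the string ".1950"

# Hypothetical sample: the same year appears with and without the period.
my $loose = "Joe Doe.1950 married in 1950";
# Unescaped: "." is a metacharacter, so " 1950" matches too (the space is eaten).
$loose =~ s/$DotDate/<sup>($k)<\/sup>/g;

my $strict = "Joe Doe.1950 married in 1950";
# \Q...\E escapes the interpolated text, so only a literal ".1950" matches.
$strict =~ s/\Q$DotDate\E/<sup>($k)<\/sup>/g;

print "$loose\n";
print "$strict\n";
```

The first line printed is `Joe Doe<sup>(1950)</sup> married in<sup>(1950)</sup>`, the second leaves the undotted year alone: `Joe Doe<sup>(1950)</sup> married in 1950`.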

Any help greatly appreciated.

Keith
 
Hi

I only skimmed your code, but if you are performing 230 regular expression matches to replace every year in the 1800..2029 range, then I would strongly suggest not doing that.

Personally, I prefer a single generic regular expression that may also match false positives, followed by a check of whether the matched number is in range:
Perl:
$str='
John Smith.1967 something without dot 2014
Joe Doe.1945 something with dot.0666 but too small
Alan Evans.1943 something else
';

$str =~ s!\.(\d{4})!$1~~[1800..2030]?".<sup>$1</sup>":$&!ge;

print $str;
Output:
John Smith.<sup>1967</sup> something without dot 2014
Joe Doe.<sup>1945</sup> something with dot.0666 but too small
Alan Evans.<sup>1943</sup> something else

Of course, if I misunderstood your goal, please clarify with more examples.

Feherke.
feherke.ga
 
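As feherke has already said, you can simplify your logic by taking advantage of the fact that the replacement side of a substitution can execute code (the /e modifier), so matches can be replaced selectively. This is pretty close to what he provided, with a couple of small differences: a \b word boundary after the four digits, plain numeric comparisons instead of smartmatch, and the period is dropped from the result: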

Code:
use strict;
use warnings;

my $data = do {local $/; <DATA>};

$data =~ s{\.(\d{4})\b}{
	($1 >= 1800 && $1 < 2030)
	? "<sup>$1</sup>"
	: $&
}ge;

print $data;

__END__
John Smith.1967
Joe Doe.1945
Alan Evans.1943
Too Much.2040
Too few.1776
no dot 2010
just right.2014

Output:

Code:
John Smith<sup>1967</sup>
Joe Doe<sup>1945</sup>
Alan Evans<sup>1943</sup>
Too Much.2040
Too few.1776
no dot 2010
just right<sup>2014</sup>
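To see what the `\b` word boundary contributes, here is a small comparison of the pattern with and without it. The `Long.19671` line is an invented five-digit value used only for illustration; without `\b`, its first four digits still match.

```perl
use strict;
use warnings;

# Hypothetical input: one normal year and one five-digit number.
my $text = "Short.1967\nLong.19671\n";

# Without \b: ".19671" matches on its first four digits.
(my $no_boundary = $text) =~ s{\.(\d{4})}{
    ($1 >= 1800 && $1 < 2030) ? "<sup>$1</sup>" : $&
}ge;

# With \b: the four digits must be followed by a word boundary, so ".19671" is skipped.
(my $with_boundary = $text) =~ s{\.(\d{4})\b}{
    ($1 >= 1800 && $1 < 2030) ? "<sup>$1</sup>" : $&
}ge;

print $no_boundary, $with_boundary;
```

Without the boundary the second line becomes `Long<sup>1967</sup>1`; with it, `Long.19671` is left as it was.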

- Miller
 
IMHO this is not a task for regexes. For such a simple task, split is much faster, cleaner and more readable.
Code:
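if ($Valid == 0) {
  my ($name, $year) = split /\./, $results[$x];
  # only rewrite when something follows the period and it is in range
  $results[$x] = $name . '<sup>' . $year . '</sup>'
    if defined $year && $year >= 1800 && $year < 2030;
}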
If the string is not in the expected format (e.g. no period, or two periods), it is left unchanged.

 
WOW! Thanks very much, that is most impressive.
I have used regexes for many years, but obviously I have only been scratching the surface with regard to their capabilities.
This is the first time I have needed to do this kind of thing, so I saw no reason to learn such complexities before.

Am I right in my view of how it works?

Code:
s

Everything between the !s is the search term, in this case a period followed by 4 digits
!\.(\d{4})!

for $1 = 1800 to 2030
$1~~[1800..2030]

If it matches, output the substituted string (I removed the period from the result)
?"<sup>$1</sup>"

This bit I am not sure of
:$&!ge;
I know global and evaluate right side but what does :$&! do?
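For reference, feherke's substitution can be spelled out with the `/x` modifier so that each piece carries its own comment. This sketch is not feherke's exact code: it swaps the smartmatch test for a plain numeric range check (1800 to 2030 inclusive, mirroring `[1800..2030]`), keeps his `.<sup>` output, and uses one line of his sample text.

```perl
use strict;
use warnings;

my $str = "John Smith.1967 something without dot 2014\n";

$str =~ s{
    \.        # a literal period
    (\d{4})   # four digits, captured as group 1
}{
    # with /e this replacement part is run as Perl code
    ($1 >= 1800 && $1 <= 2030)   # condition of the ternary ?:
        ? ".<sup>$1</sup>"       # in range: wrap the year in <sup>
        : $&                     # otherwise: put the whole match back unchanged
}xge;    # x: comments allowed in the pattern, g: every match, e: evaluate

print $str;
```

The dotted year is wrapped (`John Smith.<sup>1967</sup>`), while `2014` has no period and is never matched.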

Keith
 
I couldn't use split, as many records contain more than one person's name, associated date and other information,
e.g.:
Three generations of the family, Joe Doe.1946 with his Father John.Doe.1924 and young Jill Doe.1968.
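Keith's point can be checked directly: the substitution approach works on a whole record, however many name.year pairs it contains. A minimal sketch reusing Miller's substitution on the sample sentence above:

```perl
use strict;
use warnings;

my $record = "Three generations of the family, Joe Doe.1946 with his Father John.Doe.1924 and young Jill Doe.1968.";

# One pass over the whole record: every in-range ".dddd" is rewritten.
# ".Doe" is not followed by digits, so it is ignored.
$record =~ s{\.(\d{4})\b}{
    ($1 >= 1800 && $1 < 2030) ? "<sup>$1</sup>" : $&
}ge;

print "$record\n";
```

All three years are converted, and the trailing sentence period after `1968` is untouched because `\b` stops the match at the last digit.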

Keith
 
Hi

Keith said:
I know global and evaluate right side but what does :$&! do?
Well, the colon ( : ) is part of the ternary operator ( ?: ), the $& ( match variable ) holds the entire substring matched by the last successful regular expression match ( regardless of whether any groups were captured ), and the exclamation mark ( ! ) is one of the delimiters of the substitution operator ( s/// ).
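A small illustration of the difference between `$&` and `$1`, using the `Too few.1776` line from Miller's sample data: `$&` is the whole match including the period, `$1` only the captured digits. That is why the out-of-range branch returns `$&`. The `$lost` variant is a deliberately wrong counterexample, and only the lower bound is checked to keep it short.

```perl
use strict;
use warnings;

my $s = "Too few.1776";
my ($whole, $digits);

if ($s =~ /\.(\d{4})/) {
    ($whole, $digits) = ($&, $1);
    print "whole match (\$&): $whole\n";   # .1776, period included
    print "group 1     (\$1): $digits\n";  # 1776, digits only
}

# Returning $& in the "else" branch puts the original text back ...
(my $kept = $s) =~ s{\.(\d{4})}{$1 >= 1800 ? "<sup>$1</sup>" : $&}e;
# ... whereas returning $1 would silently drop the period.
(my $lost = $s) =~ s{\.(\d{4})}{$1 >= 1800 ? "<sup>$1</sup>" : $1}e;

print "$kept\n$lost\n";
```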

By the way,
Keith said:
for $1 = 1800 to 2030
$1~~[1800..2030]
Actually, that use of the smartmatch operator ( ~~ ) works more like a grep than a for: it checks whether $1 matches any element of the list.
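A sketch of what `$1 ~~ [1800..2030]` asks, written without smartmatch: does the value equal any element of the list? `grep` counts the hits, `any` from the core module List::Util (assumed available, as it is in modern Perls) stops at the first one, and for a contiguous range two comparisons give the same answer without building a list.

```perl
use strict;
use warnings;
use List::Util qw(any);

my $year = 1967;    # example value

my $by_grep  = grep { $_ == $year } 1800 .. 2030;   # scalar context: number of hits
my $by_any   = any  { $_ == $year } 1800 .. 2030;   # true at the first hit
my $by_range = $year >= 1800 && $year <= 2030;      # no list needed

print "grep: $by_grep, any: ", ($by_any ? 1 : 0), ", range: ", ($by_range ? 1 : 0), "\n";
```

The range comparison is the one Miller's version uses; it avoids walking 231 list elements for every match.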


Feherke.
feherke.ga
 
As of Perl 5.18, smart match is experimental: "It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended."

- Miller
 