
Weird regex behavior....

Status
Not open for further replies.

stillflame

Programmer
Jan 12, 2001
416
This isn't a problem, more a question about an accidental solution.

Here's the meat of it:

[tt]m/^*$/[/tt] will match any string. Why is this? I played around and found that [tt]m/^*/[/tt] also matches any string, but [tt]m/^$/[/tt] matches only a null string, and [tt]m/*$/[/tt] and [tt]m/*/[/tt] both give an error (as they should).
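
The observations above can be reproduced with a short script. This is only a minimal sketch, not the original test code: the sample strings and the helper name [tt]try_pattern[/tt] are assumptions, and the patterns are compiled inside [tt]eval[/tt] so that the two invalid ones report an error instead of stopping the script. A Perl with warnings enabled may also complain about a quantifier on a zero-length expression, so warnings are left off here.

```perl
use strict;

# Patterns from the post, kept as strings so that the invalid ones
# can be compiled inside eval instead of aborting the whole script.
my @patterns = ('^*$', '^*', '^$', '*$', '*');
my @samples  = ('', 'abc', 'hello world');   # sample inputs are assumptions

# Returns 'error' if the pattern does not compile, otherwise the number
# of sample strings it matches.
sub try_pattern {
    my ($pat) = @_;
    my $re = eval { qr/$pat/ };
    return 'error' unless defined $re;
    my $count = grep { $_ =~ $re } @samples;
    return $count;
}

printf "%-5s %s\n", $_, try_pattern($_) for @patterns;
```

Each line of output shows a pattern followed by either the number of sample strings it matched or the word "error".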

Any help is appreciated. "If you think you're too small to make a difference, try spending a night in a closed tent with a mosquito."
 
Well the "^" is the beginning anchor and "$" is the ending anchor, so m/^$/ will match only a null string - nothing between the beginning and the ending markers. And a "*" will match 0 (zero) or more occurrences of any character. So m/^*$/ is indeed any string, even the null string.
HTH
 
Why does "*" match 0 or more occurrences of any character? Shouldn't ".*" do that?
Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.
 
Yes, my knowledge says the same thing as tsdragon. '*' only affects how things match; it doesn't match anything on its own. So basically the '*' is modifying '^', which means it's matching 0 or more '^'s. But '^' is a zero-width assertion, not a character, so matching even a million '^'s could never advance the position to the second character, much less to the end of the string. Not to mention that the regex in question has nothing to match the parts in between the beginning and the end, which are the only two things it anchors to.
And as I think about it, since '^' is zero-width, it could match over and over at the same spot, as many times as you wanted, and it would still match. I'm beginning to wonder if zero-width assertions were even meant to be followed by '*'s and '+'s.
I'm tempted to crack open the Perl source dealing with regular expressions to see whether this is a bug in the code or maybe a default behavior, e.g. if something matches a certain number of times (a bazillion), the engine just assumes that it matches the pattern. However, I doubt I have a sufficient understanding of the inner workings of Perl at this point to do this...
 
I have some vague memory of reading about the up caret changing its behavior in certain situations. I think I remember that it can be used to negate a match pattern....

found it.

In 'Programming Perl' by Wall, Christiansen, and Schwartz, 2nd edition, page 64,
"A caret at the front of the list causes it to match only characters that are not in the list."

Maybe the regex engine is interpreting "^*$" as a caret in front of an empty list, so that it negates nothing and matches everything?


keep the rudder amid ship and beware the odd typo
 
I like the idea, but I tried to verify it and it doesn't seem to be the case. Here's the logic I used: if the '^' were being interpreted as the first entry in a character class (square brackets), the regex would be equivalent to one of the following(?):
[tt]/[^]*$/[/tt]
[tt]/[^*]$/[/tt]
[tt]/[^*$]/[/tt]
The first and the last both produce syntax errors, and the middle one matches almost any string, failing only when the string is empty or its last character is a '*'. If you escape the '$' in the last case, it matches any string containing at least one character other than '*' or '$' (so not '', '*', or '$').
Well, those are just test cases. The real turning point for me was when I decided to take a look at $`, $&, and $'. $& and $' are both completely empty, while $` contains the entire string. Thus it seems that the regex is matching at the end of the string, but is matching a zero-width assertion (meaning the '$'), and not anything in the string itself. So we now know what it matches (anything with an end, I guess), but not why.
Actually, though, this regex will also match undefined values. I'm not sure if this means anything.
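
For anyone who wants to repeat the check on $`, $&, and $', here is a rough sketch. The helper name [tt]match_parts[/tt] and the sample strings are assumptions made for illustration; the pattern is passed as a string and interpolated into the match.

```perl
use strict;

# Returns (prematch, match, postmatch) for $pat applied to $str,
# or an empty list when the pattern does not match.
sub match_parts {
    my ($str, $pat) = @_;
    return unless $str =~ /$pat/;
    return ($`, $&, $');
}

for my $str ('abc', 'a1b2c3') {    # sample strings are assumptions
    my ($pre, $hit, $post) = match_parts($str, '^*$');
    printf "%-8s pre=[%s] match=[%s] post=[%s]\n", $str, $pre, $hit, $post;
}
```

The brackets in the output make it easy to see which of the three pieces are empty.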

Thanks, goboating, but its operation is still unknown. I'm gonna ask all the Perl gurus/mongers/monks/mages I know, and maybe post it on some mailing list or another. I'll report back if I learn anything.

And, as a matter of note, this strange behavior allows for the matching of anything, so a regex like:
[tt]/^$word$/[/tt]
will always match when $word = '*', thus treating '*' as a wildcard.
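
A small sketch of that side effect, with one possible way to avoid it. The helper names [tt]matches_raw[/tt] and [tt]matches_quoted[/tt] are hypothetical; the second one uses Perl's \Q...\E escape so that every character of $word is taken literally, which is an addition for illustration and not something tested in this thread.

```perl
use strict;

# Hypothetical helper: interpolates $word into the pattern unescaped,
# exactly as in the regex above.
sub matches_raw {
    my ($text, $word) = @_;
    return $text =~ /^$word$/ ? 1 : 0;
}

# Same test, but \Q...\E quotes any metacharacters in $word first.
sub matches_quoted {
    my ($text, $word) = @_;
    return $text =~ /^\Q$word\E$/ ? 1 : 0;
}

for my $text ('anything', '*') {    # sample inputs are assumptions
    printf "%-9s raw=%d quoted=%d\n", $text,
        matches_raw($text, '*'), matches_quoted($text, '*');
}
```

With the quoted version, a $word of '*' should only match a literal asterisk.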

stillflame
 
(2 minutes later...)
Ooh, more stuff. When the regex is shortened to [tt]/^*/[/tt], the $` variable is empty this time, while the $' variable is the entire string, meaning the regex matches at the beginning of the string instead of the end. Now I'm thinking maybe '^*' matches any zero-width assertion....
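Yeah. This code:[tt]
my $word = "a1b2c3";
# '^*' sits between '1' and 'b', where a real '^' could never match
if ($word =~ /1^*b/)
{
    # print the postmatch, the match, and the prematch
    print "Match!\n\$'=$'\n\$&=$&\n\$`=$`\n";
}
[/tt]
will show that it matches the "1b"...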

I think this case is solved, although its application to the real world is almost completely and absolutely absent. s-)
 
"although its application to the real world is almost completely and absolutely absent"

and that's what I like about it <grin> Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.
 
You know how in Scooby-Doo, after they catch the villain they think they've solved the case, but then Velma would unmask them and tell the real story...

Well, I just unmasked this, and it's simpler than I had expected. '*' matches 0 or more, right? Well, it's just matching zero '^'s, followed by a '$'. I should have expected it to be simple. Well, now this case is really solved. :)
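
One way to check this explanation: if the star really matches zero '^'s, then [tt]/^*$/[/tt] should behave like plain [tt]/$/[/tt] and match at the same offset. The sketch below assumes a hypothetical helper [tt]match_offset[/tt] and made-up sample strings; it reads the start of the match from $-[0].

```perl
use strict;

# Returns the offset where $pat first matches in $str, or -1 if it fails.
sub match_offset {
    my ($str, $pat) = @_;
    return $str =~ /$pat/ ? $-[0] : -1;
}

for my $str ('', 'abc', 'hello world') {    # sample strings are assumptions
    printf "%-15s ^*\$ at %d, \$ at %d\n", "[$str]",
        match_offset($str, '^*$'), match_offset($str, '$');
}
```

The same reading covers the earlier [tt]/1^*b/[/tt] test: zero '^'s are matched between the '1' and the 'b'.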
 
Well I was sorta right and sorta wrong. Sorry 'bout that.

A little more fuel for the fire: "." can match anything except (usually) a newline or null (Mastering Regular Expressions).

So I guess I'm real crunchy and tasty with mustard.
 