Tek-Tips is the largest IT community on the Internet today!


regex question 3

Status
Not open for further replies.

NEVERSLEEP

Programmer
Apr 14, 2002
667
CA
Let's say I have this simple regex, which turns
"aYEPb" into "bYEPa":
$r =~ s/a(.+?)b/b$1a/gi;
Now how can I turn
"aYEcNOPEcPb" into "bYEPc"?

Hope I'm clear,
thanks for the help

---------------------------------------


someone's knowledge ends where
someone else's knowledge starts
 
D'oh, I just pasted part of my regex, so never mind the i:
$r =~ s/a(.+?)b/b$1a/g;

 
Do you mean "bYEcNOPEcPb" to "bYEPc"? Because there is no "b" before the "Y" otherwise.
Sincerely,

Tom Anderson
CEO, Order amid Chaos, Inc.
 
For some reason I don't think this is what you're looking for, but how about this:
Code:
$r =~ s/
    a           # start pattern
    ([A-Z]*)    # capture 1: saving greedy capitals
    ([a-z])     # capture 2: a single lower case letter; might want
                #   to hard code a 'c' if that's what you're after
    [A-Z]*      # greedy capitals again
    \2          # the single lower case letter from above
    ([A-Z]*)    # capture 3: saving more greedy capitals
    b           # end match
/b$1$3$2/gx;    # substitute: b, first capitals, second capitals, the lower case letter ('c'?)

copy/pastable:
$r =~ s/a([A-Z]*)([a-z])[A-Z]*\2([A-Z]*)b/b$1$3$2/g;

Hope I'm making sense. I've been playing a puzzle/strategy game for three days straight now. Ever heard of Sokoban? I think a version comes with KDE.
----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
 
Eek, OK, you won; you both lost me here, LOL.

Here's the thing: I'm trying to do custom HTML markup (like TGML here).
So let's say the data is
$r = "[b]bold[/b]";
$r =~ s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;
# now $r = "<b>bold</b>"
I have a huge set of regexes like this,
so what I'm trying to do is
$r = "[b]bold[notag]notag[/notag]bold[/b]";
$r =~ ??
# to get $r = "<b>boldnotagbold</b>";

Thanks a lot


 
Hi mate,

I'm hopeless with regex, so I'm not even going to attempt to write one for this.

However, I just made a script like this, and all I did was replace all instances of [b] with <b> and [/b] with </b>; you don't really need to worry about the text between the tags.
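Wullie's tag-by-tag replacement could be sketched in Perl like this (simple_bold is a hypothetical name used only for illustration). Because each tag is swapped on its own, it also rewrites any [b] that sits inside a [notag] block, which is exactly the concern raised in the next reply.

```perl
use strict;
use warnings;

# Hypothetical helper: swap each opening and closing tag independently,
# so the text between them never has to be matched at all.
sub simple_bold {
    my ($text) = @_;
    $text =~ s/\[b\]/<b>/gi;     # every [b] (or [B]) becomes <b>
    $text =~ s/\[\/b\]/<\/b>/gi; # every [/b] (or [/B]) becomes </b>
    return $text;
}

print simple_bold("[b]bold[/b] and [B]more[/B]"), "\n";
# prints: <b>bold</b> and <b>more</b>
```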

Hope this helps Wullie


The pessimist complains about the wind. The optimist expects it to change.
The leader adjusts the sails. - John Maxwell
 
Hmm, I do need to worry,
because I don't want the [b] tags inside [notag][/notag]
to be changed:

$r = "YEP[notag]NOPE[/notag]YEP"
# to
$r = "<b>YEPNOPEYEP</b>"

like the ignore tag here.


 
Hi mate,

After reading this post again, am I not correct in saying that you only really need to match:

[notag]whatever here[/notag]

Anything between the [notag] tags is not parsed, but the bold, italic, etc. outside them are. The surrounding bold tags (or whatever other tags) do not need to be considered in this regex?

Hope this helps Wullie


 
Yep, yep.
So what I need to do, and what I'm asking, is how to do it, heh.

Take my original regex
$r =~ s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;
and place a condition on it <-- no clue how
Something with ?: I think...

# totally made up
$r =~ s/ifnotbetweennotag\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;

Argh, I don't get it.
Does someone have info on conditionals in regexes? Thanks
 
This one's really got my attention now. I'm certain that what you're after has crossed the limits of regular languages, so it's going to take some logic on top of the regex(es) to do it. The question is, how little logic can you get away with (a single conditional, perl one-liner?).

It's an interesting idea; I'll work with it some. Language theory is one of those things I really enjoy.
 
s/\[$tag\](.*?)(?:\[notag\](.*)\[\/notag\])?(.*?)\[\/$tag\]/\<$tag\>|$1|$2|$3|\<\/$tag\>/g;

Now don't get too hopeful. There's a major issue that I'm unable to resolve, and I'm not sure Perl's regex engine can handle it. The ? right after the (?:\[notag\]...\[\/notag\]) group says there may or may not be a [notag] block in there, but the way Perl goes about matching apparently lets anything that can match nothing do so. I guess I was under the impression that the "leftmost longest" priority somehow applied to the regex itself, not just the target string; Perl's backtracking engine actually settles for the first overall match it finds at the leftmost position, longest or not.

Anyway, I thought I made the two (.*?) blocks non-greedy by adding the ? after the *. I also thought that would make the single ? after the [notag] group greedier than either of them (it could be put at the same level of 'priority' by making it a double ??). Is there any way that can be accomplished?
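A small Perl sketch of the effect described above, using a made-up sample string and a plain optional group (not the exact substitution posted). Because Perl keeps the first complete match it finds rather than the longest one, the optional [notag] group ends up matching nothing.

```perl
use strict;
use warnings;

my $s = "[b] one [notag] two [/notag] three [/b]";

# List context returns the three captures; a group that did not take part
# in the match comes back as undef.
my ($before, $inside, $after) =
    $s =~ /\[b\](.*?)(?:\[notag\](.*)\[\/notag\])?(.*?)\[\/b\]/;

# The first lazy (.*?) tries to match nothing, [notag] cannot start right
# after [b], so the optional group is skipped, and the second lazy (.*?)
# then grows until it reaches [/b]. That is the first complete match the
# engine finds, so that is the one it keeps.
printf "before: '%s'\n", $before;
printf "inside: %s\n", defined $inside ? $inside : '(undef)';
printf "after:  '%s'\n", $after;
```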

I'll keep working at it.
 
You'll have to check this one over; it seemed to work in my first couple of test cases... so I haven't broken it, but I haven't tried very hard. See what you can do with her.
Code:
$_ = "\nOne [b] Two [notag] Three [b] Four [b] Five [/notag] Six [/b] Seven\n";

$tag = 'b';

s/
\[$tag\]
(.*?)		#$1

(?(?!\[\/$tag\])
\[notag\]
(.*)		#$2
\[\/notag\])

(.*?)		#$3
\[\/$tag\]
/\<$tag\>$1$2$3\<\/$tag\>/gx;
 
I don't understand why this bit

(.*) #$2

doesn't just match everything to the end of the string?
Mike

Want to get great answers to your Tek-Tips questions? Have a look at faq219-2884

It's like this; even samurai have teddy bears, and even teddy bears get drunk.
 
Yeah, upon waking up this morning I decided it should be a non-greedy (.*?) too, so note that change.

Even in its current state, it can't match everything to the end of the string, because it still requires a closing [/notag] and a closing [/$tag]; it was just a little greedy. It would match everything up to the very last closing [/notag] in the string, instead of the nearest one.
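The greedy versus non-greedy difference can be seen in isolation with a made-up string:

```perl
use strict;
use warnings;

my $text = "[notag] A [/notag] B [notag] C [/notag]";

# Greedy (.*) first runs to the end of the string, then backs off until
# [/notag] can match, so it stops at the LAST closing tag.
my ($greedy) = $text =~ /\[notag\](.*)\[\/notag\]/;

# Non-greedy (.*?) grows one character at a time, so it stops at the
# NEAREST closing tag.
my ($lazy) = $text =~ /\[notag\](.*?)\[\/notag\]/;

printf "greedy: '%s'\n", $greedy;   # ' A [/notag] B [notag] C '
printf "lazy:   '%s'\n", $lazy;     # ' A '
```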

Problems I've found so far:
Code:
"\nOne [b] Two [notag] Three [b] Four [/b] Five [/notag] Six [notag] Seven [b] Eight [/b] Nine [/notag] Ten [/b] Eleven\n"
The matching closing [/b] tag for the [b] between One and Two is the one between Ten and Eleven, but between them are two sets of [notag]'s. I never quite understood how backreferences work when the () are inside something that can be repeated. I don't see offhand how this regex could be changed to allow that, although it sounds simple enough: something like adding a + to the end of the whole conditional (). There'd have to be another (.*?) in there to catch whatever was between the [notag] occurrences, but say it matched twice as it sits now. What's $2 going to be? The first match? The second? The number of matches? Would it need some embedded code to save what it matches each time and write that out at the end?
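On the backreference question: in Perl, a capture group inside a repeated group holds only what it matched on the last iteration (not the first match, and not a count). A minimal sketch with a made-up string; it collects every block with a //g match in list context rather than embedded code, although Perl does offer (?{ ... }) code blocks for that kind of bookkeeping.

```perl
use strict;
use warnings;

my $text = "x [notag]first[/notag] y [notag]second[/notag] z";

# A capture group inside a repeated group is overwritten on every
# iteration, so afterwards it holds only the LAST iteration's text.
my ($last) = $text =~ /(?:.*?\[notag\](.*?)\[\/notag\])+/;

# To collect every block, use //g in list context, which returns the
# captures from each successive match.
my @all = $text =~ /\[notag\](.*?)\[\/notag\]/g;

printf "last iteration: %s\n", $last;             # second
printf "all blocks:     %s\n", join(', ', @all);  # first, second
```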

Code:
"\nOne [notag] Two [b] Three [/b] Four [/notag] Five\n"
It only matches [notag]'s embedded between some other tags, not a set of [notag]'s that occurs in bare, unformatted text. I suppose that could be handled easily enough externally. Nesting and matching are generally difficult to do with regexes, so it's likely something the user could screw up. Lots of error checking.

Those are the limitations I've found with it this morning. Any ideas on how to get around them?
 


OK, I tried that and it gave me syntax errors??
And obviously it didn't work because of them.
$$r = "altered - [notag]not alt ered[/notag]";
$r =~ s/\[b\](.*?)(?(?!\[\/b\])\[notag\](.*?)\[\/notag\])\[\/b\]/<b>$1$2$3<\/b>/gxi;
print $r;

Questions (ya remember, heh):

What's the x param? (For conditions?)
(?(?!\[\/b\])\[notag\](.*?)\[\/notag\])
I don't get this part: (?(?!)). Where can I get info?
And yeah, for sure the $1$2$3 will (and probably did) cause the errors.

I guess a solution would be to
'cut' the data out of the notag (getting its position),
pass the rest to the normal regexes,
then put the data back. I think I'm going to have a position issue,
because there are more complex ones, for example:
$r =~ s/\[glow=(.+?)\](.+?)\[\/glow\]/<b><font style="filter:glow(color=$1, strength=3);height=1px">$2<\/font><\/b>/gi;
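The cut-and-restore idea can sidestep the position problem by leaving a numbered placeholder token where each [notag] block was, instead of recording character positions. A minimal sketch under stated assumptions: render is a hypothetical sub, the NUL-delimited token format is my choice (it assumes the input never contains a NUL character), and only the [b] and [glow] regexes from this thread are applied.

```perl
use strict;
use warnings;

# Hypothetical render(): cut the [notag] bodies out, run the normal markup
# regexes, then put the bodies back. Assumption: the input never contains
# a NUL character, so "\x00<number>\x00" is a safe placeholder token.
sub render {
    my ($text) = @_;
    my @saved;

    # 1. Cut: stash each [notag] body and leave a numbered token instead.
    #    Tokens are looked up by number, so it does not matter how much the
    #    later regexes change the length of the surrounding text.
    $text =~ s{\[notag\](.*?)\[/notag\]}{
        push @saved, $1;
        "\x00" . $#saved . "\x00";
    }gise;

    # 2. The normal markup regexes (two examples from the thread).
    $text =~ s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;
    $text =~ s/\[glow=(.+?)\](.+?)\[\/glow\]/<b><font style="filter:glow(color=$1, strength=3);height=1px">$2<\/font><\/b>/gi;

    # 3. Put back: swap each token for the untouched original text.
    $text =~ s/\x00(\d+)\x00/$saved[$1]/g;
    return $text;
}

print render("[b]YEP[notag][b]NOPE[/b][/notag]YEP[/b]"), "\n";
# prints: <b>YEP[b]NOPE[/b]YEP</b>
```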



 
I started off referring to the Perl Black Book (my reference of choice) but went back to the Camel book, only to find my edition is some six years old (amazing the number of things added to regexes since Perl 5 first came out; there were no lookbehind assertions then). Here's something from the good, solid, continually updated perldoc.com resource:
It goes through everything pretty well.

In short, the x modifier at the end tells Perl to ignore unescaped whitespace in the pattern and lets you add line comments with #. If you explicitly want to match whitespace, you have to escape it. It just lets you split the regex up over multiple lines for easier reading and commenting.
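/x in action on a made-up string: the first pattern is split over several lines with comments, and the second shows what happens to a space that is not escaped.

```perl
use strict;
use warnings;

my $text = "[b]two words[/b]";

# Under /x, whitespace in the pattern is ignored and # starts a comment,
# so the literal space between "two" and "words" has to be escaped.
my ($inner) = $text =~ m{
    \[b\]          # opening tag
    (two\ words)   # escaped space, captured as the tag body
    \[/b\]         # closing tag
}x;

# Without the escape, /x throws the space away and the pattern is "twowords".
my $unescaped = "twowords" =~ /two words/x ? 1 : 0;

printf "inner: %s\n", $inner;                              # two words
printf "unescaped matches 'twowords': %d\n", $unescaped;  # 1
```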

The (?(?! business is two things:
(?!pattern) is a negative lookahead assertion. It scans ahead to see whether what's there would (or in this case, wouldn't, hence the negating !) match. The key is that it doesn't consume what it matched; it leaves the match pointer (or whatever it's called) where it was. Positive lookahead is (?=pattern), and positive/negative lookbehind are (?<=pattern) / (?<!pattern). I think I'm not making much sense here, so check the link; they're better at this than I am.
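The lookaround assertions on a made-up string, as a quick Perl sketch:

```perl
use strict;
use warnings;

my $s = "foobar";

# Positive lookahead (?=...): "foo" only if "bar" follows. The "bar" is
# checked but not consumed, so it is not part of what gets replaced.
(my $replaced = $s) =~ s/foo(?=bar)/X/;           # "Xbar"

# Because nothing was consumed, the pattern can go on to match "bar" itself.
my $still_there = $s =~ /foo(?=bar)bar/ ? 1 : 0;  # 1

# Negative lookahead (?!...): "foo" only if "bar" does NOT follow.
my $neg_hit  = "foobaz" =~ /foo(?!bar)/ ? 1 : 0;  # 1
my $neg_miss = "foobar" =~ /foo(?!bar)/ ? 1 : 0;  # 0

# Lookbehind (?<=...) checks the text before the current position.
my $behind = $s =~ /(?<=foo)bar/ ? 1 : 0;         # 1

print "$replaced $still_there $neg_hit $neg_miss $behind\n";
```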

(?(condition)pattern true|pattern false) I'll just quote perldoc on this one: "Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate zero-width assertion."
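A small illustration of the integer form of (condition), on made-up input: an optional opening parenthesis must be balanced by a closing one. The assertion form used in this thread, such as (?(?!...)yes|no), works the same way with a zero-width assertion in place of the group number.

```perl
use strict;
use warnings;

# (?(1)\)) means: if capture group 1 (an opening parenthesis) took part in
# the match, a closing parenthesis is required here; otherwise nothing is.
my $re = qr/^(\()?\d+(?(1)\))$/;

for my $s ("(42)", "42", "(42", "42)") {
    printf "%-5s %s\n", $s, ($s =~ $re ? "match" : "no match");
}
# (42)  match
# 42    match
# (42   no match
# 42)   no match
```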

And the syntax errors... well, I will say this: if there is no [notag] block to match, $2 will be undefined since it's in the conditional block, and since $1 then collects everything up to the closing [/$tag], $3 is empty as well. I had warnings on, but not the strict pragma, so it spat back "Use of uninitialized value in concatenation or string". I don't think strict would have any problems with this either, now that I think about it. Dunno.
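One way around the uninitialized-value warnings, sketched under assumptions: convert is a hypothetical sub, and its pattern is a variant of the one in this thread with the third capture moved inside a plain optional group instead of the conditional. Evaluating the replacement as code with /e lets captures that did not take part in the match be turned into empty strings before they are joined.

```perl
use strict;
use warnings;

# Hypothetical convert(): the replacement is evaluated as code (/e), so
# undefined captures are mapped to '' instead of triggering warnings.
sub convert {
    my ($text) = @_;
    $text =~ s{\[b\](.*?)(?:\[notag\](.*?)\[/notag\](.*?))?\[/b\]}{
        '<b>' . join('', map { defined $_ ? $_ : '' } $1, $2, $3) . '</b>'
    }gie;
    return $text;
}

print convert("[b]plain[/b]"), "\n";                # <b>plain</b>
print convert("[b]A[notag]N[/notag]B[/b]"), "\n";   # <b>ANB</b>
```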
 
How about something like this:
Code:
my $s = ""; # this will be the final string

# loop through the non-tagged sections
while ($r =~ s/(.*?)\[notag\](.*?)\[\/notag\](.*)/$3/gis)
{
  # append the beginning part to the final string and encode the non-tagged part
  $s .= $1 . encode_html($2);
}

$s .= $r; # add on any final left-over parts

# do the meta-mark-up conversion without converting the non-tagged parts
$s =~ s/\[((?:\/)?(?:b|i|u|a|$othertags))\]/<$1>/gis;

sub encode_html
{
    my ($str) = @_;
    $str =~ s/([^0-9A-Za-z])/sprintf("&#%d;",ord($1))/eg;

    return $str;
}
Sincerely,

Tom Anderson
CEO, Order amid Chaos, Inc.
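Tom's loop can be tried out with a little scaffolding. In this sketch convert_markup is a hypothetical wrapper, $othertags is dropped so only b, i and u are converted, and the captures are copied into lexicals before encode_html runs its own substitution (a defensive choice, not something the original is known to need).

```perl
use strict;
use warnings;

sub encode_html {
    my ($str) = @_;
    # every character that is not a letter or digit becomes &#NN;
    $str =~ s/([^0-9A-Za-z])/sprintf("&#%d;", ord($1))/eg;
    return $str;
}

sub convert_markup {
    my ($r) = @_;
    my $s = "";    # the final string

    # peel off one "text [notag]body[/notag]" section per pass
    while ($r =~ s/(.*?)\[notag\](.*?)\[\/notag\](.*)/$3/is) {
        my ($head, $body) = ($1, $2);   # copy the captures right away
        $s .= $head . encode_html($body);
    }
    $s .= $r;      # whatever follows the last [notag] block

    # encoded bodies no longer contain a literal "[", so they are left alone
    $s =~ s/\[(\/?(?:b|i|u))\]/<$1>/gi;
    return $s;
}

print convert_markup("[b]YEP[notag][b]NOPE[/b][/notag]YEP[/b]"), "\n";
# prints: <b>YEP&#91;b&#93;NOPE&#91;&#47;b&#93;YEP</b>
```

A browser would display the encoded body as the literal text [b]NOPE[/b], in bold along with the surrounding YEPs.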
 
