regex question 3

NEVERSLEEP · Jan 25, 2003

[ignore]lets say i have this simple regex
"aYEPb" to "bYEPa"
$r =~ s/a(.+?)b/b$1a/gi;
now how can i do
"aYEcNOPEcPb" to "bYEPc"

hope im clear,
thanks for the help[/ignore]

<--

---------------------------------------

someone knowledge ends where
someone else knowledge starts

NEVERSLEEP · Jan 25, 2003

doh i just pasted part of my regex so nm the i
$r =~ s/a(.+?)b/b$1a/g;

<-- [morning]

---------------------------------------

someone knowledge ends where
someone else knowledge starts

tanderso · Jan 25, 2003

Do you mean "bYEcNOPEcPb" to "bYEPc"? Because there is no "b" before the "Y" otherwise.
Sincerely,

Tom Anderson
CEO, Order amid Chaos, Inc.

http://www.oac-design.com

tanderso · Jan 25, 2003

Wait, that doesn't make sense either... I mean "cYEcNOPEcPb" to "bYEPc"?
Sincerely,

Tom Anderson
CEO, Order amid Chaos, Inc.

http://www.oac-design.com

icrf · Jan 25, 2003

for some reason I don't think this is what you're looking for, but how about this:

Code:

$r =~ s/
a          # start of pattern
([A-Z]*)   # group 1: greedy run of capitals
([a-z])    # group 2: a single lower-case letter; hard-code a 'c'
           # here if that's what you're after
[A-Z]*     # greedy capitals again (thrown away)
\2         # the same lower-case letter as group 2
([A-Z]*)   # group 3: more greedy capitals
b          # end of pattern
/b$1$3$2/gx;   # subst: first capitals, second capitals, then the lower-case letter ('c'?)

copy/pastable:

$r =~ s/a([A-Z]*)([a-z])[A-Z]*\2([A-Z]*)b/b$1$3$2/g;
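A quick way to try the one-liner above on the sample string from the first post. This is only a minimal sketch; the `collapse` wrapper is a made-up name for illustration and not part of anyone's script.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner so it can be tried on sample strings.
sub collapse {
    my ($r) = @_;
    # \2 must repeat exactly the lower-case letter captured by group 2.
    $r =~ s/a([A-Z]*)([a-z])[A-Z]*\2([A-Z]*)b/b$1$3$2/g;
    return $r;
}

print collapse("aYEcNOPEcPb"), "\n";   # bYEPc
print collapse("aYEcNOPEdPb"), "\n";   # unchanged, because 'd' does not repeat 'c'
```

The second call shows what the backreference buys you: the substitution only fires when the same lower-case letter appears on both sides of the discarded capitals.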

Hope I'm making sense. Been playing a puzzle/strategy game for three days straight now. Ever hear of Sokoban? I think a version comes with KDE.
----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light

NEVERSLEEP · Jan 25, 2003

ee ok u won u both lost me here LOL

heres the thing, im trying to do custom html (like TGML here)
so lets say the 'data' is
$r = "[b]bold[/b]";
$r =~ s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;
# now $r = "<b>bold</b>"
i have a huge set of regex like this
so what im trying to do is
$r = "[b]bold[notag][b]notag[/b][/notag]bold[/b]";
$r =~ ??
# to get $r = "<b>bold[b]notag[/b]bold</b>";

thanks alot

---------------------------------------

someone knowledge ends where
someone else knowledge starts

Wullie · Jan 25, 2003

Hi mate,

I'm hopeless with regex so I'm not even going to attempt to write one for this.

However, I just made a script like this and all I did was replace all instances of [b] with <b> and [/b] with </b>; you don't really need to worry about the text in the middle of the tags.

Hope this helps
Wullie

http://www.freshlookdesign.co.uk

http://www.freshlookdesign.com

http://www.mailtosanta.co.uk

The pessimist complains about the wind. The optimist expects it to change.
The leader adjusts the sails. - John Maxwell

NEVERSLEEP · Jan 25, 2003

humm i do need to worry
cause the [b] in [notag][b][/notag]
i dont want them changed

$r = "[b]YEP[notag][b]NOPE[/b][/notag]YEP[/b]"
# to
$r = "<b>YEP[b]NOPE[/b]YEP</b>"

like the ignore tag here

---------------------------------------

someone knowledge ends where
someone else knowledge starts

Wullie · Jan 25, 2003

Sorry mate,

I didn't read the question properly.

And after using TT for this long, I also didn't even realise that there was an ignore tag. [blush]

Wullie

http://www.freshlookdesign.co.uk

http://www.freshlookdesign.com

http://www.mailtosanta.co.uk

The pessimist complains about the wind. The optimist expects it to change.
The leader adjusts the sails. - John Maxwell

Wullie · Jan 25, 2003

Hi mate,

After reading this post again, am I not correct in saying that you only really need to match:

[notag]whatever here[/notag]

Anything in-between the notag is not parsed but the bold, italic etc outside are parsed. The surrounding bold tags (or whatever other tags) do not need to be considered in this regex?

Hope this helps
Wullie

http://www.freshlookdesign.co.uk

http://www.freshlookdesign.com

http://www.mailtosanta.co.uk

The pessimist complains about the wind. The optimist expects it to change.
The leader adjusts the sails. - John Maxwell

NEVERSLEEP · Jan 25, 2003

yep yep
so what i need to do and what im asking is how to heh

take my original regex
$r =~ s/\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;
and place a condition <-- no clue how
something with ?: i think ..

#totally made up
$r =~ s/ifnotbetweennotag\[b\](.+?)\[\/b\]/<b>$1<\/b>/gi;

arg i dont get it
does someone have info on conditionals in regex? thanks

---------------------------------------

someone knowledge ends where
someone else knowledge starts

tanderso · Jan 25, 2003

I'm not sure you can do all that with a single regex. It's going to be rather complicated.
Sincerely,

Tom Anderson
CEO, Order amid Chaos, Inc.

http://www.oac-design.com

icrf · Jan 25, 2003

This one's really got my attention now. I'm certain that what you're after has crossed the limits of regular languages, so it's going to take some logic on top of the regex(es) to do it. The question is, how little logic can you get away with (a single conditional, perl one-liner?).

It's an interesting idea, I'll work with it some. Language theory is one of those things I really enjoy.
----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
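On the "how little logic can you get away with" question, one commonly used trick is to put the thing you want to skip first in an alternation and decide in the replacement which branch fired. The sketch below is only an illustration under a few assumptions: `bold_outside_notag` is a made-up name, tags are converted one at a time rather than as matched pairs, and [notag] blocks are assumed not to nest.

```perl
use strict;
use warnings;

# Hypothetical helper: turn [b] and [/b] into <b> and </b>, but copy the
# contents of any [notag]...[/notag] block through untouched.
sub bold_outside_notag {
    my ($r) = @_;
    $r =~ s{
        \[notag\](.*?)\[/notag\]   # group 1: a protected block, tried first
      |
        \[(/?)b\]                  # group 2: "/" for a closing tag, empty otherwise
    }{
        # The single conditional: group 1 is only defined for a protected block.
        defined $1 ? $1 : "<${2}b>"
    }gisex;
    return $r;
}

print bold_outside_notag('[b]YEP[notag][b]NOPE[/b][/notag]YEP[/b]'), "\n";
# <b>YEP[b]NOPE[/b]YEP</b>
```

Because the protected block is consumed whole by the first branch, the second branch never gets a chance to see the tags inside it. The price is that unbalanced [b] tags are converted as-is, so any pairing checks would have to happen elsewhere.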

icrf · Jan 25, 2003

[tt]s/\[$tag\](.*?)(?:\[notag\](.*)\[\/notag\])[red]?[/red](.*?)\[\/$tag\]/\<$tag\>|$1|$2|$3|\<\/$tag\>/g;[/tt]

Now don't get too hopeful. There's a major issue that I'm unable to resolve and I'm not sure if Perl's regex engine can do it. The ? in red says there may or may not be a [notag] block in there, but the way Perl goes about matching apparently lets anything that can match nothing do so. I guess I was under the impression before that the "leftmost longest" priority somehow applied to the regex itself, not just the target string.

Anyway, I thought I made the (.*?) groups (the areas in blue) non-greedy by adding the ? after the *. I also thought that would make the single red ? greedier than either of the blue blocks (I could put it at the same level of 'priority' by making it a double ??). Is there any way that can be accomplished?

I'll keep working at it.
----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light

icrf · Jan 26, 2003

You'll have to check this one over, it seemed to work in my first couple test cases...so I haven't broken it, but I haven't tried very hard. See what you can do with her.

Code:

$_ = "\nOne [b] Two [notag] Three [b] Four [b] Five [/notag] Six [/b] Seven\n";

$tag = 'b';

s/
\[$tag\]
(.*?)		#$1

(?(?!\[\/$tag\])
\[notag\]
(.*)		#$2
\[\/notag\])

(.*?)		#$3
\[\/$tag\]
/\<$tag\>$1$2$3\<\/$tag\>/gx;

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light

MikeLacey · Jan 26, 2003

I don't understand why this bit

(.*) #$2

doesn't just match everything to the end of the string
Mike

Want to get great answers to your Tek-Tips questions? Have a look at faq219-2884

It's like this; even samurai have teddy bears, and even teddy bears get drunk.

icrf · Jan 26, 2003

Yeah, upon waking up this morning I decided it too should be a non-greedy (.*?) so note that change.

Even in its current state, it can't match everything to the end of the string because it still requires a closing [/notag] and closing [/$tag], it was just a little greedy. It would match everything until the very last closing [/notag] in the string, instead of the nearest.
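To make the greedy versus non-greedy difference concrete, here is a small sketch. The sample string is made up for illustration; only the two quantifiers differ between the patterns.

```perl
use strict;
use warnings;

my $text = "A [notag] B [/notag] C [notag] D [/notag] E";

# Greedy: (.*) runs to the LAST closing [/notag] in the string.
my ($greedy) = $text =~ /\[notag\](.*)\[\/notag\]/;

# Non-greedy: (.*?) stops at the NEAREST closing [/notag].
my ($lazy) = $text =~ /\[notag\](.*?)\[\/notag\]/;

print "greedy: '$greedy'\n";   # ' B [/notag] C [notag] D '
print "lazy:   '$lazy'\n";     # ' B '
```

Neither version runs to the end of the string, because the closing [/notag] still has to match after the group.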

Problems I've found so far:

Code:

"\nOne [b] Two [notag] Three [b] Four [/b] Five [/notag] Six [notag] Seven [b] Eight [/b] Nine [/notag] Ten [/b] Eleven\n"

The matching closing [/b] tag for 1-2 is 10-11, but between them are two sets of [notag]'s. I never quite understood how backreferences work when the () are inside something that can be repeated. I don't see offhand how this regex could be changed to allow that, although it sounds simple enough. Something like adding a + to the end of the whole conditional (). There'd have to be another (.*?) in there to catch whatever was between the [notag] occurrences, but say it matched it twice as it sits now. What's the $2 going to be? The first match? Second? The number of matches? Would it need some embedded code to save what it matches each time and write that out in the end?
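On the question of what a capture group holds when it sits inside something repeated: in Perl it keeps whatever the last pass through the group matched. A minimal sketch with a made-up sample string:

```perl
use strict;
use warnings;

my $text = "[notag]one[/notag][notag]two[/notag][notag]three[/notag]";

# Repeating the outer group with + overwrites group 1 on every pass,
# so only the final pass survives.
my ($last) = $text =~ /^(?:\[notag\](.*?)\[\/notag\])+$/;

# Collecting every pass takes a list-context match with the g flag instead.
my @all = $text =~ /\[notag\](.*?)\[\/notag\]/g;

print "last: $last\n";   # three
print "all:  @all\n";    # one two three
```

So a single substitution with a repeated group cannot write all the protected blocks back out on its own; the pieces have to be gathered some other way, such as the g-flag match shown here.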

Code:

"\nOne [notag] Two [b] Three [/b] Four [/notag] Five\n"

It's only matching embedded [notag]'s between some other tags, not if the set of [notag]'s occurs in bare formatting. I suppose that could be handled easily enough externally. Nesting and matching is generally a difficult thing to do with regex, so it's likely something the user could screw up. Lots o error checking.

Those are the limitations I've found this morning with it. Any ideas on how to maybe get around them?
----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light

NEVERSLEEP · Jan 26, 2003

ok i tryed that it gaved me syntax errors ??
and obviously it didnt work cause of them
[ignore]
$$r = "altered - [notag]not alt ered[/notag]";
$r =~ s/\[b\](.*?)(?(?!\[\/b\])\[notag\](.*?)\[\/notag\])\[\/b\]/<b>$1$2$3<\/b>/gxi;
print $r;[/ignore]

questions (ya remember heh)

[ignore]

whats the x param ? (for conditions?)
(?(?!\[\/b\])\[notag\](.*?)\[\/notag\])
i dont get this part ..(?(?!)) where can i get info ?
and ya for sure the $1$2$3 will (and prob did) cause the errors

i guess a solution would b
'cut' the data from the notag (getting position)
pass the rest to the normal regexs
then put back the data ..gonna have a position issue i think ...
cause there more complexs ones
example :
$r =~ s/\[glow=(.+?)\](.+?)\[\/glow\]/$2<\/font><\/b>/gi;

[/ignore]
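The cut / convert / put back idea can be sketched without tracking positions at all, by leaving a numbered placeholder where each [notag] block was. This is a rough sketch under stated assumptions: `render` is a made-up name, the two tag regexes stand in for the real "huge set", and the input is assumed to contain no NUL bytes (they are used as placeholder markers).

```perl
use strict;
use warnings;

sub render {
    my ($r) = @_;
    my @saved;

    # 1. Cut each [notag] block out, leaving a numbered placeholder behind.
    $r =~ s{\[notag\](.*?)\[/notag\]}{ push @saved, $1; "\0" . $#saved . "\0" }gise;

    # 2. Run the ordinary tag regexes on what is left.
    $r =~ s{\[b\](.+?)\[/b\]}{<b>$1</b>}gis;
    $r =~ s{\[i\](.+?)\[/i\]}{<i>$1</i>}gis;

    # 3. Put the protected text back where the placeholders are.
    $r =~ s{\0(\d+)\0}{$saved[$1]}g;

    return $r;
}

print render('[b]YEP[notag][b]NOPE[/b][/notag]YEP[/b]'), "\n";
# <b>YEP[b]NOPE[/b]YEP</b>
```

Since the placeholders travel with the text, more complex regexes such as the glow one can change the string length freely without breaking the restore step.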

---------------------------------------

someone knowledge ends where
someone else knowledge starts

icrf · Jan 26, 2003

I started off referring to the Perl Black Book (my reference of choice) but went back to the Camel Book to find my edition is some six years old (amazing the number of things added to regex since Perl 5 first came out; there were no lookbehind assertions then). Here's something from the good, solid, continually updated perldoc.com resource:

http://www.perldoc.com/perl5.005_03/pod/perlre.html

It goes through everything pretty well.

In short, the x param at the end ignores whitespace in the pattern and lets you add line comments with #. If you explicitly want whitespace, you have to escape it. It just lets you split the regex up over multiple lines for easier readability and commenting.
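A small sketch of what that looks like in practice; the variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# The same pattern twice: once packed on one line, once spread out with x.
my $compact = qr/\[b\](.+?)\[\/b\]/i;
my $spaced  = qr{
    \[b\]     # opening tag
    (.+?)     # group 1: the text to make bold
    \[/b\]    # closing tag
}ix;

# Under x a bare space is ignored, so a real space has to be escaped.
my $needs_space = qr/a\ b/x;

my $input = "[B]bold text[/b]";
my ($one) = $input =~ $compact;
my ($two) = $input =~ $spaced;
print "$one | $two\n";    # bold text | bold text

for my $s ("a b", "ab") {
    my $verdict = $s =~ $needs_space ? "match" : "no match";
    print "'$s': $verdict\n";    # 'a b': match, then 'ab': no match
}
```

One thing to watch when commenting a pattern this way: the comment text must not contain the regex delimiter, since Perl finds the end of the pattern before it knows anything about comments.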

The (?(?! business is two things:
(?!pattern) is a negative lookahead assertion. It scans ahead to see if what's there would (or in this case wouldn't, hence the negating !) match. The key is that it doesn't consume what was matched; it leaves the match pointer, or whatever it's called, where it was. Positive lookahead is (?=pattern), and positive/negative lookbehind is (?<=pattern) / (?<!pattern). I think I'm not making much sense here, so check the link, they're better at this than I.

(?(condition)yes-pattern|no-pattern) I'll just quote perldoc on this one: "Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or lookahead/lookbehind/evaluate zero-width assertion."
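Two tiny examples of those constructs, kept away from the tag problem so each one is easy to check by hand. The strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Negative lookahead: match "foo" plus trailing word characters,
# but only where "foo" is NOT followed by "bar".
my @hits = "foobar foobaz" =~ /(foo(?!bar)\w*)/g;
print "@hits\n";    # foobaz

# Conditional: if group 1 (an opening parenthesis) matched,
# require a closing parenthesis; otherwise require nothing.
my $number = qr/^(\()?\d+(?(1)\))$/;
for my $s ("42", "(42)", "(42", "42)") {
    printf "%-5s %s\n", $s, $s =~ $number ? "matches" : "no match";
}
# 42 and (42) match; (42 and 42) do not
```

The first form of the condition, an integer in parentheses, is used here; the regex in this thread uses the other form, with a lookahead as the condition.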

And the syntax errors...well, I will say this: if there is no [notag] block to match, $2 will be empty since it's in the conditional block, and since $1 collects everything between the opening [$tag] and opening [notag], $3 is empty as well. I had warnings on, but not the strict pragma, so it spat back "Use of uninitialized value in concatenation (.) or string". I don't think strict would have any problems with this either, now that I think about it. Dunno.
----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
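One way to keep that warning quiet is to give the optional groups a default before concatenating. The sketch below swaps the conditional for a plain optional group to keep it short, so it is not the regex from this thread; `bold_with_optional_notag` is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: the [notag] part is optional, so groups 2 and 3
# are undef whenever it is absent.
sub bold_with_optional_notag {
    my ($r) = @_;
    if ($r =~ /\[b\](.*?)(?:\[notag\](.*?)\[\/notag\](.*?))?\[\/b\]/s) {
        my ($before, $protected, $after) = ($1, $2, $3);
        # Default the optional pieces to '' so concatenation cannot warn.
        $protected = '' unless defined $protected;
        $after     = '' unless defined $after;
        return "<b>$before$protected$after</b>";
    }
    return $r;
}

print bold_with_optional_notag('[b]plain bold[/b]'), "\n";
# <b>plain bold</b>
print bold_with_optional_notag('[b]a[notag][b]x[/b][/notag]c[/b]'), "\n";
# <b>a[b]x[/b]c</b>
```

As noted above, this is a warnings matter only: strict has nothing to say about undefined capture variables.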

tanderso · Jan 26, 2003

How about something like this:

Code:

my $s = ""; # this will be the final string

# loop through the non-tagged sections
while ($r =~ s/(.*?)\[notag\](.*?)\[\/notag\](.*)/$3/gis)
{
  # append the beginning part to the final string and encode the non-tagged part
  $s .= $1 . encode_html($2);
}

$s .= $r; # add on any final left-over parts

# do the meta-mark-up conversion without converting the non-tagged parts
# ($othertags is assumed to hold any further tag names, joined with |)
$s =~ s/\[((?:\/)?(?:b|i|u|a|$othertags))\]/<$1>/gis;

sub encode_html
{
    my ($str) = @_;
    # turn every non-alphanumeric character into a numeric entity,
    # so brackets inside [notag] no longer look like tags
    $str =~ s/([^0-9A-Za-z])/sprintf("&#%d;",ord($1))/eg;

    return $str;
}

Sincerely,

Tom Anderson
CEO, Order amid Chaos, Inc.

http://www.oac-design.com
