Matching [forum style] tags with regexes 2

Status
Not open for further replies.

Leozack

MIS
Oct 25, 2002
867
GB
Hey guys. I've spent hours today working the impossible trying to do the simple. Normal coding affair.
What I've got is text input where I've chosen to accept certain forum styles of formatting, i.e.
Code:
[heading]This Is A Heading[/heading]
[linebreak]
[b]Bold text here[/b] [i]Italic text here[/i] [u]Underlined text here[/u]
[colour='red']Coloured text here[/colour] (colour from red/blue/yellow/purple/green/orange/white/black)
[size='1']Small text here[/size] (size from 1-10)
[link='http://www.test.com']Click here[/link]
[email='address']Email us[/email] (will goto 'address')
Exactly like the code tags I just used.

This is somewhat like the post over at
Currently I have it working fine for the easy ones etc, but for complicated tags like colour/size/link/email it gets fiddly. I have it working if I replace the first tag separately from the last, e.g.
Code:
$thetext = ereg_replace("\[colour='([^']+)']","<font color=\"\\1\">",$thetext);
$thetext = ereg_replace("\[/colour]","</font>",$thetext);
But me being me I stubbornly wanted to do it all in 1 go. But nowhere can I find a way to specify the section in the middle of the pattern (the bits between the tags to be formatted) to be "any char but NOT the closing tag sequence eg [/colour]".

For example, since it's greedy patterning, this will format from the first opening tag to the last closing tag
Code:
$thetext = ereg_replace("\[colour='([^']+)'](.+)\[/colour]","<font color=\"\\1\">\\2</font>",$thetext);
Now I've tried making that crucial middle bit (.+) into all sorts, but I just can't get it to look for the closing tag, thanks to [ and ] being stuff to do with patterns. Typical eh?

Please help! ;_;

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
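
A note on the greedy match described above, as a minimal sketch rather than a definitive fix. POSIX patterns as used by ereg_replace() have no lazy quantifiers, and the ereg functions were removed in PHP 7, so this assumes switching to preg_replace(). In PCRE, writing (.+?) instead of (.+) makes the middle group stop at the first closing tag that allows a match. The sample string is made up for illustration.

```php
<?php
// Two colour tags on ONE line: the case where a greedy (.+) overshoots.
$thetext = "[colour='red']one[/colour] and [colour='blue']two[/colour]";

// Greedy: (.+) grabs as much as it can, so it runs to the LAST [/colour].
$greedy = preg_replace(
    "/\[colour='([^']+)'\](.+)\[\/colour\]/",
    '<font color="$1">$2</font>',
    $thetext
);

// Lazy: (.+?) stops at the FIRST [/colour] that lets the pattern match.
$lazy = preg_replace(
    "/\[colour='([^']+)'\](.+?)\[\/colour\]/",
    '<font color="$1">$2</font>',
    $thetext
);

echo $greedy, "\n";
echo $lazy, "\n";
```

The greedy version wraps everything from the first opening tag to the last closing tag in one font element; the lazy version converts each pair on its own.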
 
Leozack,

This is a quick rough example, that works with your example text-format.

Code:
<?php

$text = "[heading]This Is A Heading[/heading]";
$text .= "[linebreak]";
$text .= "[b]Bold text here[/b] [i]Italic text here[/i] [u]Underlined text here[/u]";
$text .= "[colour='red']Coloured text here[/colour]";
$text .= "[size='1']Small text here[/size]";
$text .= "[link='http://www.test.com']Click here[/link]";
$text .= "[email='address']Email us[/email]";

$size_conversion = array("size='1" => "size='xx-small",
                         "size='2" => "size='x-small",
                         "size='3" => "size='small",
                         "size='4" => "size='medium",
                         "size='5" => "size='large",
                         "size='6" => "size='x-large",
                         "size='7" => "size='xx-large",
                         "size='8" => "size='smaller",
                         "size='9" => "size='larger"
                        );


$replacement_list = array("heading" => "h3", 
                          "b" => "b",
                          "i" => "i",
                          "u" => "u",
                          "linebreak" => "br",
                          "colour='" => "font style='color:",
                          "/colour" => "/font",
                          "size='" => "font style='font-size:",
                          "/size" => "/font",
                          "link" => "a href",
                          "/link" => "/a",
                          "email='" => "a href='mailto:",
                          "/email" => "/a"
                         );



foreach($size_conversion as $replace => $with)
  $text = str_replace("[$replace", "[$with", $text);
foreach($replacement_list as $replace => $with)
  $text = str_replace("[$replace", "<$with", $text);
foreach($replacement_list as $replace => $with)
  $text = str_replace("[/$replace", "</$with", $text);
$text = str_replace("]", ">", $text);

echo htmlentities($text)."<hr />";

echo $text;

?>

Regards
 
Hi dkdude, that's certainly an interesting way to do it. Bit of a workaround though :> And unfortunately the email links I'm making are fiddly ones with javascript. Originally I had just search/replaced strings, including ] for > and so on, but the email one meant I couldn't just do that, as it needed a specific ending on the first tag. That meant I had to operate on the first tag, including the email address, as a whole, so I needed a pattern match, not a string match. Cue me spending hours on the regexes :p
Currently I'm not crying since I've got it working fine by sticking with the method of matching the front tags separately from the end tags. But the professional in me isn't happy with such a ... settlement. I'd still like someone who knows their regex to point out what I'm sure must be a simple way to say
(start tag)(any stuff that isn't start tag end)(any stuff that isn't end tag)(end tag)
The problem, I think, is mainly due to my tags involving [ & ] :(

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
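
On the [ and ] worry: in PCRE a backslash in front of a bracket makes it a literal character, and preg_quote() will add those backslashes for you. The following is only a sketch of the (start tag)(value)(body)(end tag) idea; the helper name attr_tag_pattern and the sample text are assumptions made up for this example, not something from the posts above.

```php
<?php
// Hypothetical helper (not from the thread): builds a PCRE pattern for
// [tag='value']body[/tag], escaping the literal brackets automatically.
function attr_tag_pattern(string $tag): string
{
    // preg_quote() backslash-escapes regex metacharacters such as [ and ],
    // plus the "/" delimiter given as its second argument.
    $open    = preg_quote("[{$tag}='", '/');
    $openEnd = preg_quote("']", '/');
    $close   = preg_quote("[/{$tag}]", '/');

    // (start tag)(value: no quote or "]")(end of start tag)(body, lazy)(end tag)
    return '/' . $open . "([^'\\]]+)" . $openEnd . '(.+?)' . $close . '/';
}

$thetext = "[colour='red']hi[/colour] [colour='blue']yo[/colour]";
$html = preg_replace(attr_tag_pattern('colour'), '<font color="$1">$2</font>', $thetext);
echo $html, "\n";
```

Because the pattern is built from the tag name, the same helper could in principle be reused for size, link and email with a different replacement string each time.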
 
Heh - yeah, it sure is a workaround :) ... but quite an interesting challenge as well. You've got me started making a class now ... stay tuned :)

Also I look forward to seeing the ereg_replace() way of doing it.

Regards
 
I'm using something similar to parse an email body for links and modify them to break out of a frame (the body is displayed in an iframe).

I'm sure it could be modified to fit this:

Code:
	$reg = "'(</?a )([^>]*)?(href=[\"\']?)([^\'\">]*)([\"\']?[^>]*>)([^<]*)(</a>)'im";
	$body = preg_replace($reg, "$1$2$3javascript:top.location='$4'$5$6$7", $body);
 
Hey Borvik I'm sure it could be modified ... if I could actually decode it :p

(</?a ) look for opening tags? ? means lazily or optional? and the a means ... no idea

([^>]*)? anything that's not a closing >. ? means lazily?
(href=[\"\']?) look for a href=" or href=' ? lazily again or optional?

([^\'\">]*) anything that isn't ' or " or >

([\"\']?[^>]*>) either " or ' (? = optional?) followed by stuff that isn't > followed by >

([^<]*) something that isn't <

(</a>)im A closing link tag. Not sure about the i or m modifiers

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
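
To help with the decoding questions, here is a small sketch of how PCRE treats these pieces; the test strings are invented for illustration. A ? directly after a single item (a character, a class or a group) makes that item optional, while a ? directly after a quantifier such as * or + makes that quantifier lazy. In (</?a ) the a is just the literal letter, so the group matches "<a " or "</a ". The i modifier ignores case, and m only changes where ^ and $ are allowed to match.

```php
<?php
// "?" after a single item makes that item optional (zero or one).
$openTag  = preg_match('#^</?a $#', '<a ');   // 1: matches without the slash
$closeTag = preg_match('#^</?a $#', '</a ');  // 1: matches with the slash

// "?" after a quantifier (*, +, ?, {n,m}) makes that quantifier lazy.
preg_match('#<.+>#',  '<b>x</b>', $greedy);   // $greedy[0] is '<b>x</b>'
preg_match('#<.+?>#', '<b>x</b>', $lazy);     // $lazy[0] is '<b>'

// "i": letters match regardless of case.
$i = preg_match('#</a>#i', '</A>');           // 1

// "m": ^ and $ also match at the start and end of each line.
// It does not change what "." matches.
$m   = preg_match('#^second$#m', "first\nsecond");  // 1
$noM = preg_match('#^second$#',  "first\nsecond");  // 0

echo $greedy[0], "\n", $lazy[0], "\n";
```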
 
I'll add to anyone wanting this to work for themselves, that it DOES work already if you just do it with 2 patterns instead of 1, using
Code:
$thetext = ereg_replace("\[colour='([^']+)']","<font color=\"\\1\">",$thetext);
$thetext = ereg_replace("\[/colour]","</font>",$thetext);
And also add that the reason this is actually HARDER than search/replacing for <html> style tags is because [forum] style tags use [ and ] which patterns use for character sets :(

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Here's one that works. Just change the email/colour/link text in the regex definition lines to get the values you want. $4 will be what is in quotes (the value of the link) and $6 will be the text between the opening and closing tags:

Code:
<?php
$text = "[heading]This Is A Heading[/heading]<br>\n";
$text .= "[linebreak]<br>\n";
$text .= "[b]Bold text here[/b] [i]Italic text here[/i] [u]Underlined text here[/u]<br>\n";
$text .= "[colour='red']Coloured text here[/colour]<br>\n";
$text .= "[size='1']Small text here[/size]<br>\n";
$text .= "[link='http://www.test.com']Click here[/link]<br>\n";
$text .= "[email='address']Email us[/email]<br>\n";

$reg = "#(\[/?colour)([^\]]*)?(=[\"\']?)([^\'\"]*)([\"\']?[^]]*\])([^\[]*)(\[/colour\])#im";
$text = preg_replace($reg, "<font color=\"$4\">$6</font>", $text);
$reg = "#(\[/?size)([^\]]*)?(=[\"\']?)([^\'\"]*)([\"\']?[^]]*\])([^\[]*)(\[/size\])#im";
$text = preg_replace($reg, "<font size=\"$4\">$6</font>", $text);
$reg = "#(\[/?link)([^\]]*)?(=[\"\']?)([^\'\"]*)([\"\']?[^]]*\])([^\[]*)(\[/link\])#im";
$text = preg_replace($reg, "<a href=\"$4\">$6</a>", $text);
$reg = "#(\[/?email)([^\]]*)?(=[\"\']?)([^\'\"]*)([\"\']?[^]]*\])([^\[]*)(\[/email\])#im";
$text = preg_replace($reg, "<a href=\"mailto:$4\">$6</a>", $text);
//$text = preg_replace($reg, "found", $text);

echo $text;
?>

I'm not entirely sure what everything means per se (I found it from someone else and modified it).
 
here is something i've just tried out

Code:
<?php

$string = <<<STR
[heading]This Is A Heading[/heading]
[b]Bold text here[/b] 
[i]Italic text here[/i] 
[ul]Underlined text here[/ul]
[colour='red']Coloured text here[/colour] (colour from red/blue/yellow/purple/green/orange/white/black)
[size='1']text here[/size] (size from 1-10)
[size='2']text here[/size]
[size='3']text here[/size]
[size='4']text here[/size]
[size='5']text here[/size]
[size='6']text here[/size]
[size='7']text here[/size]
[link='http://www.test.com']Click here[/link]
[email='address']Email us[/email] (will goto 'address')

STR;



$bbcode = array (
		'/(\[b\])(.+)(\[\/b\])/',	
		'/(\[link=)(.+)(\])(.+)(\[\/link\])/',		
		'/(\[heading\])(.+)(\[\/heading\])/',	
		'/(\[ul\])(.+)(\[\/ul\])/',	
		'/(\[colour=)(.+)(\])(.+)(\[\/colour\])/',		
		'/(\[email=)(.+)(\])(.+)(\[\/email\])/',
		'/(\[size=)(.+)(\])(.+)(\[\/size\])/',
		'/(\[i\])(.+)(\[\/i\])/'	
		);
$html = array (
		'<b>\\2</b>',
		'<a href=\\2>\\4</a>',
		'<h1>\\2</h1>',
		'<u>\\2</u>',		
		'<font color=\\2>\\4</font>',
		'<a href="mailto:\\2">\\4</a>',
		'<font size=\\2>\\4</font>',
		'<i>\\2</i>'				
		);
$string = preg_replace($bbcode, $html, $string);
print nl2br($string);

?>
 
the string is reposted here without mark up

$string = <<<STR
[heading]This Is A Heading[/heading]
[b]Bold text here[/b]
[i]Italic text here[/i]
[ul]Underlined text here[/ul]
[colour='red']Coloured text here[/colour] (colour from red/blue/yellow/purple/green/orange/white/black)
[size='1']text here[/size] (size from 1-10)
[size='2']text here[/size]
[size='3']text here[/size]
[size='4']text here[/size]
[size='5']text here[/size]
[size='6']text here[/size]
[size='7']text here[/size]
[link='http://www.test.com']Click here[/link]
[email='address']Email us[/email] (will goto 'address')

STR;
 
Thanks Borvik, that seems to work, though I note it's using preg, not ereg. Surprisingly I was trying preg and having problems, then I changed it to ereg and they started working. Probably as I didn't enclose the pattern in /'s. Doh!
jpadie - it turns out the patterns I was working on were VERY similar to those ones. I think I'll use yours since they're shorter and easier to understand than the ones you mentioned, Borvik, and from what I can see they both handle meeting a tag within a tag (e.g. a bold tag) without breaking the match, so the whole thing still gets coloured, yet they don't colour from the first opening tag to the last closing tag. I couldn't write a book on WHY, but they work. I've modified them slightly, e.g.
Code:
'/(\[colour=)(.+)(\])(.+)(\[\/colour\])/'
becomes
"/(\[colour=')([^'\]]+)('\])(.+)(\[\/colour\])/"
I'm not sure I need to specify in the 2nd group that it's anything EXCEPT '], seeing as the 3rd group requires a '], but I've done it anyway and it seems to still be happy. I might use the array style too. Stars ahoy!

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
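
On whether the 2nd group needs to exclude ' and ]: a quick sketch suggests it does matter once two tags share a line. With (.+) for the value, the engine is happy to use a later ] as the end of the opening tag, even though the 3rd group requires one. The sample line is an assumption, and the body is made lazy in both patterns so that only the value group differs.

```php
<?php
$line = "[colour='red']a[/colour] [colour='blue']b[/colour]";

// Value captured with (.+): greedy, so it gives back characters only as
// far as the LAST "]" that still allows the rest of the pattern to match.
preg_match("/(\[colour=)(.+)(\])(.+?)(\[\/colour\])/", $line, $loose);

// Value captured with ([^'\]]+): it cannot cross a quote or a "]".
preg_match("/(\[colour=')([^'\]]+)('\])(.+?)(\[\/colour\])/", $line, $strict);

echo $loose[2], "\n";   // the "value" swallows the first tag pair
echo $strict[2], "\n";  // just the colour name
```

When every tag sits on its own line the two behave alike, which may be why the difference did not show up in testing.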
 
In the interests of safety I tried to turn the "anything" section of
/(\[colour=')([^'\]]+)('\])(.+)(\[\/colour\])/
into "anything that isn't the closing tag"
([^\[\/colour\]]+)
aka
/(\[colour=')([^'\]]+)('\])([^\[\/colour\]]+)(\[\/colour\])/
but it doesn't work. So I'll leave it be I guess o.o

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
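
A possible reason the attempt above fails, sketched with an invented sample string: inside [^...] every character stands alone, so [^\[\/colour\]] means "one character that is not [, /, c, o, l, u, r or ]", not "anything but the sequence [/colour]". One common PCRE way to say "any character, as long as the closing tag does not start here" is a negative lookahead in front of the dot.

```php
<?php
$thetext = "[colour='red']Coloured [b]text[/b] here[/colour] [colour='blue']two[/colour]";

// Character class: excludes the single characters [ / c o l u r ], so it
// cannot get past the "o" in "Coloured" and the overall match fails.
$class = "/(\[colour=')([^'\]]+)('\])([^\[\/colour\]]+)(\[\/colour\])/";
$classHits = preg_match_all($class, $thetext);

// Negative lookahead: at each position, check that "[/colour]" does not
// start here, then consume one character.
$tempered = "/(\[colour=')([^'\]]+)('\])((?:(?!\[\/colour\]).)+)(\[\/colour\])/";
$result = preg_replace($tempered, '<font color="$2">$4</font>', $thetext);

echo $classHits, "\n";
echo $result, "\n";
```

The lookahead version leaves the nested [b] tag inside the body untouched and still stops at the first [/colour], so both pairs are converted separately.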
 
I must confess that, out of all these, for some reason I can't explain, my [center] tags aren't working ...
Code:
	$thetext = preg_replace("/(\[heading\])(.+)(\[\/heading\])/","<span class=\"hsmall\">$2</span>\r\n<br />\r\n",$thetext);
	$thetext = preg_replace("/(\[linebreak\])/","<br />\r\n",$thetext);
	$thetext = preg_replace("/(\[b\])(.+)(\[\/b\])/","<b>$2</b>",$thetext);
	$thetext = preg_replace("/(\[i\])(.+)(\[\/i\])/","<i>$2</i>",$thetext);
	$thetext = preg_replace("/(\[u\])(.+)(\[\/u\])/","<u>$2</u>",$thetext);
	$thetext = preg_replace("/(\[center\])(.+)(\[\/center\])/","<center>$2</center>",$thetext);
	$thetext = preg_replace("/(\[colour=')([^'\]]+)('\])(.+)(\[\/colour\])/","<font color=\"$2\">$4</font>",$thetext);
	$thetext = preg_replace("/(\[size=')([^'\]]+)('\])(.+)(\[\/size\])/","<font size=\"$2\">$4</font>",$thetext);
	$thetext = preg_replace("/(\[link=')([^'\]]+)('\])(.+)(\[\/link\])/","<a href=\"$2\">$4</a>",$thetext);
	$thetext = preg_replace("/(\[email=')([^'\]]+)('\])(.+)(\[\/email\])/","<a onmouseover=\"window.status='Email Us!'; return true\" onmouseout=\"window.status=''; return true\"  href=\"javascript:eMail('$2');\">$4</a>",$thetext);

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
I copied what you had there for $thetext and used it with this:

Code:
<?php
$text = "[heading]This Is A Heading[/heading]<br>\n";
$text .= "[linebreak]<br>\n";
$text .= "[b]Bold text here[/b] [i]Italic text here[/i] [u]Underlined text here[/u]<br>\n";
$text .= "[colour='red']Coloured text here[/colour]<br>\n";
$text .= "[size='1']Small text here[/size]<br>\n";
$text .= "[link='http://www.test.com']Click here[/link]<br>\n";
$text .= "[email='address']Email us[/email]<br>\n";
$text .= "[linebreak]<br>\n";
$text .= "[b]Bold text here[/b] [i]Italic text here[/i] [u]Underlined text here[/u]<br>\n";
$text .= "[colour='red']Coloured text here[/colour]<br>\n";
$text .= "[size='1']Small text here[/size]<br>\n";
$text .= "[link='http://www.test.com']Click here[/link]<br>\n";
$text .= "[email='address']Email us[/email]<br>\n";
$text .= "[linebreak]<br>\n";
$text .= "[b]Bold text here[/b] [i]Italic text here[/i] [u]Underlined text here[/u]<br>\n";
$text .= "[colour='red']Coloured text here[/colour]<br>\n";
$text .= "[size='1']Small text here[/size]<br>\n";
$text .= "[link='http://www.test.com']Click here[/link]<br>\n";
$text .= "[email='address']Email us[/email]<br>\n";
$text .= "[linebreak]<br>\n";
$text .= "[b]Bold text here[/b] [i]Italic text here[/i] [u]Underlined text here[/u]<br>\n";
$text .= "[colour='red']Coloured text here[/colour]<br>\n";
$text .= "[center]Centered text[/center]<br>\n";
$text .= "[size='1']Small text here[/size]<br>\n";
$text .= "[link='http://www.test.com']Click here[/link]<br>\n";
$text .= "[email='address']Email us[/email]<br>\n";

$thetext = $text;

 $thetext = preg_replace("/(\[heading\])(.+)(\[\/heading\])/","<span class=\"hsmall\">$2</span>\r\n<br />\r\n",$thetext);
$thetext = preg_replace("/(\[linebreak\])/","<br />\r\n",$thetext);
$thetext = preg_replace("/(\[b\])(.+)(\[\/b\])/","<b>$2</b>",$thetext);
$thetext = preg_replace("/(\[i\])(.+)(\[\/i\])/","<i>$2</i>",$thetext);
$thetext = preg_replace("/(\[u\])(.+)(\[\/u\])/","<u>$2</u>",$thetext);
$thetext = preg_replace("/(\[center\])(.+)(\[\/center\])/","<center>$2</center>",$thetext);
$thetext = preg_replace("/(\[colour=')([^'\]]+)('\])(.+)(\[\/colour\])/","<font color=\"$2\">$4</font>",$thetext);
$thetext = preg_replace("/(\[size=')([^'\]]+)('\])(.+)(\[\/size\])/","<font size=\"$2\">$4</font>",$thetext);
$thetext = preg_replace("/(\[link=')([^'\]]+)('\])(.+)(\[\/link\])/","<a href=\"$2\">$4</a>",$thetext);
$thetext = preg_replace("/(\[email=')([^'\]]+)('\])(.+)(\[\/email\])/","<a onmouseover=\"window.status='Email Us!'; return true\" onmouseout=\"window.status=''; return true\" href=\"javascript:eMail('$2');\">$4</a>",$thetext);

echo $thetext;
?>

It worked just fine for me - it might be something in $thetext variable.
 
My current text is this
Code:
Hey there
[heading]Nice heading![/heading]
What?[linebreak]New line please!
[b]bold mate [i]bold italics?[/i] ok[/b]
[u]underlines ftw![/u]
[center]Love the centering, bob
[colour='yellow']ew[u]w yel[/u]low[/colour][/center]
[size='10']wassaaaap[/size]
[link='index.php']Home page[/link]
[email='testing']Email ok?[/email]
[linebreak]
round 2!
[heading]Nice heading![/heading]
What?[linebreak]New line please!
[b]bold mate [i]bold italics?[/i] ok[/b]
[u]underlines ftw![/u]
[center]Love the centering, bob
[colour='yellow']eww yellow[/colour][/center]
[size='10']wassaaaap[/size]
[link='index.php']Home page[/link]
[email='testing']Email ok?[/email]
Finding problems with it is exactly why I can't always do the easiest option. But it really should work fine only the centering is broken x_X

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
It definitely has something to do with spanning multiple lines, which is what the "im" modifier is for (case-Insensitive and Multi-line) - but it doesn't appear to be working...

Hmm... I'm confused now, some testing on my end got a multi-line b tag to work (after some tweaking to the regex namely replacing (.+) with ([^\[]*)? ), but only when it appeared after the other preg_replace statements. When I moved the modified preg_replace to the spot where you had the b tag being changed it failed to work properly.

I'd better take a break now - my brain is starting to cook...
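
One plausible explanation for the ordering effect described above, offered as a sketch and not a diagnosis of the exact test that was run: a negated class like [^\[]* does match newlines, but it cannot cross any [ at all. While the body still contains an unconverted inner tag the match fails; once the inner tags have been turned into <...> markup there is no [ left to trip over. The sample text is an assumption.

```php
<?php
$text = "[b]bold mate\n[i]bold italics?[/i] ok[/b]";
$bold = '/\[b\]([^\[]*)\[\/b\]/';

// Bold first: the body still contains "[i]", which [^\[]* cannot cross,
// so nothing is replaced.
$boldFirst = preg_replace($bold, '<b>$1</b>', $text);

// Italics first: the inner tags become <i>...</i>, so no "[" is left
// inside the bold body and the bold pattern now matches across the newline.
$italicsDone = preg_replace('/\[i\]([^\[]*)\[\/i\]/', '<i>$1</i>', $text);
$boldLast = preg_replace($bold, '<b>$1</b>', $italicsDone);

echo $boldFirst, "\n---\n", $boldLast, "\n";
```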
 
It seems if I leave things as they are but change the text to have a row with just a simple center tag, the simple center tag works while the others don't. Yet if I put a newline halfway through a simple center tag, it doesn't work anymore. This remains despite adding "im" to the end of the pattern as you say :(

Code:
Hey there
[heading]Nice heading![/heading]
What?[linebreak]New line please!
[b]bold mate [i]bold italics?[/i] ok[/b]
[u]underlines ftw![/u]
[center]Love the centering, bob
[colour='yellow']ew[u]w yel[/u]low[/colour][/center]
[size='10']wassaaaap[/size]
[link='index.php']Home page[/link]
[email='testing']Email ok?[/email]
[linebreak]
round 2!
[heading]Nice heading![/heading]
What?[linebreak]New line please!
[b]bold mate [i]bold italics?[/i] ok[/b]
[u]underlines ftw![/u]
[center]Love the centering, bob
[colour='yellow']eww yellow[/colour][/center]
[size='10']wassaaaap[/size]
[link='index.php']Home page[/link]
[email='testing']Email ok?[/email]

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
 
Ok so it's officially a multiline issue, since the moment I made a bold tag span a line, that broke also. So! Back to the regexexexexperts :p Multiline solutions to patterns such as
Code:
/(\[center\])(.+)(\[\/center\])/

_________________________________
Leozack
Code:
MakeUniverse($infinity,1,42);
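
For the multiline question, a hedged sketch: in PCRE the m modifier only changes where ^ and $ may match, while the s modifier (PCRE_DOTALL) is the one that lets . match newline characters. Once . can cross lines, a greedy (.+) would run to the last closing tag in the whole text, so the body is made lazy here as well. The sample text is taken from the test input quoted earlier in the thread.

```php
<?php
$thetext = "[center]Love the centering, bob\n[colour='yellow']eww yellow[/colour][/center]";

// "m" only affects ^ and $; "." still stops at the newline, so no match.
$pattern_m = "/(\[center\])(.+)(\[\/center\])/im";

// "s" lets "." match newlines too; (.+?) keeps each pair separate.
$pattern_s = "/(\[center\])(.+?)(\[\/center\])/is";

$with_m = preg_replace($pattern_m, '<center>$2</center>', $thetext);
$with_s = preg_replace($pattern_s, '<center>$2</center>', $thetext);

echo $with_m, "\n---\n", $with_s, "\n";
```

The same s-plus-lazy change could presumably be applied to the other tag patterns, since each of them uses (.+) for the body in the same way.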
 