death to regular expressions!

Status
Not open for further replies.

cLFlaVA

Programmer
Jun 14, 2004
6,450
US
Hi all. I have content as follows, entered into a textarea:

Code:
blah blah blah <b>blah</b> blah blah.
<div id="code">
blah blah blah <b>blah</b> blah blah.
</div>
blah blah blah <b>blah</b> blah blah.

Using, I guess, preg_replace or eregi_replace, I'd like to return the following:

Code:
blah blah blah blah blah blah.
<div id="code">
blah blah blah &lt;b&gt;blah&lt;/b&gt; blah blah.
</div>
blah blah blah blah blah blah.

Can someone help me here? Thanks.

*cLFlaVA
----------------------------
[tt]tastes great, less filling.[/tt]
 
The getElementsByTagName() would be the appropriate way to access all <div> tags. Then you can inspect the attributes of each to determine if they are of the variety you want to handle.
As far as using the ID goes: you should be aware that (theoretically) IDs are unique. The way I saw it used in your code is probably coming from CSS usage. To keep IDs unique it's better to apply a class which has no uniqueness requirement - and still your CSS will apply the desired rendering.
 
Yeah, you're right - technically, my sample should have had class="code", not id="code". That was just an oversight in the sample I provided.



*cLFlaVA
----------------------------
[tt]tastes great, less filling.[/tt]
 
OK, so let me think out loud for a second...

the following text is entered in a textarea:
Code:
blah fjdkf djfkl ;lkadsl uirei <b>kslfdfkdlf</b> ruei wioep ewiureiroei  rioe iroe iroe rieorie opoerip.

<div id="code">
stupid blah blah <b>green</b> red dunkin donuts text
</div>

blah <em>fjdkf djfkl</em> ;lkadsl uirei <b>kslfdfkdlf</b> ruei wioep ewiureiroei  rioe iroe iroe rieorie opoerip.

I could then use the [tt]loadHTML()[/tt], [tt]getElementsByTagName()[/tt] and [tt]getAttribute()[/tt] functions to get all divs with a class of "code", and then use [tt]htmlspecialchars()[/tt] to make sure the HTML code inside them displays on the screen as-is.

So the html would display as follows:

Code:
blah fjdkf djfkl ;lkadsl uirei [b]kslfdfkdlf[/b] ruei wioep ewiureiroei  rioe iroe iroe rieorie opoerip.

stupid blah blah <b>green</b> red dunkin donuts text

blah [i]fjdkf djfkl[/i] ;lkadsl uirei [b]kslfdfkdlf[/b] ruei wioep ewiureiroei  rioe iroe iroe rieorie opoerip.
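
The DOM idea described above could be sketched roughly as follows. This is only a sketch under assumptions: the helper name escapeCodeDivs() is made up, the markup uses class="code" rather than id="code", and the input is plain ASCII (loadHTML() needs extra care for other encodings).

```php
<?php
// Hypothetical helper (not from the thread): escape the markup inside
// every <div class="code"> so it displays as-is.
function escapeCodeDivs(string $html): string
{
    $doc = new DOMDocument();
    // Wrap the fragment in a plain <div> so it can be pulled back out of <body> later.
    // The @ hides libxml warnings about loose markup.
    @$doc->loadHTML('<div>' . $html . '</div>');
    $wrapper = $doc->getElementsByTagName('body')->item(0)->firstChild;

    // Snapshot the live node list before changing the tree.
    foreach (iterator_to_array($wrapper->getElementsByTagName('div')) as $div) {
        if ($div->getAttribute('class') !== 'code') {
            continue;
        }
        // Serialize the children back to markup, then drop them.
        $inner = '';
        while ($div->firstChild !== null) {
            $inner .= $doc->saveHTML($div->firstChild);
            $div->removeChild($div->firstChild);
        }
        // A text node has <, > and & escaped on output, much as htmlspecialchars() would.
        $div->appendChild($doc->createTextNode($inner));
    }

    // Serialize only the original fragment, not the <html>/<body> shell.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo escapeCodeDivs('a <b>x</b> <div class="code">say <b>green</b></div> z'), "\n";
```

Tags outside the "code" div are left alone here; only the content of the matching divs is turned into escaped text.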

*cLFlaVA
----------------------------
[tt]tastes great, less filling.[/tt]
 
I've been thinking more about this.
First thought: why use <div> tags if HTML has a designated tag for such content, namely <code> ... </code>?
This would rule out all other <div> tags and make the whole thing much easier. Instead of declaring <div class="code">, a plain CSS rule for code will suffice.
Then the regex will work fine - as long as no nested <code> tags are present. In that case, the "e" modifier provides what is needed to evaluate the replacement as an expression:
Code:
$text = 'blah fjdkf djfkl ;lkadsl uirei <b>kslfdfkdlf</b> ruei wioep ewiureiroei  rioe iroe iroe rieorie opoerip.

<code>
stupid blah blah <b>green</b> red <code>dunkin</code> donuts text
</code>

blah <em>fjdkf djfkl</em> ;lkadsl uirei <b>kslfdfkdlf</b> ruei wioep ewiureiroei  rioe iroe iroe rieorie opoerip.
';

$pattern = "/(<code>)(.*)(<\/code>)/esi";
$newText = preg_replace($pattern,"'\\1'.htmlspecialchars('\\2').'\\3'",$text);
echo $newText;

The DOM route is only needed if nested tags are present - and even then, each <code> element still has to be checked for another <code> tag among its ancestors.
Nested <code> tags are painful, so if you can stay away from them, do so.
Hope this helps.
 
I do plan on staying away from nested <code> tags. The main idea is that I'll be writing css / javascript articles, and want to be able to switch from normal text to styled text (basically green courier new "pre" text).

Could you elaborate on "e"?

Thanks a lot :)

*cLFlaVA
----------------------------
[tt]tastes great, less filling.[/tt]
 
The "e" pattern modifier, as in the example above, tells the regex engine to evaluate the replacement as a PHP expression.
In this case it says (translated into human language):
Code:
"      ->open replacement expression
'\\1'  -> content of the first parentheses (the <code> tag); it's quoted since it is a string
.      -> concatenate
htmlspecialchars('\\2') -> run the htmlspecialchars function on the second match (the content of the tag)
.      -> concatenate
'\\3'  -> the third match (the closing </code> tag)
"      -> close replacement expression

The /e simply means the whole replacement is treated as PHP code, with its arguments taken from the match through backreferences.

Clear?
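
For anyone reading this on a newer PHP: the "e" modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the replacement above will not run there. A minimal sketch of the same idea with preg_replace_callback() is below; the sample $text is made up for illustration, and the callback receives the match groups as an array instead of as \\1, \\2, \\3.

```php
<?php
// Made-up sample input for illustration.
$text = 'plain <b>bold</b> <code>show <b>this</b> tag</code> end';

// Same three groups as in the thread; "s" lets "." span newlines, "i" ignores case.
$pattern = '/(<code>)(.*)(<\/code>)/si';

$newText = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // $m[1] = opening tag, $m[2] = tag content, $m[3] = closing tag
        return $m[1] . htmlspecialchars($m[2]) . $m[3];
    },
    $text
);

echo $newText, "\n";
// prints: plain <b>bold</b> <code>show &lt;b&gt;this&lt;/b&gt; tag</code> end
```

The callback also sidesteps the quoting that "e" applies to backreferences before evaluating them.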
 
Ah, I was thinking that

"'\\1'.htmlspecialchars('\\2').'\\3'"

was the literal string you'd be replacing it with, since there are double-quotes around the entire thing!

Thanks buddy, you've taught me a lot.

*cLFlaVA
----------------------------
[tt]tastes great, less filling.[/tt]
 
DRJ478 -

This worked as expected. Is there an additional modifier I can use so this is done to all instances of <code></code>?

I thought it was "g" for global, but that gave me an error.

*cLFlaVA
----------------------------
[tt]tastes great, less filling.[/tt]
 
Update:

After trying a lot of options, I finally found the complete regex.

Thanks again to everyone for your continued help.

Code:
$pattern = "/(<code>)(.*?)(<\\/code>)/esi";
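
For reference, preg_replace() already replaces every match, so PHP has no "g" modifier. What changes the result is the "?" after ".*", which makes the match lazy so it stops at the first closing tag. A small sketch of the difference follows; it uses preg_replace_callback() because "e" is gone as of PHP 7.0, and both the helper name escapeCode() and the sample text are made up.

```php
<?php
// Hypothetical helper: escape whatever the second group of $pattern captures.
function escapeCode(string $pattern, string $text): string
{
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return $m[1] . htmlspecialchars($m[2]) . $m[3];
        },
        $text
    );
}

// Made-up sample with two separate <code> blocks.
$text = '<code><b>one</b></code> and <code><i>two</i></code>';

// Greedy: ".*" runs to the LAST </code>, swallowing the text in between.
$greedy = escapeCode('/(<code>)(.*)(<\/code>)/si', $text);

// Lazy: ".*?" stops at the FIRST </code>, so each block is handled on its own.
$lazy = escapeCode('/(<code>)(.*?)(<\/code>)/si', $text);

echo $greedy, "\n";
// prints: <code>&lt;b&gt;one&lt;/b&gt;&lt;/code&gt; and &lt;code&gt;&lt;i&gt;two&lt;/i&gt;</code>
echo $lazy, "\n";
// prints: <code>&lt;b&gt;one&lt;/b&gt;</code> and <code>&lt;i&gt;two&lt;/i&gt;</code>
```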

*cLFlaVA
----------------------------
[tt]tastes great, less filling.[/tt]
 