preg_replace 2

stasJohn · May 26, 2005

I'm not sure if this is the place to ask... it is php related though.

I'm trying to come up with a regular expression. Here's the scenario. I'm pulling addresses from an xml file.

Code:

<addressList>
  <thisIsAnAddress>
  blah blah
  blah blah blah
  </thisIsAnAddress>
  ...
</addressList>

Using xsl, I transform it to...

Code:

<address>
blah blah
blah blah blah
</address>

Of course, when displayed on the screen, both lines appear as one.

So...

Code:

$output = [xsl transformation]
//convert "\n" in <address> tags to <br />
print $output

How do I, using a regular expression, find
"<address>[bunch of text]</address>"
and convert the "\n" to "<br />"s???

Thanks in advance!

kenrbnsn · May 26, 2005

Have you looked at the function nl2br() <http://www.php.net/nl2br>?

Ken

stasJohn · May 26, 2005

I have, the problem with that is... it will stick <br />'s everywhere.

kenrbnsn · May 26, 2005

So what you want to do is parse your XML, make the changes, and put it back together again.

Take a look at miniXml <http://minixml.psychogenic.com/index.html>.

This may help you.

Ken

stasJohn · May 27, 2005

No, I'm not actually parsing the xml file.

I want to parse the output from the xsl transformation. Find the "<address></address>" html tags and change any newlines to "<br />"

Lrnmore · May 29, 2005

stasJohn,

I sorta made this my homework for the weekend.
We can do what you require with the preg_replace_callback.

Give this a try:

Code:

$string = <<<END
<address>
You and Yours
Your Home Address
Your Home State
</address>
END;

function ins_BR($arr){
return $arr[1].str_replace("\r\n", "<br />\n",$arr[2]).$arr[3];
}

$pattern = "/(<address>\r\n)+([^<]+)+(<\/address>)/";
echo preg_replace_callback($pattern, "ins_BR", $string);

Hope it's helpful.
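
For anyone trying this on files with different line endings, here is a minimal sketch of the same callback idea that accepts either "\r\n" or "\n". The function name ins_br_any_eol is made up for illustration, and the sample string is built with explicit "\n" so the result does not depend on how the script file was saved. Like the code above, it puts a break after every line, including the last one.

```php
<?php
// Hypothetical helper name; same idea as ins_BR above.
function ins_br_any_eol(array $m): string
{
    // $m[1] = opening tag plus first line break
    // $m[2] = text between the tags
    // $m[3] = closing tag
    return $m[1] . preg_replace('/\r?\n/', "<br />\n", $m[2]) . $m[3];
}

$string = "<address>\nYou and Yours\nYour Home Address\nYour Home State\n</address>";

// \r? makes the carriage return optional, so both line-ending styles match.
$pattern = '/(<address>\r?\n)([^<]+)(<\/address>)/';

$result = preg_replace_callback($pattern, 'ins_br_any_eol', $string);
echo $result;
```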

Good Place to "Learn More">>
W3 Schools

http://www.w3schools.com

stasJohn · May 31, 2005

Thanks Lrnmore. This works rather nicely. It didn't at first, but once I changed instances of "\r\n" to just "\n", it started to work.

One change I'm trying to make though. I realize I did not make this clear in the beginning. If there are two lines between the address tags, then only the first will have a "br" tag.

Code:

<address>
blah blah <br />
balh balh
</address>

I tried this...

Code:

$pattern = "/(<address>\n)+([^\n<]+)+(<\/address>)/";

but that makes it so no tags are outputted. Any ideas?

Lrnmore · May 31, 2005

Give this a try, we can tell preg_replace a limit.

Code:

function ins_BR($arr){
$tmp = preg_replace("/\n{1}/", "<br />\n", $arr[2], 1);
return $arr[1].$tmp.$arr[3];
}

$pattern = "/(<address>\n)+([^<]+)+(<\/address>)/";
echo preg_replace_callback($pattern, "ins_BR", $html_string);

stasJohn · May 31, 2005

Ahh, yes that works wonderfully.

So, because of the limit, this means that if the address happened to have three lines, only the first one would get the <br />, correct?

Which could probably be fixed by,
- read the value of $arr[0]
- count the number of "\n"'s
- set limit to number of "\n"'s minus 1
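
The three steps above could be sketched roughly like this. It counts the newlines in the captured text between the tags (group 2) and passes that count minus one as the limit to preg_replace. The function name ins_br_all_but_last is hypothetical, and the sketch assumes "\n" line endings and a newline right before the closing tag.

```php
<?php
// Hypothetical helper: adds <br /> to every line except the last one.
function ins_br_all_but_last(array $m): string
{
    // Count the newlines in the text between the tags.
    $limit = substr_count($m[2], "\n") - 1;

    // Zero or one line: nothing to break up, return the match unchanged.
    if ($limit < 1) {
        return $m[0];
    }

    // Replace only the first $limit newlines; the final one is left alone.
    $body = preg_replace('/\n/', "<br />\n", $m[2], $limit);
    return $m[1] . $body . $m[3];
}

$html_string = "<address>\nYou and Yours\nYour Home Address\nYour Home State\n</address>";
$pattern = '/(<address>\n)([^<]+)(<\/address>)/';

$result = preg_replace_callback($pattern, 'ins_br_all_but_last', $html_string);
echo $result;
```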

Thanks again.

Lrnmore · May 31, 2005

I think you're on track with that.

But $arr[0] is going to be the "whole" match of our pattern.
I believe you'll want to count the \n's in $arr[2] instead.

Thanks for the "*".

stasJohn · May 31, 2005

No problem. Thanks for the input and help!!!!!

DRJ478 · May 31, 2005

A few comments:
The regular expression you guys came up with has some extraneous operators you can easily get rid of:

Code:

$old_pattern = "/(<address>\n)+([^<]+)+(<\/address>)/";
$new_pattern = "/(<address>\n)([^<]+)(<\/address>)/";

The plus signs after the groups are not needed. It appears to me that they were used as concatenation operators. However, they have a completely different meaning in the regex context: a plus sign means "one or more of the preceding item". The plus sign is correctly used inside the second subpattern, ([^<]+), where it indicates one or more chars that are not "<".
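
A quick way to check that dropping the extra plus signs changes nothing for a single address block is to run both patterns against the same input and compare the captured groups. This is only a sketch with a made-up sample string.

```php
<?php
$subject = "<address>\nYou and Yours\nYour Home State\n</address>";

$old_pattern = "/(<address>\n)+([^<]+)+(<\/address>)/";
$new_pattern = "/(<address>\n)([^<]+)(<\/address>)/";

preg_match($old_pattern, $subject, $old_groups);
preg_match($new_pattern, $subject, $new_groups);

// Both patterns capture the same three groups for this input.
$same = ($old_groups === $new_groups);
var_dump($same);
```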

There is also another way to do the \n replacement without a callback or another regex. The second regular expression could easily be replaced by using the /e modifier and employing the nl2br function within the replacement statement.

stasJohn · May 31, 2005

What is this "/e" modifier you speak of? Sorry, I'm pretty new to the world of regular expressions.

DRJ478 · May 31, 2005

The /e modifier is used within preg_replace and tells the PCRE engine to evaluate the replacement as PHP code. This allows for manipulation of the captured subpattern within the same regex using standard PHP functions.

Code:

preg_replace("/<address>\n([^<]+)<\/address>/e",
             "nl2br('\\1')",
             $html);

The above example captures the part in between the <address> tags. In the replacement there is the backreference to the first captured subpattern ('\\1') which is run through the nl2br function.
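
Note for readers on newer PHP: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the call above will not run there. A rough equivalent, sketched with preg_replace_callback and an anonymous function, looks like this. The sample $html value is made up, and just like the /e version it drops the <address> tags and puts a break after every line.

```php
<?php
$html = "<address>\nYou and Yours\nYour Home State\n</address>";

$result = preg_replace_callback(
    '/<address>\n([^<]+)<\/address>/',
    function (array $m): string {
        // $m[1] is the text captured between the tags.
        return nl2br($m[1]);
    },
    $html
);

echo $result;
```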

stasJohn · May 31, 2005

That is a much cleaner solution DRJ478, but it adds one too many <br />'s, which is what is going to happen when using nl2br. The last line between the <address> tags should not have a <br />.

Lrnmore · May 31, 2005

DRJ478,

Thanks for the comments, good information.

Looks like the nl2br is also RE driven, how do you place the limit for replacement?

-Mark

DRJ478 · May 31, 2005

You can match the last occurrence of the \n character before the closing tag outside of the subpattern. That will resolve the last \n being substituted by a <br /> tag.

Code:

$string = "<address>\nYou and Yours\nYour Home Address\nYour Home State\n</address>";


$string = preg_replace("/<address>\n([^<]+)\n<\/address>/e",
             "nl2br('\\1')",
             $string);
echo($string);

stasJohn · May 31, 2005

Ok, it's almost working.

It appears that the xsl transformation is adding some extra spaces...

Code:

<address>
blah blah
blah blah
           </address>

So I want to capture the last "\n" and x number of spaces before the closing tag outside of the subpattern, why is "/<address>\n([^<]+)\n\S*<\/address>/e" not working?

Thanks.

Lrnmore · May 31, 2005

How are you guys seeing this?

For me it completely eliminates the <address> tags.

Code:

$html_string = <<<END
<addressList>
<address>
You and Yours
Your Home Address
Your Home State
</address>
<address>
You and Yours
Your Home Address
Your Home State
</address>
</addressList>
END;

$nwstr = preg_replace("/<address>\r\n([^<]+)\r\n<\/address>/e",
             "nl2br('\\1')",
             $html_string);
echo $nwstr;

DRJ478 · May 31, 2005

stasJohn
\S (uppercase S) means any non space char. Use \s (lowercase s).
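
A small sketch of the difference, using a sample string with the extra spaces before the closing tag. With \S* the pattern cannot get past the spaces, so there is no match at all; with \s* the spaces are consumed and the address text lands in the first group.

```php
<?php
$html = "<address>\nblah blah\nblah blah\n           </address>";

// \S* only matches non-whitespace, so the run of spaces blocks the match.
$with_upper = preg_match('/<address>\n([^<]+)\n\S*<\/address>/', $html);

// \s* matches the spaces before the closing tag.
$with_lower = preg_match('/<address>\n([^<]+)\n\s*<\/address>/', $html, $m);

var_dump($with_upper, $with_lower);
```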

Lrnmore
Yes, the tags are eliminated in my expression. But it is easy to just add them back in with a subexpression, a few more parentheses.
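
Putting the pieces from this thread together, one possible sketch: capture the opening tag and the closing tag (with its leading newline and any spaces) in their own groups and put them back around the nl2br result. It is written with preg_replace_callback rather than /e so that it also runs on PHP 7 and later, and it assumes "\n" line endings. The sample input is made up.

```php
<?php
$html = "<addressList>\n<address>\nYou and Yours\nYour Home Address\nYour Home State\n           </address>\n</addressList>";

$result = preg_replace_callback(
    // Group 1: opening tag, group 2: address lines,
    // group 3: last newline, optional spaces and the closing tag.
    '/(<address>\n)([^<]+)(\n\s*<\/address>)/',
    function (array $m): string {
        // Only group 2 goes through nl2br, so the last line gets no <br />.
        return $m[1] . nl2br($m[2]) . $m[3];
    },
    $html
);

echo $result;
```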
