Tek-Tips is the largest IT community on the Internet today!

preg_replace 2

Status
Not open for further replies.

stasJohn

Programmer
May 6, 2004
155
US
I'm not sure if this is the place to ask... it is php related though.

I'm trying to come up with a regular expression. Here's the scenario. I'm pulling addresses from an xml file.
Code:
<addressList>
  <thisIsAnAddress>
  blah blah
  blah blah blah
  </thisIsAnAddress>
  ...
</addressList>

Using xsl, I transform it to...
Code:
<address>
blah blah
blah blah blah
</address>

Of course, when displayed on the screen, both lines appear as one.

So...
Code:
$output = [xsl transformation]
//convert "\n" in <address> tags to <br />
print $output

How do I, using a regular expression, find
"<address>[bunch of text]</address>"
and convert the "\n" to "<br />"s???

Thanks in advance!
 
I have, the problem with that is... it will stick <br/>'s everywhere.
 
No, I'm not actually parsing the xml file.

I want to parse the output from the xsl transformation. Find the "<address></address>" html tags and change any newlines to "<br />"

 
stasJohn,

I sorta made this my homework for the weekend.
We can do what you require with the preg_replace_callback.

Give this a try:
Code:
$string = <<<END
<address>
You and Yours
Your Home Address
Your Home State
</address>
END;

function ins_BR($arr){
return $arr[1].str_replace("\r\n", "<br />\n",$arr[2]).$arr[3];
}

$pattern = "/(<address>\r\n)+([^<]+)+(<\/address>)/";
echo preg_replace_callback($pattern, "ins_BR", $string);

Hope it's helpful.

Good Place to "Learn More">>
W3 Schools
 
Thanks Lrnmore. This works rather nicely. It didn't at first, but once I changed the instances of "\r\n" to just "\n", it started to work.
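Whether "\r\n" or "\n" matches depends on the line endings stored in the input. One way to sidestep the mismatch is to normalize line endings before matching. A minimal sketch, assuming PHP 7 or later; normalize_newlines is a hypothetical helper name and the sample string is made up for illustration:

```php
<?php
// Hypothetical helper: convert Windows ("\r\n") and old Mac ("\r")
// line endings to "\n" so a single pattern can be used afterwards.
function normalize_newlines(string $text): string
{
    // The search array is processed in order, so "\r\n" is handled
    // before any stray "\r" that remains.
    return str_replace(array("\r\n", "\r"), "\n", $text);
}

$windows = "<address>\r\nYou and Yours\r\nYour Home Address\r\n</address>";
$normalized = normalize_newlines($windows);

$pattern = "/(<address>\n)([^<]+)(<\/address>)/";
$matched_raw = preg_match($pattern, $windows);
$matched_normalized = preg_match($pattern, $normalized);

var_dump($matched_raw, $matched_normalized);
```

The "\n"-only pattern misses the raw Windows-style string but matches once the line endings are normalized.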

One change I'm trying to make though. I realize I did not make this clear in the beginning. If there are two lines between the address tags, then only the first will have a "br" tag.
Code:
<address>
blah blah <br />
balh balh
</address>

I tried this...
Code:
$pattern = "/(<address>\n)+([^\n<]+)+(<\/address>)/";
but that makes it so no <br> tags are outputted. Any ideas?
 
Give this a try, we can tell preg_replace a limit.

Code:
function ins_BR($arr){
$tmp = preg_replace("/\n{1}/", "<br />\n", $arr[2], 1);
return $arr[1].$tmp.$arr[3];
}

$pattern = "/(<address>\n)+([^<]+)+(<\/address>)/";
echo preg_replace_callback($pattern, "ins_BR", $html_string);
 
Ahh, yes that works wonderfully.

So, because of the limit, if the address happened to have three lines, only the first one would get the <br>, correct?

Which could probably be fixed by,
- read the value of $arr[0]
- count the number of "\n"'s
- set limit to number of "\n"'s minus 1


Thanks again.
 
I think you're on track with that.

But $arr[0] is going to be the "whole" match of our pattern.
I believe you'll want to count the \n's in $arr[2] instead.

Thanks for the "*".
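The counting idea discussed here can be sketched roughly as follows. This is only an illustration, assuming PHP 7 or later; ins_br_all_but_last is a hypothetical name, the sample address is made up, and the guard for a single-line address is an added assumption rather than something from the posts.

```php
<?php
// Hypothetical callback: add <br /> after every line except the last.
function ins_br_all_but_last(array $m): string
{
    // $m[2] holds the text between the tags; count its newlines.
    $newlines = substr_count($m[2], "\n");
    $limit = $newlines - 1;

    // With one line (or none) there is nothing to replace.
    if ($limit < 1) {
        return $m[0];
    }

    // Replace only the first $limit newlines, leaving the last one alone.
    $body = preg_replace("/\n/", "<br />\n", $m[2], $limit);
    return $m[1] . $body . $m[3];
}

$pattern = "/(<address>\n)([^<]+)(<\/address>)/";
$html_string = "<address>\nYou and Yours\nYour Home Address\nYour Home State\n</address>";

$result = preg_replace_callback($pattern, 'ins_br_all_but_last', $html_string);
echo $result;
```

With three address lines, the first two get a <br /> and the last one is left untouched.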
 
A few comments:
The regular expression you guys came up with has some extraneous operators you can easily get rid of:
Code:
$old_pattern = "/(<address>\n)+([^<]+)+(<\/address>)/";
$new_pattern = "/(<address>\n)([^<]+)(<\/address>)/";
The plus signs after the parentheses are not needed. It appears to me that they were used as concatenation operators, but they have a completely different meaning in the regex context: a plus sign means "one or more of the preceding item". The plus sign is correctly used only inside the second subpattern, ([^<]+), where it indicates one or more chars that are not "<".
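A small sketch of the difference, using made-up sample strings. On well-formed input both patterns capture the same pieces, but the "+" after the first group lets the old pattern repeat the opening tag:

```php
<?php
$old_pattern = "/(<address>\n)+([^<]+)+(<\/address>)/";
$new_pattern = "/(<address>\n)([^<]+)(<\/address>)/";

// Well-formed input: both patterns capture exactly the same pieces.
$sample = "<address>\nline one\nline two\n</address>";
preg_match($old_pattern, $sample, $old);
preg_match($new_pattern, $sample, $new);
$same_captures = ($old === $new);

// Doubled opening tag: the "+" after the first group lets the old
// pattern swallow both opening tags; the new pattern can only start
// matching at the second one.
$doubled = "<address>\n<address>\nline one\n</address>";
preg_match($old_pattern, $doubled, $old_doubled);
preg_match($new_pattern, $doubled, $new_doubled);
$old_spans_both_tags = ($old_doubled[0] === $doubled);
$new_spans_both_tags = ($new_doubled[0] === $doubled);

var_dump($same_captures, $old_spans_both_tags, $new_spans_both_tags);
```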

There is also another way to do the \n replacement without a callback or another regex. The second regular expression could easily be replaced by using the /e modifier and employing the nl2br function within the replacement statement.
 
What is this "/e" modifier you speak of? Sorry, I'm pretty new to the world of regular expressions.
 
The /e modifier is used within preg_replace and tells the PCRE engine to evaluate the replacement as PHP code. This allows for manipulation of the captured subpattern within the same regex using standard PHP functions.
Code:
preg_replace("/<address>\n([^<]+)<\/address>/e",
             "nl2br('\\1')",
             $html);

The above example captures the part in between the <address> tags. In the replacement there is the backreference to the first captured subpattern ('\\1'), which is run through the nl2br function.
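The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP versions the same idea has to go through preg_replace_callback. A rough equivalent of the snippet above, assuming PHP 7 or later and a made-up sample $html:

```php
<?php
$html = "<address>\nYou and Yours\nYour Home Address\n</address>";

// Same pattern as the /e version, minus the modifier. The closure
// plays the role of the evaluated replacement string.
$result = preg_replace_callback(
    "/<address>\n([^<]+)<\/address>/",
    function (array $m): string {
        // $m[1] is the first captured subpattern (the old '\\1').
        return nl2br($m[1]);
    },
    $html
);

echo $result;
```

Like the /e snippet, this returns only the captured text, so the <address> tags themselves are not part of the output and every newline in the capture gets a <br />.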
 
That is a much cleaner solution DRJ478, but it adds one too many <br>'s which is what is going to happen when using nl2br. The last line between the <address> tags should not have a <br>.
 
DRJ478,

Thanks for the comments, good information.

Looks like the nl2br is also RE driven, how do you place the limit for replacement?

-Mark
 
You can match the last occurrence of the \n character before the closing tag outside of the subpattern. That keeps the last \n from being substituted by a <br /> tag.

Code:
$string = "<address>\nYou and Yours\nYour Home Address\nYour Home State\n</address>";


$string = preg_replace("/<address>\n([^<]+)\n<\/address>/e",
             "nl2br('\\1')",
             $string);
echo($string);
 
OK, it's almost working.

It appears that the xsl transformation is adding some extra spaces...
Code:
<address>
blah blah
blah blah
           </address>

So I want to capture the last "\n" and x number of spaces before the closing tag outside of the subpattern, why is "/<address>\n([^<]+)\n\S*<\/address>/e" not working?

Thanks.
 
How are you guys seeing this?

For me it completely eliminates the <address> tags.
Code:
$html_string = <<<END
<addressList>
<address>
You and Yours
Your Home Address
Your Home State
</address>
<address>
You and Yours
Your Home Address
Your Home State
</address>
</addressList>
END;

$nwstr = preg_replace("/<address>\r\n([^<]+)\r\n<\/address>/e",
             "nl2br('\\1')",
             $html_string);
echo $nwstr;
 
stasJohn
\S (uppercase S) matches any non-whitespace char. Use \s (lowercase s) instead.

Lrnmore
Yes, the tags are eliminated in my expression. But it is easy to add them back in with subexpressions, just a few more parentheses.
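Putting both suggestions together might look something like this. It is only a sketch, assuming PHP 7 or later (where /e is no longer available, hence preg_replace_callback) and a made-up sample with the extra spaces before the closing tag:

```php
<?php
$html = "<address>\nblah blah\nblah blah\n           </address>";

// Group 1: opening tag and its newline.
// Group 2: the address text, without the final newline.
// Group 3: final newline, optional whitespace (\s*), and the closing tag.
$pattern = "/(<address>\n)([^<]+)(\n\s*<\/address>)/";

$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        // Only the text between the tags goes through nl2br; the tags
        // are put back around it unchanged.
        return $m[1] . nl2br($m[2]) . $m[3];
    },
    $html
);

echo $result;
```

Because the last newline and the indentation sit in the third group, the final line gets no <br /> and the tags survive the replacement.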

 