regular expression help

justride · May 22, 2007

Hello all,

I am attempting to search and replace some XML text using PHP's regular expression library.

I want to find all occurrences of >some alpha text<
within an XML string and replace them with ><font...>some alpha text</font><

Here is what I have so far:

Code:

$pattern = ">[a-zA-Z0-9]*<";
$replace = "><font ...>\\1</font><";
$html = preg_replace($pattern,$replace,$html);

PHP gives Warning: preg_replace() [function.preg-replace]: No ending delimiter '>' found

but a text editor finds all occurrences of >< or >text<

Any suggestions?

Thanks

Itshim · May 22, 2007

Change:

Code:

$pattern = ">[a-zA-Z0-9]*<";

to have delimiters; such as:

Code:

$pattern = "/>[a-zA-Z0-9]*</";

Itshim · May 22, 2007

I accidentally hit submit. Check out the introduction to Regular Expression Functions (Perl-Compatible) in the PHP manual; it talks about using delimiters.
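As a minimal sketch of the delimiter fix, assuming a made-up input string (the real XML is not shown in the thread): the pattern is wrapped in "/" delimiters, and a pair of round brackets is added so that \1 in the replacement has something to refer to. The font attributes are elided in the question, so color="green" is only a placeholder.

```php
<?php
// Hypothetical sample input; the XML from the question is not shown.
$html = '<a>hello</a> <b>42</b>';

// "/" marks where the pattern starts and ends. The round brackets are an
// addition so that \1 in the replacement refers to the captured text.
$pattern = '/>([a-zA-Z0-9]*)</';

// color="green" is a placeholder for the attributes elided in the question.
$replace = '><font color="green">\1</font><';

$result = preg_replace($pattern, $replace, $html);
echo $result, "\n";
```

Any non-alphanumeric, non-backslash, non-whitespace character can serve as the delimiter, so "#" or "~" would work equally well when the pattern itself contains forward slashes.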

justride · May 22, 2007

Thanks for the help, I'll give it a go.

justride · May 23, 2007

Thanks for the help. I am noticing that echoing \\$1 prints multiple instances. Is there any way to grab the whole text within >whole text< and assign it to \\$1?

Code:

$pattern = "/>(^)*</";

That seems to be the syntax for the pattern, but when I use

Code:

$pattern = ">(^)*<";
$replace = "/>test\\$1</";
$html = preg_replace($pattern,$replace,$xml);

I get >testtestest actual data from \\$1<

things like that. I just want to capture everything within >< and replace it.

Thanks

feherke · May 23, 2007

Hi

Code:

$pattern = ">([^<>])*<";

Feherke.

http://rootshell.be/~feherke/

feherke · May 23, 2007

Hi

Oops.

Code:

$pattern = ">([^<>]*)<";

Feherke.

http://rootshell.be/~feherke/
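The difference between the two patterns above is only where the * sits relative to the round brackets. A rough sketch of the effect, assuming "/" delimiters are added and using a made-up input and a made-up "[...]" replacement marker:

```php
<?php
// Hypothetical sample input.
$xml = '<tag>abc</tag>';

// Star outside the group: the group is re-captured on every repetition,
// so only the last character is left in $1.
$perChar = preg_replace('/>([^<>])*</', '>[$1]<', $xml);

// Star inside the group: $1 holds the whole run of text.
$wholeRun = preg_replace('/>([^<>]*)</', '>[$1]<', $xml);

echo $perChar, "\n";   // <tag>[c]</tag>
echo $wholeRun, "\n";  // <tag>[abc]</tag>
```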

jpadie · May 23, 2007

i may be misunderstanding. from your posts you want to take the text within tags and apply some green coloring to it using the <font> tag.

if so, then this code works for me

Code:

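<?php
// sample input: two tagged elements with untagged text around them
$xml = "lots of text then a <sometag>then lots more text</sometag> then more text <sometag>and now some more</sometag>";

// capture ">" ($1), the text after it ($2) and the following "<" ($3)
$pattern = "/(>)([a-zA-Z0-9 ]*?)(<)/i";
$xml_r = preg_replace($pattern, '$1<font color="green">$2</font>$3', $xml);
echo $xml_r;
?>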

but this won't work the way you intend if you have multiple tags within the document. instead you need to refine the pattern as follows:

Code:

$pattern = "/(<.*?>)([a-zA-Z0-9 ]*?)(<\/.*?>)/i";
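To make the difference concrete, here is a small comparison of the two patterns on a shortened, made-up input. The variable names $loose and $strict are only for illustration. The first pattern also wraps the text that sits between a closing tag and the next opening tag; the refined one only wraps text enclosed by an opening and a closing tag.

```php
<?php
// Hypothetical shortened input: two elements with loose text between them.
$xml = '<a>one</a> two <b>three</b>';
$replace = '$1<font color="green">$2</font>$3';

// First pattern: anything between ">" and "<" qualifies, including " two ".
$loose = preg_replace('/(>)([a-zA-Z0-9 ]*?)(<)/i', $replace, $xml);

// Refined pattern: the text must sit between an opening and a closing tag.
$strict = preg_replace('/(<.*?>)([a-zA-Z0-9 ]*?)(<\/.*?>)/i', $replace, $xml);

echo $loose, "\n";
echo $strict, "\n";
// $strict is: <a><font color="green">one</font></a> two <b><font color="green">three</font></b>
```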

justride · May 23, 2007

yes, I have multiple tags, is that what the *? is doing?

thanks for all the help thus far

jpadie · May 23, 2007

let me decode the pattern for you

Code:

$pattern = "/(<.*?>)([a-zA-Z0-9 ]*?)(<\/.*?>)/i";

the forward slashes at the beginning and end of the string are pattern delimiters. they tell the engine where the pattern starts and ends, and thus where to find the modifiers.

the i at the end is a case insensitive modifier. not strictly needed here.

round brackets cause the data matched by the pattern within them to be captured as a group. the captured data is reusable by referencing it as $n (a backreference), where n is the 1-based position of that set of round brackets, counting opening brackets from the left.

so the first round bracket set says:
look for a string that starts with "<", then has any character (the dot) repeated none or more times (the *) lazily (the ?), and is then followed by a closing angle bracket ">".
the second round bracket set says:
look for a string made up of alphabetical characters, digits or spaces (the "or" is created by using the square brackets), repeated none or more times, again lazily; and
the third round bracket set says:
look for a string that starts with "</", then contains any old text, and finally a ">". the backslash is included because the forward slash is the pattern delimiter here, so if you want to use it literally inside the pattern you must escape it. the backslash is the escape character.

together all three round brackets must be satisfied to get a match.
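One way to see what each set of round brackets captured is to run the pattern through preg_match and print the matches array. This is only a sketch on a made-up one-element input:

```php
<?php
$pattern = '/(<.*?>)([a-zA-Z0-9 ]*?)(<\/.*?>)/i';

// Hypothetical sample input with a single element.
$xml = 'before <sometag>some text</sometag> after';

if (preg_match($pattern, $xml, $m)) {
    // $m[0] is the whole match; $m[1] to $m[3] are the three groups,
    // the same values that $1, $2 and $3 refer to in a replacement.
    echo $m[1], "\n"; // <sometag>
    echo $m[2], "\n"; // some text
    echo $m[3], "\n"; // </sometag>
}
```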

the replace syntax replaces the text matched by the three round brackets with
(i) the text in the first round bracket ($1) [remember the back references]
(ii) the <font color="green">
(iii) the text in the second round bracket ($2)
(iv) the closing </font>
(v) the text in the third round bracket ($3)

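FYI, lazy vs greedy: a lazy quantifier matches as few characters as possible, so it stops at the first point where the rest of the pattern succeeds, whereas a greedy quantifier matches as many characters as possible, so it stops at the last such point.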
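A quick way to see the difference is to run the same tag pattern with and without the ? on a made-up string containing two elements:

```php
<?php
// Hypothetical sample input.
$xml = '<a>one</a><b>two</b>';

preg_match('/<.*>/', $xml, $greedy);  // .* grabs as much as it can
preg_match('/<.*?>/', $xml, $lazy);   // .*? grabs as little as it can

echo $greedy[0], "\n"; // <a>one</a><b>two</b>
echo $lazy[0], "\n";   // <a>
```

The greedy version runs to the last ">" in the string, which is why the tag patterns in this thread use the lazy form.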
