Regular expression for preg_match problem

JGH · Jan 5, 2006

I am trying to pluck out the text found within the outermost brackets %[ and ] (the opening bracket is preceded by a % sign). I want to be able to nest %[...] blocks within each other, if necessary. I thought that the following would do the job, but alas, my regex shortcomings have been exposed.

Code:

if (preg_match("/^(^%\[)*%\[(.+)\](^\])*$/",$line,$matches)) { ... }

Any idea what I am doing wrong?

Thanks - Jim
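For reference, two things in the pattern above don't do what was intended. Outside square brackets, `^` is a start-of-string anchor, not "not" (negation only happens as the first character of a class such as `[^%]`). And because `(^\])*` can only ever repeat zero times, the whole pattern can only match a line that starts with `%[` and ends with `]`. A plain regular expression cannot count nesting depth, but the PCRE engine behind PHP's `preg_*` functions supports recursive patterns via `(?R)`. The following is a minimal sketch along those lines, not code from the thread; the function name `extractOutermost` is hypothetical, and it assumes a `]` inside a block only ever closes a `%[`.

```php
<?php
// Sketch: return the contents of every outermost %[ ... ] block,
// where blocks may be nested inside each other.
// Assumption: a "]" inside a block always closes some "%[".
function extractOutermost(string $text): array
{
    // %\[            opening delimiter
    // (              group 1: the contents we want back
    //   (?:
    //     [^%\]]++   a run of characters that are neither '%' nor ']'
    //   | %(?!\[)    a '%' that does not start a nested block
    //   | (?R)       a nested %[ ... ] block (recurse into the whole pattern)
    //   )*
    // )
    // \]             closing delimiter
    $pattern = '/%\[((?:[^%\]]++|%(?!\[)|(?R))*)\]/';
    preg_match_all($pattern, $text, $matches);
    return $matches[1];
}

$haystack = '%[hello there %[something in here] something more in here] then more text %[and finally] and then some more text that should not be included';
print_r(extractOutermost($haystack));
```

On this sample string the function returns two entries: `hello there %[something in here] something more in here` (the nested block is left intact inside it) and `and finally`. Because each match consumes its nested blocks, `preg_match_all` never reports the inner ones separately.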

jpadie · Jan 6, 2006

Sorry JGH, I can't help with the regex, but here is a short code snippet that does much the same job without one (it may even be quicker than a regex). It lists the contents of every %[...] block, nested ones included.

Code:

<?php
$haystack = '%[hello there %[something in here] something more in here] then more text %[and finally] and then some more text that should not be included';
$startneedle = "%[";
$stopneedle = "]";

// collect the offsets of every opening and closing delimiter (ascending order)
$start_array = array();
$stop_array = array();
$pos = -1;
while (($pos = strpos($haystack, $startneedle, $pos + 1)) !== false) $start_array[] = $pos;
$pos = -1;
while (($pos = strpos($haystack, $stopneedle, $pos + 1)) !== false) $stop_array[] = $pos;

// now match starts and stops: each stop closes the nearest unused start before it
$pair = array();
foreach ($stop_array as $stop):
	$prev_key = null;
	foreach ($start_array as $key => $start):
		if ($start > $stop) break;	// starts are in ascending order
		$prev_key = $key;
	endforeach;
	if ($prev_key === null) continue;	// stray "]" with no unused "%[" before it
	$pair[] = array("start" => $start_array[$prev_key], "stop" => $stop);
	unset($start_array[$prev_key]);	// this start is now used up
endforeach;

// cut out the text between each matched pair of delimiters
$contents = array();
foreach ($pair as $p):
	$offset = $p['start'] + strlen($startneedle);
	$contents[] = substr($haystack, $offset, $p['stop'] - $offset);
endforeach;

echo "the input string was <br/><i>$haystack</i><br/>";
echo "<br/>nested tag contents are<br/><pre>";
print_r($contents);
echo "</pre>";
?>
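The strpos approach above lists every block, nested ones included. If only the outermost blocks are wanted, as in the original question, a single left-to-right scan with a nesting-depth counter is another option. This is a swapped-in technique rather than jpadie's code, and the function name `outermostBlocks` is hypothetical. It assumes a `]` outside any block can simply be ignored and that an unclosed `%[` at the end of the string is dropped.

```php
<?php
// Sketch: one pass over the string with a nesting-depth counter.
// Returns only the outermost %[ ... ] contents.
// Assumptions: a "]" at depth 0 is ignored; an unclosed trailing block is dropped.
function outermostBlocks(string $text, string $open = '%[', string $close = ']'): array
{
    $blocks = [];
    $depth = 0;
    $begin = 0;                 // offset where the current outermost block's contents start
    $len = strlen($text);
    $i = 0;
    while ($i < $len) {
        if (substr($text, $i, strlen($open)) === $open) {
            if ($depth === 0) {
                $begin = $i + strlen($open);   // entering an outermost block
            }
            $depth++;
            $i += strlen($open);
        } elseif ($depth > 0 && substr($text, $i, strlen($close)) === $close) {
            $depth--;
            if ($depth === 0) {
                // leaving an outermost block: keep everything between the delimiters
                $blocks[] = substr($text, $begin, $i - $begin);
            }
            $i += strlen($close);
        } else {
            $i++;
        }
    }
    return $blocks;
}

print_r(outermostBlocks('text %[a %[b] c] more %[d]'));
```

For that sample string the result holds `a %[b] c` and `d`. A stack of offsets would be needed instead of a single counter if the nested blocks were also wanted.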

drmindhacker · Jan 7, 2006

I am having trouble writing a parser [new to PHP]. I want to use preg_replace() to take a line of input and return only alphabetic characters [a-zA-Z].
My current solution:

$word[$x] = preg_replace('/([0-9]+)|!|@|#||&|_|=|,|:|;|<|>|`|~|"|\/|\'|\||\.|\[|\]|\{|\}|\+|$|$|\*|\^|\$|\?|\\\/i', '', $line[$x]);

which will take: t\his, isn't properly for-matted!
and return: this isnt properly formatted

It does the job, but (1) it doesn't cover other characters that may get passed in (nor do I want to keep adding characters to filter out), and (2) it's very long and ugly. I want a NOT function, and tried:

$line[$x] = preg_replace('/(![a-zA-Z]+)/i', '', $line[$x]);

which returns: t\\his, isn\'t properly f<matted!

I've tried different variations, but it still returns non-alphabetic characters! Is this structured incorrectly, or is it not possible to NOT a pattern?

JGH · Jan 7, 2006

Try this. I haven't tested it, though. The ^ as the first character inside the square brackets should give you the negation you seek...

$line[$x] = preg_replace('/([^a-zA-Z]+)/i', '', $line[$x]);
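A note on why the earlier attempt failed: in `(![a-zA-Z]+)` the `!` is just a literal exclamation mark, so that pattern only matches a `!` followed by letters. JGH's negated class fixes that, but `[^a-zA-Z]` also matches spaces, so the words would run together instead of giving "this isnt properly formatted" as in the expected output. A minimal sketch, assuming the spaces should survive, adds a literal space to the class; the function name `lettersAndSpaces` is hypothetical.

```php
<?php
// Keep only ASCII letters and spaces; every other character is deleted.
// [^...] is a negated character class: it matches any single character NOT listed.
function lettersAndSpaces(string $line): string
{
    // '/[^a-zA-Z]+/' on its own would delete the spaces too,
    // so a literal space is included in the class here.
    // The /i flag is unnecessary because both cases are already listed.
    return preg_replace('/[^a-zA-Z ]+/', '', $line);
}

echo lettersAndSpaces("t\\his, isn't properly for-matted!"), "\n";
// prints: this isnt properly formatted
```

If tabs or other whitespace should also be kept, `\s` can go inside the class in place of the single space.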

Tek-Tips is the largest IT community on the Internet today!