preg_split expression

Status
Not open for further replies.

Horrid

Programmer
May 20, 1999
373
I have never managed to work out how to structure regular expressions, and I could use some help understanding how and why you would construct a regular expression to extract the content between two HTML tags.

The tags are <td id=(a numerical value)>(text I want)</td>

I tried this
$r=preg_split("<td id\=(.*)/td>", $res);
while (list ($key, $val) = each ($r))
{
echo "$key => $val
";
}
and didn't get an error for a change but the output I got was 0=>> 1=>>. All my efforts have failed.

Thanks for any help.
 
I would use an expression something like:

"/<td id=\d+>([^<]*)</td>/"

The expression is in three parts:
<td id=\d+> finds and "eats" the td tag and id attribute with numerical values.

([^<]*) finds and keeps everything after the closing of the previous tag, up to the next open tag.

</td> finds and discards the closing td tag.
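
For anyone who wants to try the three parts out, here is a minimal sketch. It assumes preg_match_all() rather than preg_split(), since the aim is to capture text rather than to split on it, and the sample string in $html is made up for illustration. The slash in the closing tag is escaped here because / also serves as the pattern delimiter.

```php
<?php
// Hypothetical sample input, shaped like the tags described in the question.
$html = '<td id=1>first cell</td><td id=22>second cell</td>';

// Part 1: <td id=\d+>  matches the opening tag with a numeric id.
// Part 2: ([^<]*)      captures everything up to the next "<".
// Part 3: <\/td>       matches the closing tag ("/" escaped, as it is the delimiter).
preg_match_all('/<td id=\d+>([^<]*)<\/td>/', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the captured cell text.
print_r($matches[1]);
```

With this input, $matches[1] should hold "first cell" and "second cell".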

Want the best answers? Ask the best questions: TANSTAAFL!!
 
Thank you! Exactly the explanation I needed.

 
Just keep in mind that since the part of the expression that is looking for "keepable" text is specifically looking for anything up to a "<", this expression will barf on tags inside of tags: "<td id=1><b>foo</b></td>"
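
One possible way around that, sketched under the assumption that cells are never nested inside other cells, is to swap ([^<]*) for a lazy (.*?), which takes as little as it can before the first closing td tag. The sample string and the variable names $strict and $lazy are made up for illustration.

```php
<?php
// Hypothetical sample: the first cell contains a nested <b> tag.
$html = '<td id=1><b>foo</b></td><td id=2>bar</td>';

// ([^<]*) stops at the first "<", so the cell with the nested tag never matches.
preg_match_all('/<td id=\d+>([^<]*)<\/td>/', $html, $strict);

// (.*?) is lazy: it expands only as far as the first "</td>".
// The s modifier lets "." match newlines too, in case a cell spans lines.
preg_match_all('/<td id=\d+>(.*?)<\/td>/s', $html, $lazy);

print_r($strict[1]); // only "bar"
print_r($lazy[1]);   // "<b>foo</b>" and "bar"
```

The captured text still contains the inner markup, so it would need a further pass (strip_tags(), for example) if only the plain text is wanted.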

 
Good point, I'm sure I can find a solution to that one.

I made an error in my first question, the tag structure looks more like this

$res = '<td id="taw0">blah blah</td><td id="taw0">hlah hlah</td>';

So I changed the expression to this.
$r=preg_split('/<td id=\"[t*]aw\d+\">([^<]+)\/td>/', $res);

It returned the result with the content I wanted, but not as 2 array items; they come out as one long string.

Here is my thinking
match and drop the open tag
<td id=" with 0 or 1 matching t + aw + any number ">

Grab anything up to a close tag

I had to change the tag you had to get it to work
\/td>/. Drop the close </td> tag. With the starting < I got 3 blank results so I deleted it to see what happens, it seemed to help.

Any ideas on why it would return everything as one array item, not as separate matches?
 
Oops. Sorry. I didn't pay enough attention and didn't notice you were using preg_split.

Think of the string '<td id="taw0">blah blah</td><td id="taw0">hlah hlah</td>' as a set of records. If you want to get the records out, what separates the records? The substring '</td><td id="taw0">'
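
A small sketch of the difference, with made-up variable names $blanks and $parts. preg_split() throws away whatever the pattern matches and returns the pieces in between, so a pattern that matches whole cells leaves only empty strings, while a pattern that matches the separator leaves the records.

```php
<?php
$res = '<td id="taw0">blah blah</td><td id="taw0">hlah hlah</td>';

// A pattern that matches each whole cell: the cells themselves are the
// delimiters and get discarded, leaving the empty strings around them.
$blanks = preg_split('/<td id="taw\d+">([^<]+)<\/td>/', $res);
print_r($blanks); // three empty strings

// A pattern that matches only the boundary between two records.
$parts = preg_split('/<\/td><td[^>]*>/', $res);
print_r($parts);
// [0] => <td id="taw0">blah blah
// [1] => hlah hlah</td>
```

Splitting on the separator alone still leaves the first opening tag and the last closing tag attached, so those need trimming separately.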

Here's another tack to take. First, simplify the string by removing what you don't need. Then split on what's left.

I chose to remove the opening tags first, then to split on the closing tags. This does, however, leave an extra array element in the final array, as the "record terminator" appears at the end of the string.

Code:
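<?php
// Sample input: two table cells back to back.
$res = '<td id="taw0">blah blah</td><td id="taw0">hlah hlah</td>';

// Remove every opening <td ...> tag, whatever its attributes.
$res = preg_replace ('/<td[^>]*>/', '', $res);
// Split on the closing tags; $res becomes an array with a trailing empty element.
$res = preg_split ('/<\/td>/', $res);
?>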
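
If the extra element is a nuisance, one option is preg_split()'s PREG_SPLIT_NO_EMPTY flag, which drops empty pieces. This is only a sketch of that variation; the variable names $stripped, $with_empty and $cells are made up for illustration.

```php
<?php
$res = '<td id="taw0">blah blah</td><td id="taw0">hlah hlah</td>';

// Step 1: remove every opening <td ...> tag.
$stripped = preg_replace('/<td[^>]*>/', '', $res);

// Step 2a: a plain split leaves an empty string after the last "</td>".
$with_empty = preg_split('/<\/td>/', $stripped);

// Step 2b: -1 means "no limit"; PREG_SPLIT_NO_EMPTY discards empty pieces.
$cells = preg_split('/<\/td>/', $stripped, -1, PREG_SPLIT_NO_EMPTY);

print_r($with_empty); // "blah blah", "hlah hlah", ""
print_r($cells);      // "blah blah", "hlah hlah"
```

Note that the flag would also drop cells that are genuinely empty, which may or may not be what you want.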

 
Thanks for the help.

I gave up in the end and just went back to manual string manipulation, had it going in 30 seconds.

I'll run through your examples as a learning exercise.

Once again, thanks for your help, was a good way to learn.
 
Yeah, the thing with regular expressions is that you really can't be taught them. You just have to osmotically absorb information about them until one day you grok them and wonder what all the fuss was about.

Apache's mod_rewrite module uses a regular expression engine to rewrite URLs on the fly. It's a very handy Apache module, but the following is found in the documentation (at
"Despite the tons of examples and docs, mod_rewrite is voodoo. Damned cool voodoo, but still voodoo."


This, I think, is apropos to regular expressions in general.

 