Hi,
I'm writing a PHP app that extracts only the text from an .html page so I can insert it into a database and work from there.
To begin, I'm attempting to get all the text between the <td> tags, but it hasn't worked out well yet. (I'll move on to the <img>s and <a>s inside the <td>s once I can successfully retrieve the contents.)
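<?php
// Open the page for reading (the mode must be a quoted string).
$file = fopen("testpage.html", "r");
while (!feof($file))
{
    // Read one line per pass; fpassthru() would only echo the file and leave nothing to match.
    $textString = fgets($file);
    preg_match_all("/^\<td(\>\=\"\.){0,}<\/td>$/", $textString, $matches);
}
fclose($file);
?>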
My preg knowledge is less than perfect; could you help me? Thanks!
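One possible way to approach this, offered as a minimal sketch rather than a definitive answer: match each <td>...</td> pair lazily across the whole document and then strip any nested tags from what was captured. The helper name extractCellText() and the sample markup are made up for illustration. The pattern assumes the cells do not contain nested tables; for messier HTML, a real parser such as PHP's DOMDocument is probably the safer choice.

```php
<?php
// Sketch: collect the text inside every <td>...</td> pair in an HTML string.
// extractCellText() is a hypothetical helper name, not a built-in PHP function.
function extractCellText(string $html): array
{
    // <td\b[^>]*>  opening tag with optional attributes
    // (.*?)        cell contents, matched lazily so each cell stops at its own </td>
    // s modifier:  "." also matches newlines; i modifier: matches <TD> as well as <td>
    preg_match_all('/<td\b[^>]*>(.*?)<\/td>/is', $html, $matches);

    $cells = [];
    foreach ($matches[1] as $inner) {
        // Remove nested tags such as <a> and <img>, decode entities, trim whitespace.
        $cells[] = trim(html_entity_decode(strip_tags($inner)));
    }
    return $cells;
}

// Illustrative input; in the real app this could come from file_get_contents("testpage.html").
$html = '<table><tr><td class="a">Hello</td><td><a href="#">World</a></td></tr></table>';
$cells = extractCellText($html);
print_r($cells);
```

Reading the whole file into one string first (for example with file_get_contents()) matters here, because a cell that spans several lines cannot be matched when the file is processed one line at a time.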