Parsing html/text with PHP

Status
Not open for further replies.

TheVisitor

IS-IT--Management
Oct 11, 2002
US
Hello,

I'm new to PHP and was wondering what the best way is to extract text from an HTML file. I have the file as a string, and let's assume I want to get the text between the first occurrence of "<h1>" and the following "</h1>". Like:

<h1>This is the text I'd like to get</h1>

How do I get the text, and which method would be the most CPU-friendly? I guess the quick hack I came up with, using strpos to find the beginning and the end and then taking the substring, isn't the brightest idea here. Is sscanf an option?
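For reference, the strpos/substring approach described above can be sketched roughly as follows. This is only an illustration: the helper name extract_between is hypothetical, and the type hints assume PHP 7.1 or later. Plain string functions like these avoid the regex engine entirely, so they are usually a reasonable choice when the delimiters are fixed strings, though that is worth measuring rather than assuming.

```php
<?php
// Hypothetical helper (not from the thread): returns the text between the
// first occurrence of $open and the next occurrence of $close, or null if
// either delimiter is missing.
function extract_between(string $html, string $open, string $close): ?string
{
    $start = strpos($html, $open);
    if ($start === false) {
        return null; // opening tag not found
    }
    $start += strlen($open); // move past the opening tag

    $end = strpos($html, $close, $start); // search only after the opening tag
    if ($end === false) {
        return null; // closing tag not found
    }

    return substr($html, $start, $end - $start);
}

$html = "<html><head>blah<h1>This is the text I'd like to get</h1>...";
echo extract_between($html, '<h1>', '</h1>'), "\n";
```

Note that strpos is case-sensitive, so this sketch would miss an uppercase <H1>; stripos is the case-insensitive variant.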
 
preg_match("/<h1>(.*)<\/h1>/i", "<html><head>blah<h1>aaaaaaa</h1>...", $matches);

The first item in $matches (index 0) will be the whole matched string,
the second (index 1) will be the first captured group,
the third (index 2) will be the second captured group, and so on. --BB
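A small runnable sketch of the preg_match call from this reply, showing what ends up in $matches. The sample string with two headings is made up for illustration. It also swaps in a non-greedy (.*?) as a variation on the pattern above: with a greedy (.*), a document containing several headings would be matched from the first <h1> up to the last </h1>.

```php
<?php
// Sample input (assumed for illustration) with two headings.
$html = "<html><head>blah<h1>aaaaaaa</h1><h1>second</h1>...";

// Non-greedy (.*?) stops at the first </h1> after the opening tag.
// The i modifier makes the tag match case-insensitive.
if (preg_match('/<h1>(.*?)<\/h1>/i', $html, $matches)) {
    echo $matches[0], "\n"; // whole match, including the tags
    echo $matches[1], "\n"; // first captured group: the text between the tags
}
```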
 
Okie, thanks.

By the way, since . doesn't match newlines and I want to go through the whole HTML document, do I just have to preg_replace the newlines away, or what should I do?
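One option that avoids stripping newlines is the s pattern modifier (PCRE_DOTALL), which makes . match newline characters as well. A minimal sketch, with a made-up sample string:

```php
<?php
// Sample input (assumed for illustration): the heading spans two lines.
$html = "<html>\n<h1>Line one\nline two</h1>\n<p>...</p>";

// Without the s modifier, "." does not match "\n", so nothing is found.
$plain = preg_match('/<h1>(.*?)<\/h1>/i', $html, $m1);

// With the s modifier (PCRE_DOTALL), "." also matches newlines.
$dotall = preg_match('/<h1>(.*?)<\/h1>/is', $html, $m2);

var_dump($plain);  // int(0): no match
var_dump($m2[1]);  // the heading text, with its newline preserved
```

The captured text keeps its line break, so it may still need trimming or whitespace normalization afterwards, depending on what the text is used for.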
 