Parsing html/text with PHP

TheVisitor · May 15, 2002

Hello,

I'm new to PHP and was wondering what the best way is to extract text from an HTML file. I already have the file as a string; let's assume I want the text between the first occurrence of "<h1>" and the following "</h1>". Like:

<h1>This is the text I'd like to get</h1>

How do I get the text, and which method would be the most CPU-friendly? I guess the quick hack I came up with, using strpos to find the beginning and end and then taking a substring, isn't the brightest idea here. Is sscanf an option?
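For reference, the strpos-and-substring approach described above might look roughly like this. It is only a sketch: text_between is a made-up helper name, the sample string is invented, and the search is case-sensitive, so "<H1>" would not be found.

```php
<?php
// Hypothetical helper (not from the thread): returns the text between the
// first $open marker and the next $close marker, or null if either is missing.
function text_between($html, $open, $close)
{
    $start = strpos($html, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);               // skip past the opening marker
    $end = strpos($html, $close, $start);  // search only after the opening marker
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

// Invented sample input for illustration.
$html = "<html><body><h1>This is the text I'd like to get</h1></body></html>";
echo text_between($html, "<h1>", "</h1>"), "\n";
```

For a single fixed pair of markers this does two linear scans and no regex compilation, so it is unlikely to be the slow option.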

BB101 · May 16, 2002

preg_match("/<h1>(.*?)<\/h1>/i","<html><head>blah<h1>aaaaaaa</h1>...",$matches);

$matches[0] will hold the whole matched text (tags included)
$matches[1] will hold the first parenthesized group, i.e. the text between the tags
$matches[2] would hold a second group, and so on --BB
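A small sketch of how $matches fills in. The sample string with two headings is invented, and the pattern uses a non-greedy .*? on the assumption that only the first heading is wanted; a plain .* would run on to the last "</h1>" in the string.

```php
<?php
// Invented sample input with two headings.
$html = "<html><head>blah<h1>aaaaaaa</h1><h1>bbb</h1>...";

// preg_match returns 1 if the pattern matched and 0 if it did not.
if (preg_match('/<h1>(.*?)<\/h1>/i', $html, $matches)) {
    echo $matches[0], "\n"; // whole matched text: <h1>aaaaaaa</h1>
    echo $matches[1], "\n"; // first parenthesized group: aaaaaaa
}
```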

TheVisitor · May 16, 2002

Okie, thanks.

By the way, since . doesn't match newlines and I want to go through the whole HTML document, do I have to preg_replace the newlines away first, or what should I do?
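One way to handle this without stripping newlines is the PCRE "s" modifier (PCRE_DOTALL), which makes . match newlines as well. A minimal sketch, with an invented multi-line sample and a non-greedy group so the match stops at the first closing tag:

```php
<?php
// Invented sample: the first heading spans two lines.
$html = "<html>\n<h1>This is\nthe text</h1>\n<h1>second</h1>\n</html>";

// "i" = case-insensitive, "s" = dot also matches newline characters.
if (preg_match('/<h1>(.*?)<\/h1>/is', $html, $m)) {
    $text = $m[1]; // text of the first heading, line break included
    echo $text, "\n";
}
```

The newlines inside the heading are kept in the captured text, so they can still be collapsed afterwards if a single line is needed.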
