Parsing a web page CGI-Perl

thebarslider · Feb 9, 2003

I am struggling to work out how to parse a web page that is generated automatically by a Perl program. The page changes every day, and so does its URL. I don't really know where to start and would appreciate any help.

goBoating · Feb 10, 2003

To retrieve the web page, use the LWP module. See 'perldoc LWP' for more.

Code:

#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::Simple;

my $url  = 'http://yourserver.com/page.html';
my $html = get($url);    # get() returns undef if the fetch fails
die "Couldn't fetch $url\n" unless defined $html;
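Because the page's URL changes every day, the script needs a way to work out today's address before calling get(). If the URL embeds the date in a predictable pattern (an assumption; the thread doesn't say what the URLs look like), you can build it with POSIX::strftime. The report-YYYYMMDD.html pattern and the url_for_date helper are hypothetical, so adjust them to match the real pages.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: build the page URL for a given epoch time,
# ASSUMING the pages are named like report-YYYYMMDD.html.
# Uses gmtime (UTC); switch to localtime if the server names
# its pages by its local date.
sub url_for_date {
    my ($epoch) = @_;
    my $stamp = strftime('%Y%m%d', gmtime($epoch));
    return "http://yourserver.com/report-$stamp.html";
}

my $url = url_for_date(time);    # today's (assumed) URL
print "$url\n";
```

The resulting $url can then be passed to get() in place of the fixed address.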

To parse and play with the HTML content, there are two general paths.

1 - If you want to parse lots of tags from the HTML, use one of the several HTML modules on CPAN (e.g. http://www.cpan.org/modules/by-category/15_World_Wide_Web_HTML_HTTP_CGI/HTML/HTML-Parser-3.27).

There are several to choose from. You'll need to match the appropriate choice to the specific task you're working on.

2 - Or, if you only need the content of one, two, or three specific tags, a little pattern matching will work.

Code:

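#!/usr/local/bin/perl
use strict;
use warnings;

my $page = qq(<html>
<head>
    <title>A Perl Regex Example</title>
</head>
<body>
  <p>An html page</p>
</body>
</html>);

# Non-greedy (.*?) stops at the first </title>; /i ignores case,
# and /s lets . match newlines in case the title spans lines.
my $title_content = '';
if ($page =~ /<title>(.*?)<\/title>/is) {
    $title_content = $1;
}
print "TITLE: $title_content\n";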
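When you need more than one tag, the same idea can be wrapped in a small helper. The extract_tag sub here is a hypothetical sketch, not part of any module: it uses a global match in list context so it returns every occurrence, and it allows attributes in the opening tag. It shares the usual limits of regex parsing (nested tags of the same name, HTML comments, a '>' inside an attribute value), which is the point where one of the CPAN parsers becomes the better choice.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the contents of every <TAG ...>...</TAG>
# pair in $html. Call it in list context. \b stops 'p' from matching
# <pre> or <param>; [^>]* skips any attributes.
sub extract_tag {
    my ($html, $tag) = @_;
    my $t = quotemeta $tag;
    return $html =~ m{<$t\b[^>]*>(.*?)</$t\s*>}gis;
}

my $page = <<'HTML';
<html><head><title>Daily Report</title></head>
<body>
  <h2>Sales</h2><p class="n">42</p>
  <h2>Returns</h2><p>7</p>
</body></html>
HTML

my ($title)  = extract_tag($page, 'title');
my @headings = extract_tag($page, 'h2');
my @paras    = extract_tag($page, 'p');

print "TITLE: $title\n";           # TITLE: Daily Report
print "HEADINGS: @headings\n";     # HEADINGS: Sales Returns
print "PARAGRAPHS: @paras\n";      # PARAGRAPHS: 42 7
```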

'hope this helps

If you are new to Tek-Tips, please use descriptive titles, check the FAQs, and beware the evil typo.
