Parsing a web page CGI-Perl

Status
Not open for further replies.

thebarslider

Programmer
Dec 21, 2001
80
GB

I am struggling to work out how to parse a web page that is automatically generated by a Perl program. The page changes every day, and so does its URL. I don't really know where to start and would appreciate any help.
 
To retrieve the web page use the LWP module. See 'perldoc LWP' for more.
Code:
#!/usr/local/bin/perl
use LWP::Simple;
$url = 'http://yourserver.com/page.html';
$html = get($url);
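
One thing worth adding to the retrieval step: get() from LWP::Simple returns undef when the fetch fails, so it pays to check the result before parsing it. A minimal sketch, where the fetch_page helper name is a placeholder of mine and the URL is taken from the command line:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# fetch_page is a hypothetical helper name, not part of LWP.
sub fetch_page {
    my ($url) = @_;
    require LWP::Simple;                 # loaded the first time it is needed
    my $html = LWP::Simple::get($url);   # undef means the fetch failed
    die "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Only fetch when a URL is given on the command line.
if (@ARGV) {
    my $html = fetch_page($ARGV[0]);
    print length($html), " bytes retrieved\n";
}
```

Since your URL changes every day, passing it in as an argument keeps the script itself unchanged.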

To parse and play with the HTML content, there are two general paths.

1 - If you want to parse lots of tags from the HTML, use one of the modules on CPAN for working with HTML. There are several to choose from, so you'll need to match the choice to the specific task you're working on.

2 - Or, if you only want the content of one, two, or three specific tags, a little pattern matching will work.

Code:
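#!/usr/local/bin/perl
# A sample page held in a string; in practice this would be the fetched HTML.
my $page = qq(<html>
  <head>
    <title>A Perl Regex Example</title>
  </head>
<body>
  <p>An html page</p>
</body>
</html>);

# /i ignores case, /s lets . match newlines, .*? stops at the first </title>.
if ($page =~ /<title>(.*?)<\/title>/is)
    { $title_content = $1; }
print "TITLE: $title_content\n";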
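
If the tag you want appears more than once, the same kind of pattern with the /g modifier in list context returns every match. A small sketch along the same lines; the sample page and the tag_contents helper are made up for illustration, and like any regex approach it will trip on nested or malformed tags:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# tag_contents is a hypothetical helper: it returns the text between
# every <tag>...</tag> pair found in $html (non-nested tags only).
sub tag_contents {
    my ($html, $tag) = @_;
    # \b[^>]* allows attributes on the opening tag; /g collects all
    # matches, /i ignores case, /s lets . match newlines.
    return $html =~ /<\Q$tag\E\b[^>]*>(.*?)<\/\Q$tag\E>/gis;
}

my $page = qq(<html>
<body>
  <p>First paragraph</p>
  <P class="note">Second
paragraph</P>
</body>
</html>);

my @paragraphs = tag_contents($page, 'p');
print "P: $_\n" for @paragraphs;
```

Call it in list context, as above; in scalar context the match only tells you whether anything was found.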
'hope this helps

If you are new to Tek-Tips, please use descriptive titles, check the FAQs, and beware the evil typo.