Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Extract the header of a Web site

Status
Not open for further replies.

rubis

Programmer
Jun 21, 2001
54
GB
Hi,

I would like to extract the header information from a Web page. I found the following example:


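use strict;
use LWP::Simple qw($ua get);
use HTTP::Headers;
use HTML::HeadParser;

# Placeholder addresses: substitute your own proxy and target page.
$ua->proxy(http => 'http://proxy.example.com:8080/');
my $url = 'http://www.example.com/';

my $h = HTTP::Headers->new;
my $p = HTML::HeadParser->new($h);

my $content = get($url);
die "Could not fetch $url\n" unless defined $content;

$p->parse($content);

print $h->header('Title')."\n";
print $h->header('Content-Base')."\n";
print $h->header('Last-Modified')."\n";
print $h->header('Content-Length')."\n";
print $h->header('Meta')."\n";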


Everything seems fine EXCEPT that there is no information for "Last-Modified", "Content-Length" and "Meta". I checked against the Yahoo page: its <head>..</head> section doesn't contain "Last-Modified" or "Content-Length". Does that mean the site has to put that information in the page for me to get it? Also, there is a meta tag on the page, and I'm wondering how I can print it out.

Thanks,
 
$h->header('Foo') will access <meta http-equiv="Foo" content="...">

and no,
you cannot get at data unless it is actually there.

hth

"If you think you're too small to make a difference, try spending a night in a closed tent with a mosquito."
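HTML::HeadParser only sees what is written in the page's <head> section. The minimal sketch below parses a made-up HTML string instead of a live page, so it runs offline: the <meta http-equiv> tag comes back as a header of the same name, while Last-Modified stays undefined because the page never declares it. The sample HTML and its Content-Language tag are invented for illustration.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTML::HeadParser;

# Made-up page: it declares a title, a base URL and one http-equiv meta tag,
# but says nothing about Last-Modified or Content-Length.
my $html = <<'HTML';
<html><head>
<title>Example page</title>
<base href="http://www.example.com/">
<meta http-equiv="Content-Language" content="en">
</head><body>Hello</body></html>
HTML

my $h = HTTP::Headers->new;
my $p = HTML::HeadParser->new($h);
$p->parse($html);

# <title> becomes Title, <base href> becomes Content-Base,
# and <meta http-equiv="X"> becomes header X.
print "Title:            ", $h->header('Title'), "\n";
print "Content-Base:     ", $h->header('Content-Base'), "\n";
print "Content-Language: ", $h->header('Content-Language'), "\n";

# The page never mentions Last-Modified, so the parser has nothing to report.
print "Last-Modified:    ", ($h->header('Last-Modified') // '(not in the page)'), "\n";
```

Last-Modified and Content-Length are normally supplied by the web server as HTTP response headers rather than written into the HTML, which is why reading them calls for the HTTP response (for example via LWP::Simple's head(), as in the later reply) rather than the page content.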
 
So to use 'Foo', the meta tag has to be <meta http-equiv="Foo" content="...">?

If the meta tag is <meta name="description" content="...">, should the syntax be $h->header('Description')? I've tried it, but it doesn't work. Any suggestions?

Thanks,
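For <meta name="..."> tags, recent versions of HTML::HeadParser store the content under a header called X-Meta- followed by the name attribute (first letter upper-cased), so name="description" is read back as X-Meta-Description rather than Description. Older releases may behave differently. A minimal offline sketch against a made-up page:

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Made-up page with a name= meta tag rather than an http-equiv= one.
my $html = '<html><head><meta name="description" content="A sample page">'
         . '</head><body></body></html>';

# Without an argument, new() creates its own HTTP::Headers object,
# which $p->header($key) reads from.
my $p = HTML::HeadParser->new;
$p->parse($html);

# name="description" is stored as X-Meta-Description
# (HTTP::Headers lookups are case-insensitive).
my $desc = $p->header('X-Meta-Description');
print "Description: ", ($desc // '(none)'), "\n";

# Asking for plain 'Description' finds nothing.
print "Plain 'Description' header defined? ",
    (defined $p->header('Description') ? 'yes' : 'no'), "\n";
```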
 
Here is another way you can try. This example leaves out the proxy address part. As you can see, I chose to print out all the data and then extract the last-modification date specifically. In list context head() returns the content type at index 0, the content length at index 1 and the last-modified time at index 2.

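[tt]
use strict;
use LWP::Simple;
use Data::Dumper;

my $url = 'http://www.example.com/';   # placeholder: substitute the page to inspect

# In scalar context head() returns the HTTP::Response object (false on failure)
my $data = head($url) or die "HEAD request for $url failed\n";
print $data->header('Last-Modified'), "\n";
print "\n\n", Dumper $data;

# In list context head() returns
# ($content_type, $document_length, $modified_time, $expires, $server)
my $moddate = (head($url))[2];
if (defined $moddate) {
    my $timer = scalar localtime($moddate);
    print "\n\nLast modified date: $timer\n";
} else {
    print "\n\nThe server sent no Last-Modified header.\n";
}
[/tt]
=================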
Bad Company Music
=================
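In scalar context, LWP::Simple's head() hands back the HTTP::Response object itself, so the server-sent fields can be read through its public accessors (content_type, content_length, last_modified) instead of the internal _headers hash. A minimal sketch under these assumptions: the helper server_info is hypothetical, and a hand-built HTTP::Response with invented header values stands in for a live HEAD request so the example runs offline.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: pulls out fields the web server sends in its HTTP
# response headers (they never appear in the HTML <head>). $res can be any
# HTTP::Response, e.g. what LWP::Simple's head($url) returns in scalar context.
sub server_info {
    my ($res) = @_;
    return {
        content_type   => scalar $res->content_type,    # media type without parameters
        content_length => scalar $res->content_length,  # undef if not sent
        last_modified  => scalar $res->last_modified,   # epoch seconds, undef if not sent
    };
}

# Offline stand-in for a live HEAD response; the header values are invented.
my $res = HTTP::Response->new(200, 'OK', [
    'Content-Type'   => 'text/html; charset=ISO-8859-1',
    'Content-Length' => 1234,
    'Last-Modified'  => 'Thu, 21 Jun 2001 10:00:00 GMT',
]);

my $info = server_info($res);
print "Content-Type:   $info->{content_type}\n";
print "Content-Length: ", $info->{content_length} // '(not sent)', "\n";
print "Last-Modified:  ",
    (defined $info->{last_modified} ? scalar gmtime($info->{last_modified}) : '(not sent)'),
    "\n";
```

With a live server the same helper can be fed head($url) directly; last_modified returns epoch seconds (or undef when the header is missing), the same kind of value head() returns at index 2 in list context.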
 