Need time-out procedure for HTML Perl procedure


bulgin

IS-IT--Management
Mar 17, 2005
I have the following Perl script, which works nicely: it reads a list of URLs, fetches each one, grabs the head data, and writes it to a file. The problem is that if it hits a domain that is unresponsive or slow to respond, the script halts and just sits there. I'm wondering if there is a way to make the script move on to the next URL when it has trouble connecting to a domain.

I'm also wondering whether it is possible to create "threads" like the big-shot developers do. And in the extreme case, this script may be looking at a list of thousands of URLs, so does anyone see a problem with it handling that much work without crashing out?

Here is the script, and thank you for any help you can offer.


Code:
#!/usr/bin/perl
#print "Content-type: text/html\n\n";
use LWP::Simple;
use HTML::HeadParser;

open (OUTFILE, '>outfile.txt') or die "Cannot write outfile.txt: $!";
open (MYFILE, 'url3.txt')      or die "Cannot read url3.txt: $!";

foreach $line (<MYFILE>) {
    chomp($line);
    $URL = get($line);                 # this call hangs on an unresponsive host
    $Head = HTML::HeadParser->new;
    $Head->parse("$URL");

    print OUTFILE $Head->header('X-Meta-Description') . ".";
}
close(MYFILE);
close(OUTFILE);
exit;
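
For comparison, here is a minimal sketch (not part of the original post) of one way to skip unresponsive hosts: use LWP::UserAgent directly so each request gets its own timeout, and jump to the next URL whenever a fetch fails. The file names simply mirror the script above.

Code:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::HeadParser;

# One request object with a 10-second timeout for every fetch.
my $ua = LWP::UserAgent->new(timeout => 10);

open my $out, '>', 'outfile.txt' or die "Cannot write outfile.txt: $!";
open my $in,  '<', 'url3.txt'    or die "Cannot read url3.txt: $!";

while (my $url = <$in>) {
    chomp $url;
    my $response = $ua->get($url);
    unless ($response->is_success) {      # timeout, DNS failure, 4xx/5xx ...
        warn "Skipping $url: " . $response->status_line . "\n";
        next;                             # move on to the next URL
    }
    my $head = HTML::HeadParser->new;
    $head->parse($response->decoded_content);
    my $desc = $head->header('X-Meta-Description') // '';
    print $out "$desc.";
}

close $in;
close $out;

Checking $response->is_success also skips pages that return 404s or other errors, not just timeouts.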
 
Thanks for your help. I think I will try again in the morning.

Cheers!
 
I can't figure it out. I thought it might have something to do with the order of statements in the script, and changed that a bit, but it didn't help. The code:
Code:
$head->parse("$response");
foreach (keys %{$head->header()}) {
    print "$_\n";    
}
produces no results.
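
One likely reason: header() expects a field name and does not return a hash reference, so the loop has nothing to iterate over. Here is a small sketch of one way to list everything the parser collected, assuming an explicit HTTP::Headers object is handed to the constructor (the $html string is just a stand-in for real page source).

Code:
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Headers;
use HTML::HeadParser;

# Hand the parser an explicit HTTP::Headers object so the collected
# fields can be walked afterwards.
my $html = '<head><title>Example</title>'
         . '<meta name="description" content="demo"></head>';

my $headers = HTTP::Headers->new;
my $head    = HTML::HeadParser->new($headers);
$head->parse($html);

foreach my $name ($headers->header_field_names) {
    printf "%s: %s\n", $name, $headers->header($name);
}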

Have you tried testing this on your end with a known-good list of URLs?

Thanks.
 
I tested a few random URLs such as Google, Tek-Tips, etc. I installed HTML::HeadParser, and I receive no output from either of our scripts when trying to get 'X-Meta-Description', even when the page has a meta description. What URLs are you testing with? When I tried to get other header info, i.e. 'Title', I had no problems.

Chris
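
HTML::HeadParser exposes <meta name="..."> elements as headers with an X-Meta- prefix, so a quick test against a literal <head> block can show whether the parser or the fetched pages are at fault. A self-contained sketch:

Code:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::HeadParser;

# Feed the parser a literal <head> block so the result does not depend
# on whatever a remote site happens to serve.
my $html = <<'HTML';
<html><head>
<title>Test page</title>
<meta name="description" content="A short description for testing.">
</head><body></body></html>
HTML

my $head = HTML::HeadParser->new;
$head->parse($html);

print "Title:       ", $head->header('Title'),              "\n";
print "Description: ", $head->header('X-Meta-Description'), "\n";

If this prints the description but the real runs do not, the pages being fetched probably have no <meta name="description"> tag at all.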

 
I couldn't get any meta data to print.

This script works and it seems to honor the timeout:

Code:
#!/usr/bin/perl
#use strict;
use LWP::Simple qw($ua get);    # import $ua so its timeout can be set
use HTML::HeadParser;
# use HTTP::Status;

$ua->timeout(10);               # give up on any single request after 10 seconds

open (OUTFILE, '>outfile.txt') or die "Cannot write outfile.txt: $!";
open (MYFILE, 'url4.txt')      or die "Cannot read url4.txt: $!";

foreach $line (<MYFILE>) {
    chomp($line);
    $URL = get($line);          # returns undef if the fetch fails or times out
    $Head = HTML::HeadParser->new;
    $Head->parse("$URL");

    print OUTFILE $line . "\t" . $Head->header('Title') . "\t"
                . $Head->header('X-Meta-Keywords') . "\t"
                . $Head->header('X-Meta-Description') . "\n\n";
}
close(MYFILE);
close(OUTFILE);
exit;
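
On the earlier question about "threads" and lists of thousands of URLs: real Perl threads are not the only option; a bounded pool of forked workers keeps memory and connection counts under control. Here is a sketch assuming Parallel::ForkManager is installed from CPAN, with each child handing its result back to the parent so only one process writes the output file.

Code:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::HeadParser;
use Parallel::ForkManager;

my $ua = LWP::UserAgent->new(timeout => 10);
my $pm = Parallel::ForkManager->new(10);    # at most 10 fetches in flight

open my $out, '>', 'outfile.txt' or die "Cannot write outfile.txt: $!";

# Each child hands its finished line back to the parent, and only the
# parent writes to the output file, so lines never interleave.
$pm->run_on_finish(sub {
    my ($pid, $exit, $ident, $signal, $core, $line_ref) = @_;
    print $out $$line_ref if $line_ref;
});

open my $in, '<', 'url4.txt' or die "Cannot read url4.txt: $!";
while (my $url = <$in>) {
    chomp $url;
    $pm->start and next;                    # parent: schedule the next URL
    my $line = '';
    my $response = $ua->get($url);
    if ($response->is_success) {
        my $head = HTML::HeadParser->new;
        $head->parse($response->decoded_content);
        $line = join("\t", $url,
            $head->header('Title')              // '',
            $head->header('X-Meta-Keywords')    // '',
            $head->header('X-Meta-Description') // '') . "\n";
    }
    $pm->finish(0, \$line);                 # child exits, returning its line
}
$pm->wait_all_children;
close $in;
close $out;

The pool size (10 here) caps how many requests are in flight at once, so a list of thousands of URLs never spawns thousands of processes.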
 