Need time-out procedure for HTML perl procedure

Status
Not open for further replies.

bulgin

IS-IT--Management
Mar 17, 2005
13
US
I have the following Perl script, which works nicely: it imports a list of URLs, fetches each one, grabs the head data, and writes it to a file. The problem is that if it encounters a domain that is unresponsive or slow to load, it halts and just sits there. Is there a way to make the script move on to the next line when it has difficulty connecting to a domain?

I'm also wondering if it is possible to create "threads" like the big-shot developers do? And in the extreme, this script may be looking at a list of thousands of URLs, so does anyone see any problem with this script handling that much overhead and not crashing out?

Here is the script and thank you for any help you may suggest.


Code:
#!/usr/bin/perl
#print "Content-type: text/html\n\n";
use LWP::Simple;
use HTML::HeadParser;
open (OUTFILE, '>outfile.txt');
open (MYFILE, 'url3.txt');
foreach $line (<MYFILE>) {
    chomp($line);
    $URL = get($line);
    $Head = HTML::HeadParser->new;
    $Head->parse("$URL");

    print OUTFILE $Head->header('X-Meta-Description') . ".";
}
close(MYFILE);
close(OUTFILE);
exit;
 
Thanks for your help. I think I will try again in the morning.

Cheers!
 
Can't figure it out. I thought maybe it had something to do with the sequence of the script, changed that a bit, but didn't help. The code:
Code:
$head->parse("$response");
foreach (keys %{$head->header()}) {
    print "$_\n";    
}
produces no results.

Have you tried testing this on your end with a known-good list of URLs?

Thanks.
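
One way to see what the parser actually extracted, as a hedged sketch: HTML::HeadParser's header method, called with no arguments, returns the underlying HTTP::Headers object, and that object's header_field_names method lists the fields that were found. The helper name parsed_field_names and the sample page are hypothetical, for illustration only. Note that parse expects a string of HTML; if $response in the snippet above is an HTTP::Response object rather than page content (an assumption, since the code that sets it is not shown), interpolating it into a string would give the parser nothing useful to read.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::HeadParser;

# Return the header field names that HTML::HeadParser extracted
# from a string of HTML. (Hypothetical helper name.)
sub parsed_field_names {
    my ($html) = @_;
    my $head = HTML::HeadParser->new;
    $head->parse($html);
    # With no arguments, header() returns the HTTP::Headers object.
    return $head->header->header_field_names;
}

# Hypothetical sample page, for illustration only.
my $sample = '<html><head><title>Sample</title>'
           . '<meta name="description" content="A sample page">'
           . '</head><body></body></html>';

print "$_\n" for parsed_field_names($sample);
```

If the description field does not appear in this list for a real page, the page probably has no meta description tag inside its head, or the string handed to parse is not the page's HTML.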
 
I tested a few random URLs, such as google and tek-tips. I installed HTML::HeadParser, and I receive no output from either of our scripts when trying to get 'X-Meta-Description', even if there is a meta description. What URLs are you testing with? When I tried to get other header info, e.g. 'Title', I had no problems.

Chris

 
I couldn't get any meta data to print.

This script works, and it seems to honor the timeout:

Code:
#!/usr/bin/perl
#use strict;
use LWP::Simple qw($ua get);
use HTML::HeadParser;
# use HTTP::Status;
$ua->timeout(10);
open (OUTFILE, '>outfile.txt');
open (MYFILE, 'url4.txt');
foreach $line (<MYFILE>) {
chomp($line);
$URL = get($line);
$Head = HTML::HeadParser->new;
$Head->parse("$URL");

print OUTFILE $line . "\t" . $Head->header('Title') . "\t" . $Head->header('X-Meta-Keywords') . "\t" . $Head->header('X-Meta-Description') . "\n\n";

}
close(MYFILE);
close(OUTFILE);
exit;
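
For reference, a minimal sketch of how the script above could be made more defensive, assuming the goal is still one tab-separated line per URL. It swaps LWP::Simple for LWP::UserAgent so that a failed or timed-out request can be detected through is_success and skipped, and it reads the URL file with a while loop so the whole list is never held in memory at once, which matters for lists of thousands of URLs. The helper names head_fields, fetch, and run are hypothetical, and the FAILED marker is an illustrative choice, not something from this thread. Keep in mind that LWP's timeout is an inactivity timeout, so a server that trickles data slowly can still take longer than 10 seconds overall.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::HeadParser;

# Extract the title and meta description from a string of HTML.
# Missing fields come back as empty strings.
sub head_fields {
    my ($html) = @_;
    my $parser = HTML::HeadParser->new;
    $parser->parse($html);
    return (
        title       => $parser->header('Title')              // '',
        description => $parser->header('X-Meta-Description') // '',
    );
}

# Fetch one URL. Returns the page content, or undef if the request
# failed or timed out.
sub fetch {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return $response->is_success ? $response->decoded_content : undef;
}

# Read URLs line by line and write one tab-separated line per URL.
sub run {
    my ($in_path, $out_path) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);

    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    while (my $url = <$in>) {
        chomp $url;
        next unless length $url;        # skip blank lines

        my $html = fetch($ua, $url);
        unless (defined $html) {
            print {$out} "$url\tFAILED\n";
            next;                       # move on to the next URL
        }

        my %f = head_fields($html);
        print {$out} join("\t", $url, $f{title}, $f{description}), "\n";
    }

    close $in;
    close $out;
}

# Only run when given an input and an output file name.
run(@ARGV) if @ARGV == 2;
```

Saved under a name of your choosing, it could be run as: perl yourscript.pl url4.txt outfile.txt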
 