Save HTML files back

Status
Not open for further replies.

rubis

Programmer
Jun 21, 2001
54
GB
I want to save HTML files of Web pages back to the local machine.

First of all, I wrote this code

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

$ua = LWP::UserAgent->new;
$res = HTTP::Request->new('GET', "...");
$rep = $ua->request($res);

This code gives me "501 (Not Implemented) Protocol scheme 'http' is
not supported". So I searched the Net and found this information.

(from the libwww-perl README)

If you want to access sites using the https protocol, then you need to
install the Crypt::SSLeay or the IO::Socket::SSL module. The
README.SSL file will tell you more about how libwww-perl supports SSL.

I told the admin about this, but he said I wasn't trying to use the
https protocol, so he told me to do it another way, as below.

#!/usr/bin/perl

use strict;
use LWP::Simple;

my $doc = get "...";
print "Content-type:text/html\n\n";
print $doc;

This leaves me with some questions.

1) I don't know whether saving an HTML file back has anything to do
with the HTTP protocol or not.

2) What's the difference between the HTTPS and HTTP protocols?

3) His code works only with the homepage located on the Web server
at my work, not with outside sites. It can't download any outside
page. So I have two possible causes in mind:
- proxy
- HTTP protocol
If it's because of the proxy, how can I set the proxy for this program?
Also, my work uses proxy autoconfiguration (.pac).

4) If the web page contains frames, the program doesn't get the right
web page. The page it gets says "this page uses frames but your
browser doesn't support them". The result is shown in the correct
layout (frame style), but inside each frame it says "404 page not
found". So I think the program doesn't load the HTML files inside the
frames; it downloads only the frameset code.

My assumption is that, because my program isn't a browser, the remote
web server can detect that the client doesn't support frames when it
requests the page, and therefore sends that page back to my machine
instead. However, I don't know whether this assumption is right. If
it is, how can I make it get all the HTML files within the frames?
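
On question 4, a frameset page only names its frames; each frame's content lives at its own URL (the src attribute) and has to be requested separately. Below is a minimal sketch, using core Perl only, of pulling those src values out of a frameset. The frame_sources helper and the sample HTML are made up for illustration, and a regex is only a rough stand-in for a real parser such as HTML::Parser.

```perl
use strict;
use warnings;

# Return the src attribute of every <frame> or <iframe> tag in an HTML string.
# Hypothetical helper for illustration; it ignores unquoted attribute values.
sub frame_sources {
    my ($html) = @_;
    my @sources;
    while ($html =~ /<i?frame\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/gis) {
        push @sources, $2;
    }
    return @sources;
}

# Made-up frameset document, similar in shape to what the program downloads.
my $frameset = <<'HTML';
<frameset cols="20%,80%">
  <frame name="menu" src="menu.html">
  <frame name="main" src="pages/main.html">
  <noframes>This page uses frames but your browser doesn't support them.</noframes>
</frameset>
HTML

# Each of these would then need its own request.
print "$_\n" for frame_sources($frameset);
```

The src values are usually relative, so each one has to be resolved against the URL of the frameset page before it is fetched (URI->new_abs from the URI module does this). Relative links resolved against the wrong server are one plausible cause of the "404 page not found" inside each frame.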

Thank you all.
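
On question 3, if outside pages are only reachable through a proxy, LWP::UserAgent can be told to use one. The sketch below assumes a hypothetical proxy at proxy.example.com:8080; the real host and port are whatever the .pac file returns for outside addresses. A .pac file is a JavaScript function that LWP does not evaluate by itself, so the address has to be read out of it by hand.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical proxy address; replace it with the one named in the .pac file.
my $proxy_url = 'http://proxy.example.com:8080/';

my $ua = LWP::UserAgent->new;

# Send http and ftp requests through the proxy.
$ua->proxy(['http', 'ftp'], $proxy_url);

# Alternative: $ua->env_proxy picks up http_proxy and similar
# environment variables instead of a hard-coded address.

# No request is made here; this only shows the configured setting.
print "http requests go through: ", $ua->proxy('http'), "\n";
```

Requests made with this $ua afterwards go through the proxy. LWP::Simple keeps its own agent object, so the proxy would have to be set on that one if get() is used instead.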
 
It sounds like you are trying to get the HTML source code onto your computer without opening the pages. Try this.


use LWP::UserAgent;
use HTTP::Request;

$full_path = 'path/to/destination/directory';
$purl = '...';
# Use the last path segment of the URL as the local file name.
($filename) = $purl =~ m!([^/]+)$!;

$save_as = "$full_path/$filename";

$ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => $purl);
# A file name as the second argument makes LWP write the response body to that file.
$res = $ua->request($req, $save_as);

if ($res->is_success) {
    $last_action = "BEAM $filename Success";
}
else {
    $last_action = "BEAM $filename Failed";
}
print "$last_action\n";



You may also find help in the LWP cookbook (lwpcook) on ActiveState.
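
One thing to watch in the code above: the m!([^/]+)$! match finds nothing when the URL ends in a slash, which is the usual case for a homepage, so $filename ends up undefined. A possible fallback is sketched below with core Perl only. The filename_for helper, the index.html default and the example.com URLs are assumptions for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Pick a local file name for a URL, falling back to index.html when the
# URL has no final path segment. Hypothetical helper for illustration.
sub filename_for {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*$//s;          # drop query string and fragment
    $path =~ s{^[a-z][a-z0-9+.-]*://[^/]*}{}i;  # drop scheme and host
    my ($name) = $path =~ m!([^/]+)$!;          # last path segment, if any
    return defined $name ? $name : 'index.html';
}

for my $url ('http://www.example.com/docs/page.html',
             'http://www.example.com/') {
    print "$url -> ", filename_for($url), "\n";
}
```

The return value can be used in place of $filename when building $save_as.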
 