I want to save the HTML files of Web pages to my local machine.
First of all, I wrote this code:
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
$ua = LWP::UserAgent->new;
$res = HTTP::Request->new('GET', "
$rep = $ua->request($res);
This code gives me the error "501 (Not Implemented) Protocol scheme
'http' is not supported". So I searched the Net and found this information.
(libwww-perl)-------------
If you want to access sites using the https protocol, then you need to
install the Crypt::SSLeay or the IO::Socket::SSL module. The
README.SSL file will tell you more about how libwww-perl supports SSL.
I told the admin about this, but he said I wasn't trying to use the
https protocol, so he told me to do it another way, as below.
#!/usr/bin/perl
use strict;
use LWP::Simple;
my $doc = get "
print "Content-type:text/html\n\n";
print $doc;
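For reference, what I am really after is writing the page to a file rather than printing it. A minimal sketch of that, assuming LWP::Simple's getstore is the right call for it; the helper name save_page and the URL and file name in the usage comment are only placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;   # exports getstore() and is_success()

# save_page is a hypothetical helper name, not part of LWP.
# It fetches $url and writes the response body to $file.
sub save_page {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);   # returns the HTTP status code
    warn "Could not fetch $url: status $status\n" unless is_success($status);
    return $status;
}

# Usage (placeholder URL and file name):
# save_page('http://www.example.com/', 'example.html');
```

Checking the returned status at least tells a failed fetch apart from an empty page, which the plain get call does not.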
From this, I've got some problems.
1) I don't know whether saving an HTML file has anything to do
with the HTTP protocol or not.
2) What's the difference between the HTTPS and HTTP protocols?
3) His code works only with the homepage located on the Web server
at my work, not with outside pages. It can't download any outside page.
So I have two guesses about the cause:
- proxy
- HTTP protocol
If it's because of the proxy, how can I set the proxy for this program?
Also, my work uses proxy autoconfiguration (.pac).
4) If the web page contains frames, the program won't get the right
web page. The page it gets says "this page uses frames but your
browser doesn't support them". The result is shown in the correct
layout (frame style), but inside each frame it says "404 page not found".
So I think the program doesn't load the HTML files inside the frames;
it downloads only the frameset code.
My guess is that, because my program isn't a browser, the remote web
server can detect that the request doesn't come from a browser that
supports frames, and therefore sends that page back to my machine
instead. However, I don't know whether this guess is right. If so,
how can I make it get all the HTML files within the frames?
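
To make questions 3) and 4) concrete, here is a rough sketch of what I imagine the program would need to do. It rests on two assumptions that may be wrong: that LWP does not read .pac files itself, so the proxy host and port have to be copied out of the .pac file by hand, and that a frameset page only names its frame pages, so each one has to be requested separately. The helper names make_agent, frame_urls and fetch_with_frames are my own, and the proxy and site URLs in the usage comment are placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# Build a user agent that sends http requests through a fixed proxy.
# make_agent is a hypothetical helper; the proxy URL comes from the caller.
sub make_agent {
    my ($proxy_url) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->proxy('http', $proxy_url) if defined $proxy_url;
    return $ua;
}

# Return the absolute URLs of all <frame> and <iframe> sources in $html,
# resolved against $base (the URL the frameset page came from).
sub frame_urls {
    my ($html, $base) = @_;
    my @urls;
    my $extor = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'frame' || $tag eq 'iframe';
        push @urls, URI->new_abs($attr{src}, $base)->as_string
            if defined $attr{src};
    });
    $extor->parse($html);
    $extor->eof;
    return @urls;
}

# Fetch a frameset page and each page it refers to (one level deep only).
# Returns a hash of URL => HTML for every successful fetch.
sub fetch_with_frames {
    my ($ua, $url) = @_;
    my %pages;
    my $res = $ua->get($url);
    return %pages unless $res->is_success;
    $pages{$url} = $res->decoded_content;
    for my $frame (frame_urls($pages{$url}, $res->base)) {
        my $sub = $ua->get($frame);
        $pages{$frame} = $sub->decoded_content if $sub->is_success;
    }
    return %pages;
}

# Usage (placeholder proxy and site):
# my $ua    = make_agent('http://proxy.example.com:8080/');
# my %pages = fetch_with_frames($ua, 'http://www.example.com/');
```

Resolving each src against the frameset's own URL is there because relative frame links requested from the wrong base would be one possible source of the "404 page not found" text inside each frame.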
Thank you all.