LWP::UserAgent max_size question

Status
Not open for further replies.

Kirsle

Programmer
Jan 21, 2006
Hi, I'm making a web app where a user can "upload" an image by providing the URL to an existing image somewhere on the Internet.

To prevent them giving a URL to an extremely large file, I'm trying to limit the max file size that LWP::UserAgent will download.

The documentation for LWP::UserAgent says:

$ua->max_size
$ua->max_size( $bytes )

Get/set the size limit for response content. The default is undef, which means that there is no limit. If the returned response content is only partial, because the size limit was exceeded, then a "Client-Aborted" header will be added to the response. The content might end up longer than max_size as we abort once appending a chunk of data makes the length exceed the limit. The "Content-Length" header, if present, will indicate the length of the full content and will normally not be the same as length($res->content).

My code that handles the upload looks like this:

Code:
    if ($location eq "www") {
        # It's on the web. Fetch it with libwww-perl.
        my $ua = LWP::UserAgent->new();

        # Security related options for LWP.
        $ua->timeout(15);                                   # Only try for 15 seconds to get the image.
        $ua->max_size(1024*600);                   # Limit how much data we pull down
        $ua->protocols_allowed([ 'http', 'https', 'ftp' ]); # Only allow web URIs

        # Get a response.
        my $resp = $ua->get($name);
        if ($resp->header("Client-Aborted")) {
            $@ = "The file size on the remote server is too large.";
            return undef;
        }
        elsif ($resp->is_success) {
            $bin = $resp->content;
            if (length $bin == 0) {
                $@ = "Got an empty file from the remote server.";
                return undef;
            }
        }
        else {
            # An error.
            $@ = "Remote server said: " . $resp->status_line();
            return undef;
        }
    }

I found this tutorial here:
It told me I could use if ($resp->header("Client-Aborted")), but this doesn't work. I tested my script by having it fetch a 3 MB file and a 300 KB file. The 3 MB file is larger than 1024*600 bytes and the 300 KB file is smaller, so the expected result is a "too large" error on the big image while the small image is handled just fine.

The "Client-Aborted" header wasn't working. So I tried using Data::Dumper on $resp; I found out the headers returned a status code "206 Partial Content" and one of the headers would say 'content-range' => 'bytes 0-347234/3585528'

At first I assumed the "206 Partial Content" would be reliable to use, and that it's only given by the server when the image size is larger than LWP::UserAgent requested. But, it returns the same status code even for a small image.

The result is that the request goes through with no indication that the file was too big, but the data returned for the large image is truncated and corrupt (when viewed in an image viewer, only the top portion of the image shows up and the rest is solid gray).

Is there an easy way to check whether the file was too large on the server, apart from having to parse the "content-range" header by hand? The docs on LWP::UserAgent are misleading in any case.

Also note, if I don't include my max_size() call, the return code is always 200 OK and without the content-range header.
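
For reference, parsing the header by hand does not take much code. Below is a minimal sketch, assuming the server sends the usual "bytes first-last/total" form as in the dump above. The names parse_content_range and range_is_complete are hypothetical helpers made up for illustration; they are not part of LWP.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a Content-Range value such as
# "bytes 0-347234/3585528". Returns (first, last, total); total is
# undef when the server reports an unknown length ("*"). Returns an
# empty list when the value is missing or not in the expected form.
sub parse_content_range {
    my ($value) = @_;
    return unless defined $value;
    if ($value =~ m{^\s*bytes\s+(\d+)-(\d+)/(\d+|\*)\s*$}i) {
        my ($first, $last, $total) = ($1, $2, $3);
        return ($first, $last, $total eq '*' ? undef : $total);
    }
    return;
}

# Hypothetical helper: true only when the range starts at byte 0 and
# ends at the last byte of the resource, i.e. nothing was cut off.
sub range_is_complete {
    my ($value) = @_;
    my @parts = parse_content_range($value);
    return 0 unless @parts;
    my ($first, $last, $total) = @parts;
    return 0 unless defined $total;
    return ($first == 0 && $last == $total - 1) ? 1 : 0;
}

# The header value from the dump above covers only part of the file.
print range_is_complete("bytes 0-347234/3585528") ? "complete\n" : "truncated\n";
```

With a real response this would be called as range_is_complete($resp->header('Content-Range')). A 206 whose range does not cover the whole file could then be reported as "too large", while a 206 that covers everything is treated like a normal download.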

Kirsle.net | My personal homepage
Code:
perl -e '$|=$i=1;print" oo\n<|>\n_|_";x:sleep$|;print"\b",$i++%2?"/":"_";goto x;'
 
>Is there an easy way to check whether the file was too large on the server, apart from having to parse the "content-range" header by hand?

The usual approach, in any language, is to send a "HEAD" request first and read the content-length from the returned headers before deciding whether to fetch the resource itself, either with $ua->get() or with an HTTP::Request "GET/POST". If the content-length of the resource is larger than your max_size, skip the download and save the bandwidth.

Here is a rough outline of what that would look like.
Code:
# ...etc etc
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new('HEAD' => $name);
my $resp = $ua->request($req);
my $resp_size = 0;

if ($resp->is_success) {
    # Use the header accessor instead of reaching into the object's internals.
    # Content-Length may be missing, so fall back to 0.
    $resp_size = $resp->header('Content-Length') || 0;
}
# At this stage, $resp_size can be compared to $max_size to decide whether to go ahead and retrieve the file.
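
One caveat with the HEAD check outlined above: a server may omit Content-Length or report a wrong value, so it is probably wise to keep max_size set on the actual GET as well. The sketch below separates "too large" from "size unknown". It assumes HTTP::Response (part of libwww-perl) is installed; head_verdict is a hypothetical helper name, and the response is built by hand so the example runs without network access.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: given the response to a HEAD request and a
# byte limit, return 'ok', 'too_large', 'unknown' (no usable
# Content-Length header), or 'error' (the request itself failed).
sub head_verdict {
    my ($resp, $max_size) = @_;
    return 'error' unless $resp->is_success;
    my $len = $resp->header('Content-Length');
    return 'unknown' unless defined $len && $len =~ /^\d+$/;
    return $len > $max_size ? 'too_large' : 'ok';
}

# Offline demonstration with a hand-built response. In the real
# script the response would come from: my $resp = $ua->head($name);
my $fake = HTTP::Response->new(200, 'OK', [ 'Content-Length' => 3585528 ]);
print head_verdict($fake, 1024 * 600), "\n";    # prints "too_large"
```

On 'unknown' the script could still go ahead with the GET and rely on max_size plus a check of the returned headers to catch an oversized file.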
 