Do PHP file uploads perform any kind of integrity checking?

Status
Not open for further replies.

cmayo

MIS
Apr 23, 2001
159
US
I've recently found some corrupted files in an upload directory, leading me to wonder whether the upload process does any checksum comparisons while it's handling uploaded files.

For instance, when a file is uploaded from a user PC to the server's temp directory, is any checking (checksum/MD5/SHA) done to verify that the uploaded file matches the original file on the user's PC?

And when an uploaded file is moved from temp to the destination location using move_uploaded_file(), does the server compare the original uploaded file to the final destination file? I guess in this case (a file move), only directory references change and the file isn't actually moved from place to place, but I really don't know.

Thoughts?
 
The basic answer is no, with one proviso: not unless you [or the author of the code] wrote something into the script to do so.

Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.
Webmaster Forum
 
>> Not unless you [or the author of the code] wrote something into the script to do so.

Simple enough for in-system operations once the file's been uploaded, but I guess I'm more concerned with verifying that user files aren't corrupted during the upload. I won't have any way of generating a checksum on a user file before he uploads it, so I don't see any way of verifying that I received what the user sent. I was hoping the upload mechanism itself incorporated some sort of verification.
 
Only those naturally inherent in the TCP/IP protocol.

If you are writing a corrupt file, then either:

1) the remote end corrupted it before sending
2) you have a hardware problem on your server writing incorrect data


Do things on the cheap & it will cost you dear
 
In what way are the uploaded files corrupted?
Could it be something simple like text files being uploaded as binaries and having all the line endings stripped off?

Keith
 
These were PDFs which, a couple years after being uploaded via an HTML/PHP upload form, couldn't be opened in Acrobat. We're also missing a few files (90 out of ~57,000).

The number of problem files is very, very low (0.015%) but still, it would be good to know if the missing/corrupted files originally made it onto our filesystem intact and later disappeared/became corrupt, or if the problems occurred during the upload process.

I'm in the process of coding some extra checks throughout the upload process, i.e. instead of just trusting move_uploaded_file to throw an error, I'm doing before & after checksums to verify that the uploaded file actually did make it to the destination directory and wasn't changed during the process.
 
Your question is ultimately about how input type=file is handled at the protocol level. RFC 1867 defined that; it has since been obsoleted by RFC 2854, which mainly just aggregates several earlier RFCs.

The originally proposed way is still valid: the browser encodes the form data and the file as MIME type multipart/form-data, which basically means the file is one part of a multipart HTTP upload, packaged in much the same way as a mail attachment (although browsers normally send the file bytes as they are rather than base64-encoded). The only way to verify the completeness of an upload is to check the Content-Length given for the whole form data; browsers do not normally send a separate length for the file part alone. No file checksum is computed or checked.

What the web server does is not defined here, but it can't communicate back with the client to have it compute and send a checksum; it is confronted with the form data in the same way a mail client is confronted with a mail. I haven't investigated what happens afterwards, but I think there is no rule for it: how the web server parses the uploaded file part and moves it in the file system is not standardized. As a temp dir is typically on the root drive and the hosted domain's root is on some other drive, the file surely isn't moved just by changing the file system's directory entries; most of the time it is truly copied, but that is much less error-prone than the HTTP communication between client and server.

As for the checks, each TCP segment carries a checksum. Without going deep into the mechanisms that check the integrity of the data transfer, I think it's fairly safe to assume that integrity mainly relies on the per-segment TCP checksum, and that the main problems are timeouts and incomplete uploads, not transfer errors. For sure there is no checksum in the file part of the multipart/form-data body, and for sure the most common error is an incomplete HTTP form data upload, not corruption of the data.

Bye, Olaf.
 
Great description of the process by Olaf.

A solution to verifying the integrity of a submitted file may be to use a modified JavaScript/HTML5 technique for previewing images before sending them to the server.

If you skip giving the file field a name, that field in itself won't get submitted - but you can use JavaScript and a FileReader to read the Base64 of the file to a hidden field that would get submitted (I also used jQuery, but something like that could be done similar to this):

Code:
$(document).ready(function(){
    // Unnamed file input (not submitted itself) and the hidden field that carries its data
    var fileField = $('#hidden-upload'),
        imgField = $('#image_data');

    // Re-read the file whenever the user picks a new one
    fileField.on('change', function(){
        processFiles(this.files);
        return false;
    });

    // Accept exactly one file at a time
    var processFiles = function(files){
        if(files.length !== 1){
            alert('Unable to read from '+files.length+' files at a time.');
            return;
        }
        readFile(files[0]);
    };

    // Read the file as a base64 data URL and store it in the hidden field
    var readFile = function(file){
        var reader = new FileReader();
        reader.onload = function(e){ imgField.val(e.target.result); };
        reader.readAsDataURL(file);
    };
});

The read file method could be modified to calculate a checksum/MD5/SHA on the base64 data, and populate another hidden field.
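As a rough sketch of what that checksum step might look like, here is a plain JavaScript CRC-32 (the common zlib/Ethernet polynomial) computed over the file's raw bytes. Working on the raw bytes instead of the base64 text is my assumption, made so the server can compare against the decoded file. The names crc32, crc32Hex and attachChecksum and the #file_crc32 hidden field are made up for illustration and are not part of the code above.

```javascript
// Build the 256-entry lookup table for the reflected CRC-32 polynomial 0xEDB88320.
function makeCrcTable() {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = (c & 1) ? (0xEDB88320 ^ (c >>> 1)) : (c >>> 1);
    }
    table[n] = c >>> 0;
  }
  return table;
}

const CRC_TABLE = makeCrcTable();

// Compute CRC-32 over a Uint8Array and return it as an unsigned 32-bit number.
function crc32(bytes) {
  let crc = 0xFFFFFFFF;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xFF] ^ (crc >>> 8);
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Same value as 8 lowercase hex digits, e.g. "cbf43926" for the ASCII text "123456789".
function crc32Hex(bytes) {
  return crc32(bytes).toString(16).padStart(8, '0');
}

// Hypothetical browser wiring: read the chosen File as raw bytes and put its
// checksum into a hidden field (#file_crc32 is an assumed id, not from the thread).
function attachChecksum(file) {
  const reader = new FileReader();
  reader.onload = function (e) {
    const hex = crc32Hex(new Uint8Array(e.target.result));
    document.querySelector('#file_crc32').value = hex;
  };
  reader.readAsArrayBuffer(file);
}
```

If I remember right, PHP's hash('crc32b', ...) and hash_file('crc32b', ...) use the same CRC variant, so the two hex strings should be directly comparable on the server. Keep in mind a CRC only catches accidental corruption; anyone tampering with the upload can just recompute it, so use a SHA hash if that matters.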

Then on the server (in PHP) you could do that same calculation on the data, compare the two checksums, and write the base64 data to a file.

Code:
$img = file_get_contents($values['image_data']);

This works because the submitted data URL starts with the "data:" prefix, and PHP usually (unless configured otherwise) will recognize it as a valid stream.
 