Check file is valid and not corrupt?

Status
Not open for further replies.

Destruktion

Programmer
Mar 18, 2011
28
GB
Hey, I have a program I'm working on that lets the user save its contents (treeview streams) to a single file on disk; they can then restart the program and open the saved file (just like Excel, Word, Delphi project files, etc.).

That all works great, but I need to detect whether the file has been tampered with (modified). For example, if I open a file created by my app in Notepad, see the usual stream characters, and then delete or rename some of the text (which breaks the stream), loading the file raises no error but nothing is displayed. The treeview is simply empty because the stream could not be read.

Even with a Try..Except on E: Exception block in place, the errors are not detected.

So basically, is there a way (hash check, CRC, etc.) to tell that a file created by my program is invalid or corrupt? I don't want my program to read files it created if they may be invalid or corrupt.

Imagine renaming a Word document (mynotes.doc) to mynotes.xsl and opening mynotes.xsl in Word: it would raise an error because it knows it is not the correct kind of file.

How can I implement this in my programs so that I can know the files are valid?

Many Thanks!
 
Your Word example holds because the program looks for specific features in what qualifies as a "Word doc" file and then fails if it doesn't find them (and it won't for an XSL file).

You make a rule for a specific format. Part of that might be a hash check. See faq102-7423 for how to do one.

It is not possible for anyone to acknowledge truth when their salary depends on them not doing it.
 
Thanks for the reply Glenn,

One question though: when I create (save) the files from my app, do I have to generate a file hash myself, or does Windows do this for every file? I'm kind of new to this. Is there a simple example that shows saving/loading a file from a program and checking the integrity of the file for errors, corruption, etc.?

I don't know what to do with the link you posted.

Thanks!
 
You'll have to calculate the checksum value and then put that into your saved stream. One option might be:
Saving the file:
Pull the data stream into a memory stream
Calculate a hash code on the data stream
Put the hash code into the start of a new file stream and then append your data stream on to that.
Save the file stream to file.

When you load:
Load the file into a memory stream.
Pull the hash code from the beginning of the stream.
Calculate the hash code of the remainder of the stream.
If the stored hash and the calculated hash don't match then you can alert the user to the problem.
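
The save and load steps above could be sketched roughly like this. It is only a sketch under assumptions: the names Crc32Of, SaveWithChecksum and LoadWithChecksum and the file name demo.dat are made up for illustration, the CRC is a slow bitwise version chosen to keep the example short, and the payload is assumed to fit in an Integer-sized buffer.

```pascal
program checksumdemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Classes, SysUtils;

type
  PByte = ^Byte;

{ Bitwise CRC-32 (reflected polynomial $EDB88320); slow, but needs no table. }
function Crc32Of(Data: PByte; Size: Integer): LongWord;
var
  i, bit: Integer;
begin
  Result := $FFFFFFFF;
  for i := 1 to Size do
  begin
    Result := Result xor Data^;
    for bit := 1 to 8 do
      if (Result and 1) <> 0 then
        Result := (Result shr 1) xor $EDB88320
      else
        Result := Result shr 1;
    Inc(Data);
  end;
  Result := Result xor $FFFFFFFF;
end;

{ Writes a 4-byte CRC first, then the payload (hypothetical helper). }
procedure SaveWithChecksum(Data: TMemoryStream; const FileName: string);
var
  Crc: LongWord;
  F: TFileStream;
begin
  Crc := Crc32Of(Data.Memory, Integer(Data.Size));
  F := TFileStream.Create(FileName, fmCreate);
  try
    F.WriteBuffer(Crc, SizeOf(Crc));
    if Data.Size > 0 then
      F.WriteBuffer(Data.Memory^, Integer(Data.Size));
  finally
    F.Free;
  end;
end;

{ Returns False if the file is too short or the stored CRC does not match. }
function LoadWithChecksum(const FileName: string; Data: TMemoryStream): Boolean;
var
  Raw: TMemoryStream;
  Stored: LongWord;
  Payload: PByte;
  PayloadSize: Integer;
begin
  Result := False;
  Raw := TMemoryStream.Create;
  try
    Raw.LoadFromFile(FileName);
    if Raw.Size < SizeOf(Stored) then
      Exit;
    Raw.Position := 0;
    Raw.ReadBuffer(Stored, SizeOf(Stored));
    PayloadSize := Integer(Raw.Size) - SizeOf(Stored);
    Payload := PByte(Raw.Memory);
    Inc(Payload, SizeOf(Stored));  { skip the stored CRC }
    if Crc32Of(Payload, PayloadSize) <> Stored then
      Exit;
    Data.Clear;
    if PayloadSize > 0 then
      Data.WriteBuffer(Payload^, PayloadSize);
    Data.Position := 0;
    Result := True;
  finally
    Raw.Free;
  end;
end;

var
  Original, Loaded: TMemoryStream;
  Sample: AnsiString;
  F: TFileStream;
  B: Byte;
begin
  Sample := 'pretend this is the treeview stream';
  Original := TMemoryStream.Create;
  Loaded := TMemoryStream.Create;
  try
    Original.WriteBuffer(Sample[1], Length(Sample));
    SaveWithChecksum(Original, 'demo.dat');
    Writeln('untouched file accepted: ', LoadWithChecksum('demo.dat', Loaded));
    { Simulate tampering by flipping the bits of the last byte. }
    F := TFileStream.Create('demo.dat', fmOpenReadWrite);
    try
      F.Position := F.Size - 1;
      F.ReadBuffer(B, 1);
      B := B xor $FF;
      F.Position := F.Size - 1;
      F.WriteBuffer(B, 1);
    finally
      F.Free;
    end;
    Writeln('tampered file accepted:  ', LoadWithChecksum('demo.dat', Loaded));
  finally
    Loaded.Free;
    Original.Free;
  end;
end.
```

In a real program the treeview would be written into the memory stream before SaveWithChecksum, and read back from it only when LoadWithChecksum returns True.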

Make the hash code a LongWord so its size won't change if you start using a 64-bit Delphi compiler.

The linked code shows you how to calculate a CRC32 checksum on a file; you could probably adapt it to a memory stream:

 
"One question though: when I create (save) the files from my app, do I have to generate a file hash myself, or does Windows do this for every file? I'm kind of new to this. Is there a simple example that shows saving/loading a file from a program and checking the integrity of the file for errors, corruption, etc.?"

DjangMan covers the process well enough. You hash the relevant data, save that hash into the file, and when you read the data back you recompute the hash and check it against the saved one. If there is no discrepancy, the data is considered unaltered. You can come up with the hash in a number of ways, as has been illustrated: you should see a good way to do an MD5 or SHA1 in the link I posted, and a CRC32 in what DjangMan posted. Most of this is really a file format design question.

If it helps, I'll provide an example: One of the things I've been working with is Microsoft CAB files and making a number of Delphi units to work with those. One of those involves the structures for reading the file in a raw way outside of the API. This means that I can come up with my own complete implementation if I have a working MSZIP and LZX compress and decompress (I don't). But those are within the API, and I'm working on components to exploit the API (The API is pretty obfuscated and that makes it complex).

Now if you look at the file to test it, there is a file header record. Part of that has to have the string "MSCF". If that doesn't exist there's a problem. Also included is the size of the cabinet. If that doesn't match, there's a potential problem. There are also a number of other values to use for navigating the file.
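
For a home-grown format, the same kind of header test might look like the sketch below. Everything in it is an assumption made for illustration: the 'TVS1' signature, the TFileHeader layout and the routine names are invented, and none of it is the actual CAB header.

```pascal
program headercheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Classes, SysUtils;

type
  TSignature = array[0..3] of AnsiChar;
  { Hypothetical header layout, invented for this sketch. }
  TFileHeader = packed record
    Signature: TSignature;  { must equal MySignature }
    TotalSize: LongWord;    { size of the whole file, header included }
  end;

const
  MySignature: TSignature = 'TVS1';  { made-up tag; pick your own }

{ Writes header + payload to any stream (file or memory). }
procedure WriteWithHeader(Stream: TStream; const Payload: AnsiString);
var
  Header: TFileHeader;
begin
  Header.Signature := MySignature;
  Header.TotalSize := SizeOf(Header) + Length(Payload);
  Stream.WriteBuffer(Header, SizeOf(Header));
  if Length(Payload) > 0 then
    Stream.WriteBuffer(Payload[1], Length(Payload));
end;

{ True only if the signature matches and the recorded size equals the real size. }
function HeaderLooksValid(Stream: TStream): Boolean;
var
  Header: TFileHeader;
begin
  Result := False;
  if Stream.Size < SizeOf(Header) then
    Exit;
  Stream.Position := 0;
  Stream.ReadBuffer(Header, SizeOf(Header));
  Result := CompareMem(@Header.Signature, @MySignature, SizeOf(TSignature))
    and (Header.TotalSize = Stream.Size);
end;

var
  S: TMemoryStream;
begin
  S := TMemoryStream.Create;
  try
    WriteWithHeader(S, 'pretend treeview data');
    Writeln('intact:    ', HeaderLooksValid(S));
    S.Size := S.Size - 1;  { simulate a truncated file }
    Writeln('truncated: ', HeaderLooksValid(S));
  finally
    S.Free;
  end;
end.
```

A signature and size test like this is cheap and catches wrong or truncated files early, but it will not notice bytes changed in the middle of the data; that is what the hash is for.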

There are also a number of folder headers, which describe the compression format used and a number of other things. There are file headers, which describe the file names, uncompressed size, and the folder each file belongs to.

But when I get to the data, there is a check that most CAB "test" programs will apply to the data. In CAB files, data are saved in contiguous 32K blocks, along with a compressed data size and uncompressed data size. Saved as part of the data header block is a LongWord XOR hash, computed over the saved data and the compressed/uncompressed size values. To test, one recomputes the hash from this data and compares it against the saved hash. These "CAB test" programs do this and return an error on a mismatch.

Now if I create a CAB file, I have to calculate this hash and save it to the data header block when I compress the data referenced by the header. This is just an outgrowth of how the CAB file format was designed.

All you need to do, really, is come up with a good design with the information you need to check it against, save the information there when you save it, and check that when you load it. How you do that is up to you (as long as it works).

 
Brilliant stuff, thanks Glenn and DjangMan,

I thought there would be an easier solution but I was wrong. I will take some time to read the links posted and also your messages in more detail.

From what I understand so far, I could do something like this:

<> Move my stream into a temp TMemoryStream.
<> Calculate hash on the TMemoryStream.
<> Copy this hash into the stream which will be saved to disk.

<> Load the file into TMemoryStream
<> check for matching hash
<> ..Also check the file for matching patterns.

I will need longer to read up on this though. I know files like WAV, MP3, GIF etc. (and all files, I guess) have their own headers.

Many Thanks for the info guys :)
 
Since I'm able to post this now: one of the things I've been playing with is these file hashes, and I have my own implementations of all the ones mentioned. That said, I don't like the one at the scalabium link for two reasons. The biggest is that it recalculates the table on every run, but there's also an extra instruction that has no effect.

Here's the Pascal/Delphi version I have. Unit...
Code:
Unit d3_crc32;

{ original credits below.  CRC32Update added by myself

Modified from a version posted on the Pascal echo by Floor
  A.C. Naaijkens (note introduction of a Constant and a
  Function and making the Array a Constant outside of
  crc32() which should improve speed a lot; further
  references made to a C version in a File of Snippets from the
  C Echo marked as "Copyright (C) 1986 Gary S. Brown" (mostly to
  compare the large Arrays, which proved to be identical), and to
  "File verification using CRC" by Mark R. Nelson in Dr. Dobbs'
  Journal, May 1992.  The latter provided the final piece of
  crucial information.  Use: 1) Create a LongInt Variable For the
  CRC value; 2) Initialize this With CRCSeed; 3) Build the
  CRC Byte-by-Byte With CRC32(); 4) Finish With CRCend()
  (this was the part I found in Nelson).  The result is a CRC
  value identical to that calculated by PKZip and ARJ.
}

Interface
  uses windows;
  Const
    CRCSeed = $ffffffff;
    CRC32tab : Array[0..255] of DWord = (
      $00000000, $77073096, $ee0e612c, $990951ba, $076dc419, $706af48f,
      $e963a535, $9e6495a3, $0edb8832, $79dcb8a4, $e0d5e91e, $97d2d988,
      $09b64c2b, $7eb17cbd, $e7b82d07, $90bf1d91, $1db71064, $6ab020f2,
      $f3b97148, $84be41de, $1adad47d, $6ddde4eb, $f4d4b551, $83d385c7,
      $136c9856, $646ba8c0, $fd62f97a, $8a65c9ec, $14015c4f, $63066cd9,
      $fa0f3d63, $8d080df5, $3b6e20c8, $4c69105e, $d56041e4, $a2677172,
      $3c03e4d1, $4b04d447, $d20d85fd, $a50ab56b, $35b5a8fa, $42b2986c,
      $dbbbc9d6, $acbcf940, $32d86ce3, $45df5c75, $dcd60dcf, $abd13d59,
      $26d930ac, $51de003a, $c8d75180, $bfd06116, $21b4f4b5, $56b3c423,
      $cfba9599, $b8bda50f, $2802b89e, $5f058808, $c60cd9b2, $b10be924,
      $2f6f7c87, $58684c11, $c1611dab, $b6662d3d, $76dc4190, $01db7106,
      $98d220bc, $efd5102a, $71b18589, $06b6b51f, $9fbfe4a5, $e8b8d433,
      $7807c9a2, $0f00f934, $9609a88e, $e10e9818, $7f6a0dbb, $086d3d2d,
      $91646c97, $e6635c01, $6b6b51f4, $1c6c6162, $856530d8, $f262004e,
      $6c0695ed, $1b01a57b, $8208f4c1, $f50fc457, $65b0d9c6, $12b7e950,
      $8bbeb8ea, $fcb9887c, $62dd1ddf, $15da2d49, $8cd37cf3, $fbd44c65,
      $4db26158, $3ab551ce, $a3bc0074, $d4bb30e2, $4adfa541, $3dd895d7,
      $a4d1c46d, $d3d6f4fb, $4369e96a, $346ed9fc, $ad678846, $da60b8d0,
      $44042d73, $33031de5, $aa0a4c5f, $dd0d7cc9, $5005713c, $270241aa,
      $be0b1010, $c90c2086, $5768b525, $206f85b3, $b966d409, $ce61e49f,
      $5edef90e, $29d9c998, $b0d09822, $c7d7a8b4, $59b33d17, $2eb40d81,
      $b7bd5c3b, $c0ba6cad, $edb88320, $9abfb3b6, $03b6e20c, $74b1d29a,
      $ead54739, $9dd277af, $04db2615, $73dc1683, $e3630b12, $94643b84,
      $0d6d6a3e, $7a6a5aa8, $e40ecf0b, $9309ff9d, $0a00ae27, $7d079eb1,
      $f00f9344, $8708a3d2, $1e01f268, $6906c2fe, $f762575d, $806567cb,
      $196c3671, $6e6b06e7, $fed41b76, $89d32be0, $10da7a5a, $67dd4acc,
      $f9b9df6f, $8ebeeff9, $17b7be43, $60b08ed5, $d6d6a3e8, $a1d1937e,
      $38d8c2c4, $4fdff252, $d1bb67f1, $a6bc5767, $3fb506dd, $48b2364b,
      $d80d2bda, $af0a1b4c, $36034af6, $41047a60, $df60efc3, $a867df55,
      $316e8eef, $4669be79, $cb61b38c, $bc66831a, $256fd2a0, $5268e236,
      $cc0c7795, $bb0b4703, $220216b9, $5505262f, $c5ba3bbe, $b2bd0b28,
      $2bb45a92, $5cb36a04, $c2d7ffa7, $b5d0cf31, $2cd99e8b, $5bdeae1d,
      $9b64c2b0, $ec63f226, $756aa39c, $026d930a, $9c0906a9, $eb0e363f,
      $72076785, $05005713, $95bf4a82, $e2b87a14, $7bb12bae, $0cb61b38,
      $92d28e9b, $e5d5be0d, $7cdcefb7, $0bdbdf21, $86d3d2d4, $f1d4e242,
      $68ddb3f8, $1fda836e, $81be16cd, $f6b9265b, $6fb077e1, $18b74777,
      $88085ae6, $ff0f6a70, $66063bca, $11010b5c, $8f659eff, $f862ae69,
      $616bffd3, $166ccf45, $a00ae278, $d70dd2ee, $4e048354, $3903b3c2,
      $a7672661, $d06016f7, $4969474d, $3e6e77db, $aed16a4a, $d9d65adc,
      $40df0b66, $37d83bf0, $a9bcae53, $debb9ec5, $47b2cf7f, $30b5ffe9,
      $bdbdf21c, $cabac28a, $53b39330, $24b4a3a6, $bad03605, $cdd70693,
      $54de5729, $23d967bf, $b3667a2e, $c4614ab8, $5d681b02, $2a6f2b94,
      $b40bbe37, $c30c8ea1, $5a05df1b, $2d02ef8d  );

function crc32_update(inbuffer: pointer; buffersize, crc: DWord): DWord;
Function CRCend( crc : DWord ): DWord;

Implementation

function crc32_update(inbuffer: pointer; buffersize, crc: DWord): DWord;
type
  PByte = ^Byte;
var
  currptr: PByte;
  i: integer;
begin
  currptr := inbuffer;
  Result := crc;
  for i := 1 to buffersize do
    begin
      Result := CRC32tab[Byte(Result xor DWord(currptr^))] xor (Result shr 8);
      inc(currptr);  { typed-pointer increment; also safe with 64-bit pointers }
    end;
end;

Function CRCend( crc : DWord ): DWord;
begin
  CRCend := (crc xor CRCSeed);
end;

end.

Example to use:
Code:
{$APPTYPE CONSOLE}
program crcmain; uses d3_crc32, sysutils, windows;
  { take CRC-32 of file in input }
  var
    infile: file;
    crcvalue: DWord;
    inbuffer: array[1..32768] of byte;
    amount_read: integer;
  begin
    AssignFile(infile, 'crccopy.txt');
    Reset(infile, 1);
    crcvalue := crcseed;
    blockread(infile, inbuffer, Sizeof(inbuffer), amount_read);
    repeat
      crcvalue := crc32_update(@inbuffer, amount_Read, crcvalue);
      blockread(infile, inbuffer, Sizeof(inbuffer), amount_read);
    until amount_read = 0;
    CloseFile(infile);
    crcvalue := crcend(crcvalue);
    writeln('CRC-32 Hex: ', IntToHex(crcvalue, 8));  { a CRC-32 is 8 hex digits }
    readln;
  end.
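
To run the same routines over a TMemoryStream instead of a file, something along these lines might do. It is a sketch that assumes the unit above is saved as d3_crc32.pas next to the program; StreamCRC32 is a made-up helper name, and the stream is assumed to be smaller than 4 GB so its size fits in a DWord.

```pascal
program memcrc;
{$APPTYPE CONSOLE}
uses
  Classes, SysUtils, Windows, d3_crc32;

{ CRC-32 of everything currently held in a memory stream (hypothetical helper). }
function StreamCRC32(Stream: TMemoryStream): DWord;
begin
  { The whole buffer is already in memory, so one update call is enough. }
  Result := CRCend(crc32_update(Stream.Memory, DWord(Stream.Size), CRCSeed));
end;

var
  ms: TMemoryStream;
  sample: AnsiString;
begin
  sample := '123456789';
  ms := TMemoryStream.Create;
  try
    ms.WriteBuffer(sample[1], Length(sample));
    writeln('CRC-32 Hex: ', IntToHex(StreamCRC32(ms), 8));
  finally
    ms.Free;
  end;
end.
```

The widely published CRC-32 check value for the string '123456789' is CBF43926, so that is what this should print if the table was transcribed correctly.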

 
Thanks for this, I will have a play with it see how it works and adapt to my program.

:)
 
To add to my previous post: I put the code you posted, Glenn, into a new console app project, copied over the file created by my program, ran the console app, and voila, it produced the CRC hash number :)

I ran it again with no changes to my file and the hash was the same; I then modified my file and the hash changed.

So now it is a simple case of copying this hash number into my stream when saving, then checking for a hash match whenever I read the file.

Thanks Glenn and DjangMan!

This Delphi forum is by far the best :)
 
I played around with the code and got a 4-byte version that seems to be a little faster.

Posted as FAQ: faq102-7462

Your mileage may vary and if it doesn't work, I can always post the one in this thread to the FAQ.

 