Compare 2 Files Hex 1

Status
Not open for further replies.

patriciaxxx

Programmer
Jan 30, 2012
277
GB
I want to store hex strings in an Array and then compare a file to see if there is any match.

This is what I have so far.

My Array

Code:
Dim DataHold(50, 2) As Variant
DataHold(1, 1) = "This would be first hex string"
DataHold(1, 2) = "This would be first hex string description to display if match is true"
DataHold(2, 1) = "This would be second hex string"
DataHold(2, 2) = "This would be second hex string description to display if match is true"
DataHold(3, 1) = "This would be third hex string"
DataHold(3, 2) = "This would be third hex string description to display if match is true"
'and so on...

My code to load a file Hex

Code:
Function GetFileHex()
Dim intFileNumber As Integer
Dim lngFileSize As Long
Dim strBuffer As String
Dim lngCharNumber As Long
Dim strCharacter As String * 1
Dim strFileName As String
  
strFileName = CurrentProject.path & "\Test.mdb"
  
intFileNumber = FreeFile
  
DoCmd.Hourglass True
  
Open strFileName For Binary Access Read Shared As #intFileNumber
  
lngFileSize = LOF(intFileNumber)
strBuffer = Space$(lngFileSize)
  
Get #intFileNumber, , strBuffer
Close #intFileNumber
  
For lngCharNumber = 1 To lngFileSize
  strCharacter = Mid(strBuffer, lngCharNumber, 1)
  Dim strHex As String
  strHex = strHex & Hex(asc(strCharacter))
  
Next
  Debug.Print strHex
  
DoCmd.Hourglass False

End Function

The code returns the following for the sample below

01005374616E64617264204A657420444201000B56E362609C255E9A96772403F09C7E9F90FF859A31C579BAED30BCDFCC9D63D9E4C39F46FB8ABC4E8874EC3753CB9CFAC8D128E61D398A605A1B7B36FBFDDFB1797B1343C120B1333AEE795B9C3A7C2A6AFA7C9981F98FD80BFE050BBDF81665F95F8D089248567C61F2744D2EECF65EDFF7C746A17816CEDE92D62D454600342E300 and so on…ie the rest of the file hex

1. I need the code to return the hex string with all the zeros (at the moment leading zeros are dropped, so 00 comes out as 0).
2. I need the code to have a byte setting that I can set to the number of bytes I want to return, in this case 160.

A sample hex string for the Array

This sample is 160 bytes. I will keep all the Array hex strings at 160 bytes to make the comparisons easier.

3. I need a simple way to strip out the spaces.

00 01 00 00 53 74 61 6E 64 61 72 64 20 4A 65 74 20 44 42 00 01 00 00 00 B5 6E 03 62 60 09 C2 55 E9 A9 67 72 40 3F 00 9C 7E 9F 90 FF 85 9A 31 C5 79 BA ED 30 BC DF CC 9D 63 D9 E4 C3 9F 46 FB 8A BC 4E F5 59 EC 37 2E E6 9C FA B5 FC 28 E6 60 14 8A 60 27 36 7B 36 86 D0 DF B1 04 56 13 43 BC 0D B1 33 47 C3 79 5B E1 17 7C 2A 22 D4 7C 99 08 1F 98 FD 81 7E 91 58 19 70 84 66 5F 95 F8 D0 89 24 85 67 C6 1F 27 44 D2 EE CF 65 ED FF 07 C7 46 A1 78 16 0C ED E9 2D 62 D4 54 06 00 00 34 2E 30 00

4. I need to compare the input file with the Array and return any match.

Any help on the 4 points would be much appreciated.
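
Points 1 to 3 can be sketched as follows. This is a minimal, untested-against-your-database sketch, and the names `FileHeadToHex` and `StripSpaces` are hypothetical, not part of the code posted above. `Hex$` on its own returns a single digit for values below 16, which is why the zeros go missing; padding each byte to two digits restores them. Reading into a Byte array rather than a String also limits the read to the requested number of bytes.

```vba
Option Explicit

' Hypothetical helper: returns the first maxBytes bytes of a file as a
' zero-padded, upper-case hex string with no separators.
Public Function FileHeadToHex(ByVal strFileName As String, _
                              Optional ByVal maxBytes As Long = 160) As String
    Dim intFileNumber As Integer
    Dim bytBuffer() As Byte
    Dim lngSize As Long
    Dim i As Long
    Dim strHex As String

    intFileNumber = FreeFile
    Open strFileName For Binary Access Read Shared As #intFileNumber
    lngSize = LOF(intFileNumber)
    If maxBytes > lngSize Then maxBytes = lngSize   ' file shorter than requested
    If maxBytes > 0 Then
        ReDim bytBuffer(0 To maxBytes - 1)
        Get #intFileNumber, , bytBuffer             ' reads exactly maxBytes bytes
    End If
    Close #intFileNumber

    ' Pre-size the result, then overwrite two characters per byte.
    strHex = String$(maxBytes * 2, "0")
    For i = 0 To maxBytes - 1
        ' Right$("0" & ..., 2) keeps the leading zero that Hex$ alone drops.
        Mid$(strHex, i * 2 + 1, 2) = Right$("0" & Hex$(bytBuffer(i)), 2)
    Next i

    FileHeadToHex = strHex
End Function

' Hypothetical helper: removes the spaces from a "00 01 00 ..." style string.
Public Function StripSpaces(ByVal strHex As String) As String
    StripSpaces = Replace(strHex, " ", "")
End Function
```

With these, `FileHeadToHex(CurrentProject.Path & "\Test.mdb", 160) = StripSpaces(DataHold(1, 1))` would compare like with like, assuming the array entries are written in upper-case hex.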
 
Quick question - why do you want to convert the hex values into a string? Is it because the array contains hex strings? And if so the question then becomes why does the array contain hex strings rather than hex values? Is it because there are particular patterns that you are searching for?

In other words it might be best in this scenario to explain to us what you are trying to achieve.

 
Hello strongm

I want to compare the hex of a file with the hex strings I have in the Array to see if there is a match. Because I know which files the hex strings in the Array came from, a match means I will have identified the file.

There may be a better way to achieve this but I don’t know. I have gone as far as I know how with the code presented.
 
Let's see if I understand.

You have a bunch of fingerprints (specific hex patterns), each of which identifies a specific file. Your array contains those fingerprints (and some text to display if you find a match)

You then want to compare each of the fingerprints against some of the contents of a file to see if there is a match (and currently, at least for this example, this is the first 160 bytes of the file).





 
Excellent. Now, firstly I'd stop trying to convert things into human readable hex strings. You just don't need to, and the computer won't care.

So I suspect that you'll need to load your array slightly differently. Can you show the code you have for that?
 
From what you say it sounds like I have gone about it the wrong way, which doesn’t surprise me. I struggled to get as far as I did.

All the code I have so far is as I have posted. So the Array code is the first block titled My Array.

I was at a loss how to continue with my code so whatever you come up with can only be an improvement on that, especially if it does the job. Thank you.
 
Code:
Option Explicit
Private DataHold(2, 2) As Variant ' just a small one for purposes of the example

Public Sub Main()
    LoadArray
    MsgBox "Ok, checking first 10 chars ..."
    GetFileHex 10 ' just examine first 10 bytes of file
    MsgBox "Ok, checking first 160 chars ..."
    GetFileHex 160 ' now let's try first 160 bytes (note doesn't matter if file is shorter than 160 bytes)
End Sub

Public Sub LoadArray()
    Dim fingerprint(2) As String
    Dim mybyte
    Dim lp As Long

    fingerprint(0) = "0B 0C 00 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A"
    fingerprint(1) = "0A 0C 00 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A"
    fingerprint(2) = "0A 0C 00 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 11"

    For lp = 0 To 2
        DataHold(lp, 1) = ""
        For Each mybyte In Split(fingerprint(lp), " ")
            DataHold(lp, 1) = DataHold(1, 1) & Chr(Val("&H" & mybyte))
        Next
        DataHold(lp, 2) = " matches fingerprint " & lp
    Next
End Sub

Public Sub GetFileHex(Optional maxchars As Long = 160)
Dim intFileNumber As Integer
Dim strBuffer As String
Dim strFileName As String
Dim lp As Long
Dim max As Long

' My test file starts with the following bytes: 0A 0C 00 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A
strFileName = "c:\downloads\Test.txt"

intFileNumber = FreeFile

Open strFileName For Binary Access Read Shared As #intFileNumber
If maxchars > LOF(intFileNumber) Then maxchars = LOF(intFileNumber)
strBuffer = Space$(maxchars)
Get #intFileNumber, , strBuffer
Close #intFileNumber

'All done. Let's see if we can match a fingerprint
For lp = 0 To 2
    max = Len(DataHold(lp, 1))
    If max > maxchars Then max = maxchars ' just to make sure we check correct number of bytes
    If Left(DataHold(lp, 1), max) = Left(strBuffer, max) Then
        MsgBox strFileName & DataHold(lp, 2)
    End If
Next

End Sub
 
Well, that’s got it.
Thank you indeed.
Just a couple of things I don’t understand.

1.
When I replace the Array line (1) with

fingerprint(1) = "00 01 00 00 53 74 61 6E 64 61 72 64 20 4A 65 74 20 44 42 00"

which I know is an .mdb file and run the function Main

I get checking first 10 bytes message followed by three messages saying matched i.e. fingerprint(0) , (1) and (2)

(which I don’t get as only 1 is a match?)

Followed by

checking first 160 bytes
and matched fingerprint(1)

(which is what I would expect as 1 is a match)

2.
how would I get the array to include my comments in the 2’s lines
i.e.

DataHold(1, 1) = " 0B 0C 00 64 65 66 67 68 69 6A 6B 6C 6D"
DataHold(1, 2) = "this is access db"
DataHold(2, 1) = " 00 01 00 00 53 74 61 6E 64 61 72 64 20"
DataHold(2, 2) = "this is text file"
DataHold(3, 1) = " 0A 0C 00 64 65 66 67 68 69 6A 6B 6C 6D"
DataHold(3, 2) = "this is someother file"


 
>followed by three messages

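Um - yes - teeny tiny bug in the code: the inner loop in LoadArray was appending to DataHold(1, 1) where it should have used DataHold(lp, 1)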

>how would I get the array to include my comments in the 2’s lines

Er ... just include those lines ...

Here's a new version (including the bug fix)...
Code:
Option Compare Database
Option Explicit

Private DataHold(2, 2) As Variant ' just a small one for purposes of the example

Public Sub Main()
    LoadArray
    MsgBox "Ok, checking first 10 chars ..."
    GetFileHex 10 ' just examine first 10 bytes of file
    MsgBox "Ok, checking first 160 chars ..."
    GetFileHex 160 ' now let's try first 160 bytes (note doesn't matter if file is shorter than 160 bytes)
End Sub

Public Sub LoadArray()
    Dim fingerprint(2) As String
    Dim mybyte
    Dim lp As Long

    fingerprint(0) = "0B 0C 00 64 65 66 67 68 69 6A 6B 6C 6D"
    fingerprint(1) = "00 01 00 00 53 74 61 6E 64 61 72 64 20"
    fingerprint(2) = "0A 0C 00 64 65 66 67 68 69 6A 6B 6C 6D"

    For lp = 0 To 2
        DataHold(lp, 1) = ""
        For Each mybyte In Split(fingerprint(lp), " ")
            DataHold(lp, 1) = DataHold(lp, 1) & Chr(Val("&H" & mybyte))
        Next
    Next
    DataHold(0, 2) = "this is text file"
    DataHold(1, 2) = "this is access db"
    DataHold(2, 2) = "this is someother file"

End Sub

Public Sub GetFileHex(Optional maxchars As Long = 160)
Dim intFileNumber As Integer
Dim strBuffer As String
Dim strFileName As String
Dim lp As Long
Dim max As Long

' My test file starts with the following bytes: 0A 0C 00 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A
' My test mdb is indeed an mdb
strFileName = "f:\file downloads\test.mdb" '"c:\downloads\Test.txt"

intFileNumber = FreeFile

Open strFileName For Binary Access Read Shared As #intFileNumber
If maxchars > LOF(intFileNumber) Then maxchars = LOF(intFileNumber)
strBuffer = Space$(maxchars)
Get #intFileNumber, , strBuffer
Close #intFileNumber

'All done. Let's see if we can match a fingerprint
For lp = 0 To 2
    max = Len(DataHold(lp, 1))
    If max > maxchars Then max = maxchars ' just to make sure we check correct number of bytes
    ' Binary compare: under Option Compare Database a plain = follows the database sort order
    If StrComp(Left(DataHold(lp, 1), max), Left(strBuffer, max), vbBinaryCompare) = 0 Then
        MsgBox strFileName & DataHold(lp, 2)
    End If
Next

End Sub
 
That’s great, does exactly what I wanted.
A big thank you.
 
Oh dear!
Having tested this more thoroughly it appears not to be working as I had hoped.

Here is the code kindly provided by strongm, slightly changed by myself.

Code:
Option Compare Database
Option Explicit

Private DataHold(4, 2) As Variant 'Create the array.

'This requires a reference to the Microsoft Office 10.0 Object Library.
Public Sub Main()
Call LoadArray 'Load the array.

Dim fDialog As Office.FileDialog
Dim varFile As Variant

'Set up the File dialog box.
Set fDialog = Application.FileDialog(msoFileDialogFilePicker)
With fDialog
  .AllowMultiSelect = False 'Restrict the user to a single selection in the dialog box.
  .Title = "Browse" 'Set the title of the dialog box.
  .Filters.Clear 'Clear out the current filters and add our own.
  .Filters.Add "All Files", "*.*"
  If .Show = True Then 'Show the dialog box.
    For Each varFile In .SelectedItems
      Call GetFileHex(varFile, 20)
    Next
  End If
End With

End Sub

Public Sub LoadArray()
Dim fingerprint(4) As String
Dim mybyte
Dim lp As Long

fingerprint(0) = "00 01 00 00 53 74 61 6E 64 61 72 64 20 4A 65 74 20 44 42 00"
fingerprint(1) = "00 01 00 00 53 74 61 6E 64 61 72 64 20 41 43 45 20 44 42"
fingerprint(2) = "52 61 72 21 1A"
fingerprint(3) = "37 7A BC AF 27 1C"
fingerprint(4) = "D0 CF 11 E0 A1 B1 1A E1 00"

For lp = 0 To 4
  DataHold(lp, 1) = ""
  For Each mybyte In Split(fingerprint(lp), " ")
    DataHold(lp, 1) = DataHold(lp, 1) & Chr(Val("&H" & mybyte))
  Next
  'DataHold(lp, 2) = " matches fingerprint " & lp
Next

DataHold(0, 2) = vbCrLf & "Microsoft Jet DB"
DataHold(1, 2) = vbCrLf & "Microsoft Access 2007 Database"
DataHold(2, 2) = vbCrLf & "RAR Archive"
DataHold(3, 2) = vbCrLf & "7-Zip Compressed Archive"
DataHold(4, 2) = vbCrLf & "Microsoft Access Project"

End Sub

'Identify file types from their binary signatures.
'It doesn't matter if the file is shorter than maxchars bytes.
Public Sub GetFileHex(ByVal strFileName As String, _
  Optional maxchars As Long = 20)
Dim intFileNumber As Integer
Dim strBuffer As String
Dim lp As Long
Dim max As Long
Dim match As Boolean

intFileNumber = FreeFile
match = False

Open strFileName For Binary Access Read Shared As #intFileNumber
If maxchars > LOF(intFileNumber) Then maxchars = LOF(intFileNumber)
strBuffer = Space$(maxchars)
Get #intFileNumber, , strBuffer
Close #intFileNumber

'All done. Let's see if we can match a fingerprint.
For lp = 0 To 4
  max = Len(DataHold(lp, 1))
  If max > maxchars Then max = maxchars 'Just to make sure we check correct number of bytes.
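  'Binary compare: under Option Compare Database a plain = follows the database sort order.
  If StrComp(Left(DataHold(lp, 1), max), Left(strBuffer, max), vbBinaryCompare) = 0 Then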
    MsgBox strFileName & DataHold(lp, 2)
    match = True
  End If
Next

If match = False Then MsgBox "No match"

End Sub

I think the code does what I set out to do, that is to say it compares the fingerprint hex string to the input file. But now I wonder if that’s enough because the code throws up a match for files it shouldn’t as well as for those it should?

Let me explain.
For example, fingerprint(0) should only identify a Microsoft Access 2002 database; however, if I input an .mde, a 2000 .mdb or a 2003 .mdb, it still matches fingerprint(0).
Another, even more confusing, example is fingerprint(4), which should only identify a Microsoft Access Project; however, if I input a Word document (.doc) or an Excel workbook (.xls), it still matches fingerprint(4).

So my questions become:
Is it possible to identify file types from their binary signatures, and if it is then what’s missing from the code?
Would some sort of function generate different fingerprints which make the matches more accurate?
Is there something else in a file to match against in addition to the fingerprints which will make an accurate match?


 
Unfortunately it is because your signatures are dubious at best, combined with the fact that identifying most files from short, simple fingerprints really isn't always that easy.

For example your fingerprint

fingerprint(0) = "00 01 00 00 53 74 61 6E 64 61 72 64 20 4A 65 74 20 44 42 00"

is really just looking for the words "Standard Jet DB", which you'll appreciate is used by pretty much anything that can save an MDB ...

and fingerprint(1) is looking for "Standard ACE DB" (actually "Standard ACE D@" is what you seem to be looking for). Again, you may appreciate that this is used by Access 2007 and later rather than a specific version of Access

As for fingerprint(4) ... Office documents (prior to 2007) were by default stored as what are known as compound files/documents (which is essentially a format for storing numerous files and streams within a single file on a disk). Part of that (fairly complex) format is a 512 byte header, the first 8 bytes of which is a signature identifying the file as being a compound document file followed by 16 bytes containing 0 - and your fingerprint exactly matches that signature.

In other words, fingerprint(4) will match any compound document file, so all Office pre-2007 files, Access Projects, and others

Here is a link to the Microsoft Office file formats:
And here's one to a brief and somewhat simplistic overview of the compound file format:
Good luck ...
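
The compound file point above can be sketched in code. This is a minimal sketch under the assumption that only the 8-byte signature described above is checked; `IsCompoundFile` is a hypothetical name. A True result says only that the file is a compound file, not which application wrote it, so the message shown to the user should be correspondingly loose.

```vba
Option Explicit

' Hypothetical helper: True when the file starts with the 8-byte
' compound file signature D0 CF 11 E0 A1 B1 1A E1.
Public Function IsCompoundFile(ByVal strFileName As String) As Boolean
    Dim sig As Variant
    Dim bytHead(0 To 7) As Byte
    Dim intFileNumber As Integer
    Dim i As Long

    sig = Array(&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1)

    intFileNumber = FreeFile
    Open strFileName For Binary Access Read Shared As #intFileNumber
    If LOF(intFileNumber) >= 8 Then
        Get #intFileNumber, , bytHead          ' reads the first 8 bytes
        IsCompoundFile = True
        For i = 0 To 7
            ' LBound(sig) + i keeps this correct even under Option Base 1.
            If bytHead(i) <> sig(LBound(sig) + i) Then
                IsCompoundFile = False
                Exit For
            End If
        Next i
    End If
    Close #intFileNumber
End Function
```

A caller might then report something like "Compound file (pre-2007 Office document, Access Project, or similar)" rather than naming a single file type.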
 
Hello again strongm

Thank you for your reply. I kind of expected to hear what you said, as it seemed the logical outcome of my testing over the past week. But it’s good to know for sure from someone more knowledgeable than me on file signatures and VBA, so thank you for that.

I noticed in my testing that if I read in much more of the hex string the problems mentioned go away, but new problems arise. With longer hex strings the files are identified accurately, but only to the point that it must be the very file from which the hex string was taken, including any data that was present in the file at the time.

In the final analysis the code still has uses in providing ‘loose’ matches with short fingerprints for some files and for other files more accurate matches.

On a final note is there a way to improve on the accuracy of the match in the code?
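
On the accuracy question, two inexpensive tightenings are possible without changing the signatures themselves. This is only a sketch, not a fix for signatures that are too generic. It compares raw bytes rather than strings (in a module with `Option Compare Database`, a plain `=` between strings follows the database sort order, which is typically case-insensitive), and it reports only the longest matching fingerprint. The names `HexToBytes`, `StartsWithBytes` and `BestMatch` are hypothetical, and the code assumes non-empty, space-separated, two-digit hex fingerprints and a non-empty file header.

```vba
Option Explicit

' Hypothetical helper: turns "52 61 72 21 1A" into a zero-based Byte array.
Public Function HexToBytes(ByVal strHex As String) As Byte()
    Dim parts() As String
    Dim bytOut() As Byte
    Dim i As Long

    parts = Split(Trim$(strHex), " ")
    ReDim bytOut(0 To UBound(parts))
    For i = 0 To UBound(parts)
        bytOut(i) = CByte(Val("&H" & parts(i)))
    Next i
    HexToBytes = bytOut
End Function

' True when bytHead begins with every byte of bytSig.
' A header shorter than the signature is treated as "no match".
Public Function StartsWithBytes(bytHead() As Byte, bytSig() As Byte) As Boolean
    Dim i As Long

    If UBound(bytSig) > UBound(bytHead) Then Exit Function
    For i = 0 To UBound(bytSig)
        If bytHead(i) <> bytSig(i) Then Exit Function
    Next i
    StartsWithBytes = True
End Function

' Returns the index of the longest matching fingerprint, or -1 if none match.
Public Function BestMatch(bytHead() As Byte, fingerprints() As String) As Long
    Dim lp As Long
    Dim bytSig() As Byte
    Dim bestLen As Long

    BestMatch = -1
    For lp = LBound(fingerprints) To UBound(fingerprints)
        bytSig = HexToBytes(fingerprints(lp))
        If StartsWithBytes(bytHead, bytSig) Then
            If UBound(bytSig) + 1 > bestLen Then
                bestLen = UBound(bytSig) + 1
                BestMatch = lp
            End If
        End If
    Next lp
End Function
```

Unlike the posted code, this does not truncate a fingerprint to the number of bytes read, so a file that is shorter than a fingerprint simply fails to match it. That is a design choice in this sketch; it trades a few missed matches on very short files for fewer accidental ones.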
 