OrthoDocSoft (Programmer)
19 Jun 12 15:23
Dear folks,

I am trying to get used to using "checksum" to determine if my data transfers to my database are correct and complete. Would someone please write the very, very, very simplest algorithm to illustrate how this is done?

Let's say we're going to write "Hello" to a record in a database. How would you "checksum" that?

Thanks,

Ortho

lookaround "you cain't fix 'stupid'..."

Hypetia (Programmer)
19 Jun 12 15:55
>very, very, very simplest algorithm

Try the HashData function. It returns a hash code of variable length (up to 256 bytes). You can use this hash as a checksum.

See the following sample function, which hashes a string.
___

Private Declare Function HashData Lib "shlwapi" (pbData As Any, ByVal cbData As Long, pbHash As Any, ByVal cbHash As Long) As Long
Private Function HashString(Text As String) As Long
HashData ByVal Text, Len(Text), HashString, Len(HashString)
End Function

___

The function returns a long integer (4-byte hash) and HashString("Hello") returns -1235047813 (&HB662AA7B).

You can use this function to hash any kind of data, including arrays.
barryna (Programmer)
21 Jun 12 2:32
Use a common hash algorithm like MD5. A VB6 module to include in your project can be found here: http://sourceforge.net/projects/vb6md5hashcode/

Run the hash on the original data and on the new data; if the hashes match, your data is verified. There is no need to write your own hash method: MD5 runs very quickly and is designed for any type and any size of data.
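As a rough illustration of the compare-the-hashes idea, here is a minimal sketch. It is written in Python rather than VB6 purely so it can be run anywhere, and the helper name `md5_hex` and the sample strings are made up for this example.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()

original = "Hello".encode("utf-8")    # what we meant to write
read_back = "Hello".encode("utf-8")   # what came back from the database
corrupted = "Hellp".encode("utf-8")   # a damaged copy, for comparison

print(md5_hex(original) == md5_hex(read_back))   # True: data verified
print(md5_hex(original) == md5_hex(corrupted))   # False: mismatch detected
```

Only the two digests need to be compared, whatever the size of the data.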

Creator of www.meltedjoystick.com - Game Reviews, Game Lists, and much more!

strongm (MIS)
21 Jun 12 4:45
barryna,

In essence is this not the solution that Hypetia has already provided?

I concur that using one of the public hashing algorithms is probably a better idea. (In my opinion Microsoft's HashData function is flawed, since the algorithm is, to the best of my knowledge, not publicly documented and thus remains untested.)

There are, however, easier ways of generating an MD5 hash digest in VB than the linked article. We have covered it a number of times in this forum, but in summary we can do:

CODE

Public Function MD5HashDigest(strSource As String) As String

With CreateObject("CAPICOM.HashedData")
.Algorithm = 3 'CAPICOM_HASH_ALGORITHM_MD5
.Hash StrConv(strSource, vbFromUnicode)
MD5HashDigest = .Value
End With

End Function
tedsmith (Programmer)
21 Jun 12 10:07
It depends on how secure you want to be.
The most basic method of deriving a checksum using only simple VB code is to add up all the ASCII values of the characters.
To make the checksum easier to handle, convert the sum to hex and use, say, the last 4 hex digits as the final checksum.
The possibility of a missing bit producing the same value is pretty remote.
Something like this for illustration -

CODE

' CheckSum is a Variant here: it starts as a number and ends as a hex string
CheckSum = 0
For a = 1 To Len(MyData)
CheckSum = CheckSum + Asc(Mid$(MyData, a, 1))
Next
CheckSum = Right$("0000" & Hex$(CheckSum), 4)
"Hello" gives a value of 01F4. But of course so will "Hdmlo", although it would be bad luck if one byte increased and an adjacent byte decreased by the same amount.
The longer the data string the better it would be.

Then, when you want to verify the data, compute the checksum of the data again and compare it with the original checksum, which you would have to store in another column.

Of course if the string was huge like a complete novel or a picture, it would take a fair amount of time to add up all the letters.
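The additive checksum described above can be sketched in a few lines. This is Python rather than VB6, only so it is easy to run; the function name `simple_checksum` is invented here, and the sketch assumes single-byte (Latin-1) characters.

```python
def simple_checksum(text: str) -> str:
    """Add up the byte values and keep the last four hex digits."""
    total = sum(text.encode("latin-1"))   # assumes one byte per character
    return format(total & 0xFFFF, "04X")  # same as Right("0000" & Hex(n), 4)

print(simple_checksum("Hello"))  # 01F4
print(simple_checksum("Hdmlo"))  # 01F4 too: the two changes cancel out
print(simple_checksum("Hellp"))  # 01F5: a single changed byte is caught
```

Because addition ignores order, swapping two characters also leaves the result unchanged, which is the main weakness of this scheme.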

A more involved and secure method using only VB6 code is the polynomial (CRC) method. This is ideal for data transmission, and the result is usually tacked on the end of the string when sent.

CODE

Function CRC_Calc(CRC_Message As String) As String
'Calculates a 16 bit checksum of a String of any length.
Dim Polynomial16 As Long
Dim Y As Integer
Dim Char_Text_DEC As Long
Dim X As Integer
Y = 1
X = 0
CRC_value = 0
Polynomial16 = 33800 'Polynomial &H8408
For Y = 1 To Len(CRC_Message)
CRC_Text_Single = Mid$(CRC_Message, Y, 1)
Char_Text_DEC = Asc(CRC_Text_Single)
For X = 1 To 8
LSB_CRC = CRC_value And &H1
LSB_Char = Char_Text_DEC And &H1
If LSB_CRC = 1 And LSB_Char = 1 Or LSB_CRC = 0 And LSB_Char = 0 Then
CRC_value = Fix(CRC_value / 2)
Char_Text_DEC = Fix(Char_Text_DEC / 2)
ElseIf LSB_CRC = 0 And LSB_Char = 1 Or LSB_CRC = 1 And LSB_Char = 0 Then
CRC_value = Fix(CRC_value / 2)
Char_Text_DEC = Fix(Char_Text_DEC / 2)
CRC_value = Polynomial16 Xor CRC_value
Else
End If
Next X
Next Y
If Len(Hex(CRC_value)) = 4 Then
CRC_LO = Mid$(Hex(CRC_value), 3, 2)
CRC_HI = Mid$(Hex(CRC_value), 1, 2)
ElseIf Len(Hex(CRC_value)) = 3 Then
CRC_STRING = "0" & Hex(CRC_value)
CRC_LO = Mid$(CRC_STRING, 3, 2)
CRC_HI = Mid$(CRC_STRING, 1, 2)
ElseIf Len(Hex(CRC_value)) = 2 Then
CRC_STRING = "00" & Hex(CRC_value)
CRC_LO = Mid$(CRC_STRING, 3, 2)
CRC_HI = Mid$(CRC_STRING, 1, 2)
ElseIf Len(Hex(CRC_value)) = 1 Then
CRC_STRING = "000" & Hex(CRC_value)
CRC_LO = Mid$(CRC_STRING, 3, 2)
CRC_HI = Mid$(CRC_STRING, 1, 2)
ElseIf Len(Hex(CRC_value)) = 0 Then
CRC_LO = "00"
CRC_HI = "00"
End If
CRC_Calc = Chr("&h" & CRC_LO) & Chr("&h" & CRC_HI)
'CRC_Calc = Chr(CRC_value Mod 256) & Chr(CRC_value \ 256) 'ALTERNATIVE METHOD

End Function
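For comparison, here is a compact sketch of what appears to be the same calculation: a 16-bit CRC processed least significant bit first, with polynomial &H8408 and a starting value of 0. It is in Python rather than VB6 so it can be run directly, the function name `crc16_reflected` is invented for this example, and the claim that it matches CRC_Calc bit for bit is my reading of the code above, not something stated in the thread.

```python
def crc16_reflected(data: bytes, poly: int = 0x8408) -> int:
    """Bitwise CRC-16, least significant bit first, starting from 0."""
    crc = 0
    for byte in data:
        crc ^= byte                      # fold the next byte into the low bits
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ poly  # low bit set: shift, then apply polynomial
            else:
                crc >>= 1                # low bit clear: shift only
    return crc

crc = crc16_reflected(b"Hello")
print(format(crc, "04X"))                 # four hex digits, zero padded
print(crc.to_bytes(2, "little"))          # low byte first, like CRC_Calc
print(crc16_reflected(b"Hello") != crc16_reflected(b"Hdmlo"))  # True
```

Unlike the additive checksum, this distinguishes "Hello" from "Hdmlo", because a 16-bit CRC catches any error burst of 16 bits or fewer.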
dilettante (MIS)
21 Jun 12 18:47
What makes anyone think they need checksums for this in the first place?

If main memory is that unreliable you're screwed before you start, checksums or not. Disk I/O already has integrity checks. IP has integrity checks. USB has integrity checks. DBMSs have integrity checks.

If you are having data corruption problems it is far more likely you simply have bad code screwing it up or have failed to manage concurrency properly... or you have had catastrophic failures during write operations. The latter will be detected at the DBMS level before you're ever going to find it by in-band data checksums.

Something seems very wrong if you are asking for something like this.


CRCs are pretty darned weak compared to something like an MD5 hash, and I don't think I've ever seen such a slow and convoluted way of calculating one before the code above. What's with all those undeclared variables, and those string operations, and worse yet Variant-returning operations, etc. anyway?

Or goofy things like:

CODE

ElseIf Len(Hex(CRC_value)) = 0 Then
... paths that will never be taken.
barryna (Programmer)
21 Jun 12 19:25
The question at hand is not meant to be scrutinized. We don't know the extent of what he may be trying to accomplish. I do know from past experience if you are working with sensitive data, you want to make sure the process you ran worked as desired. A checksum or hash method can check that, instead of a human reviewing the end result for correctness. It doesn't mean there are hardware issues.

Creator of www.meltedjoystick.com - Game Reviews, Game Lists, and much more!

tedsmith (Programmer)
21 Jun 12 20:03
Yes it is goofy (it was written in the last century by persons unknown - who knows, it might even have been strongm!)
I didn't intend anyone to ever use either of my examples because I thought the original question was more about understanding how simple checksums could be generated rather than have an efficient code snippet to use.

Quote "the very, very, very simplest algorithm to illustrate how this is done?"

dilettante (MIS)
22 Jun 12 2:14
Well the simplest would be to simply sum the bytes (or characters, since we're dealing with 16-bit Unicode), possibly adding a modular operation to handle "wrap around" values as the sum grows large.

I'd avoid ANSI conversions, they're unfaithful crossing locales by nature. Such a "checksum" calculated using Locale A gives potentially different results when done in Locale B given the same inputs unless you luck out and only the 7-bit ASCII subset was used in the String.
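A small sketch of both points, in Python rather than VB6 so it can be run as is. The function name `char_sum` is invented, and code pages 1252 and 437 are just two convenient stand-ins for "Locale A" and "Locale B".

```python
def char_sum(text: str, modulus: int = 65536) -> int:
    """Sum the 16-bit UTF-16 code units of text, wrapping at modulus."""
    raw = text.encode("utf-16-le")
    units = (int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2))
    return sum(units) % modulus

print(char_sum("Hello"))             # 500, the same everywhere

word = "caf\u00e9"                   # "cafe" with an accented final letter
print(sum(word.encode("cp1252")))    # byte sum after one ANSI conversion
print(sum(word.encode("cp437")))     # a different sum under another code page
```

Summing the 16-bit character codes directly gives one answer regardless of locale; converting to a single-byte code page first does not.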

Quote:

The question at hand is not meant to be scrutinized.

Sure it is. Poor practice is poor practice. Use of any kind of "checksum" here is redundant and adds nothing in the way of "security" (or integrity, which is what you probably meant).

About the only places you'd use them would be for things like serial data links with no error checking protocol, or hand-entry of something like account numbers containing check digits.
strongm (MIS)
22 Jun 12 4:21
>who knows, it might even have been strongm!)

Certainly not
tedsmith (Programmer)
22 Jun 12 7:38
I thought you would have been into this one by now, so I was just checking if you were awake!

I think the "ElseIf Len(Hex(CRC_value)) = 0 Then" is to pad out the "hex" string with 00 so you always return 4 byte hex checksum string even if the data is blank.
OrthoDocSoft (Programmer)
22 Jun 12 13:17
Wow! That prompted a debate....

What I'm REALLY trying to do is to prove within my code that the data I THINK I wrote to a record is the data that I DID write.

What I'm actually doing is:

1) writing the data.
2) reading back the data I just wrote.
3) comparing the read-back to the original data sent.
4) if NOT the same, then I try to FIND the record by its "minimum ID". If I find it, I UPDATE that record and go back to number 2) above and re-confirm.
5) if I can't FIND the record by its "minimum ID" (10 tries, mind you), I REWRITE the whole record and go back to 2.
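The steps above boil down to a write, read back, compare and retry loop. Here is a minimal sketch of that loop in Python (not VB6, so that it runs anywhere). Everything in it is hypothetical: `FlakyStore` is an in-memory stand-in for a database behind an unreliable link, and its `write` method folds the separate UPDATE and REWRITE cases into a single insert-or-update.

```python
import time

def write_verified(store, key, data, max_attempts=10, delay_seconds=0.0):
    """Write data under key, read it back, and retry until both match."""
    for attempt in range(1, max_attempts + 1):
        try:
            store.write(key, data)
            if store.read(key) == data:
                return attempt            # number of attempts it took
        except ConnectionError:
            pass                          # treat a dropped link like a mismatch
        time.sleep(delay_seconds)
    raise IOError(f"could not verify {key!r} after {max_attempts} attempts")


class FlakyStore:
    """Hypothetical in-memory stand-in for a database over a bad link."""
    def __init__(self, failures):
        self.rows = {}
        self.failures = failures          # how many writes fail first

    def write(self, key, data):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("link dropped")
        self.rows[key] = data             # insert or update in one step

    def read(self, key):
        return self.rows.get(key)


store = FlakyStore(failures=2)
print(write_verified(store, "rec-1", "Hello"))  # 3
```

Capping the number of attempts and raising an error at the end keeps a permanent outage from turning into an endless loop.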

This "works", but I'm sure there is a more elegant way to accomplish this.

The MAIN reason I do this is because I am writing data through wireless networks that fail, temporarily, from time to time. Usually just for seconds....

What do you all do?

Thanks,

Ortho.



lookaround "you cain't fix 'stupid'..."

tedsmith (Programmer)
23 Jun 12 0:01
So you are verifying the wireless part of it rather than the total storage of data.
Echoing back a simple checksum over just the wireless part does avoid having to resend back the whole data for verification.

Or
I had a similar problem with a dodgy network so I put all data into a "PropertyBag"
and decoded it using a stream.
If you are happy your database will store OK, this has an advantage that the received data has to be complete before you even save it in a record and gives an error if not. You don't have to generate or save the checksum. You can then try to send a number of times until it is OK before being saved.

Hypetia (Programmer)
23 Jun 12 5:31
I proposed the HashData solution because of the "very, very, very simplest algorithm" requirement.

Although the algorithm is not documented (at least I am not aware of it), I don't mind using it for computing a simple one-way checksum for data integrity check. We are not using it for encryption, when the knowledge of actual algorithm and proof of its reliability is of acute importance.

At least, it is better and faster than adding or XORing byte values together in a loop.

As far as simplicity is concerned, strongm's MD5HashDigest function also looks a treat. Unfortunately, on my computer, running Win7 Pro 64-bit SP1, it throws error 429, "ActiveX component can't create object". I don't know which library is required for creating CAPICOM.HashedData object.

Whichever method you use, you should write your code in such a way that the checksum of the data written to the database is computed by the computer hosting the database. You can accomplish this by wrapping your hash function in an ActiveX component and instantiating it remotely on the target computer. The data written to the database will then be available to the hash function locally (on the same machine), and the checksum will be computed faster.

There is no point in computing the checksum on your own computer, the one attempting to write the data, because in that case the whole data will be read back over the network again. If you do so, then computing the checksum is useless; you can simply compare the written data to the original data, as you are currently doing.
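To make the point concrete, here is a sketch in Python (chosen over VB6 only so it can be run as is). `DbHost` is a hypothetical stand-in for the remote machine, and in a real setup its `digest` method would be a remote call; the idea is simply that the hash is computed next to the stored data and only the digest travels back.

```python
import hashlib

class DbHost:
    """Hypothetical stand-in for the machine that hosts the database."""
    def __init__(self):
        self.rows = {}

    def store(self, key: str, data: bytes) -> None:
        self.rows[key] = data

    def digest(self, key: str) -> bytes:
        # Runs beside the stored data, so only 16 bytes go back to the caller.
        return hashlib.md5(self.rows[key]).digest()


def write_and_check(host: DbHost, key: str, data: bytes) -> bool:
    """Store data, then compare the host's digest with a locally computed one."""
    host.store(key, data)
    return host.digest(key) == hashlib.md5(data).digest()


payload = b"Hello" * 100_000            # about 500 KB sent once, never read back
print(write_and_check(DbHost(), "rec-1", payload))  # True
```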

Hope you understand the point.
strongm (MIS)
23 Jun 12 14:22
Hypetia, under W7:

CODE


' W7: Add a reference to CAPICOM V 2.1 type library
Public Function MD5HashDigest(strSource As String) As String

With New CAPICOM.HashedData
.Algorithm = CAPICOM_HASH_ALGORITHM_MD5
.Hash StrConv(strSource, vbFromUnicode)
MD5HashDigest = .Value
End With

End Function

>proof of its reliability is of acute importance.

Proof that it actually captures bit errors would be important in this case. And we don't have that evidence. Now, whilst Microsoft have a history of developing home-grown encryption that is somewhat flawed, I'm not saying it is a poor algorithm - we don't know that either - what I'm saying is that, given we have well-known alternatives that have been analysed to death we might prefer to use them when appropriate.
Hypetia (Programmer)
24 Jun 12 6:45
Thanks, it works now.
Hypetia (Programmer)
24 Jun 12 10:11
Well, further reading about MD5 reveals that it also has severe security vulnerabilities and is susceptible to collision attacks.

http://en.wikipedia.org/wiki/MD5

Although the hash can easily be compromised (as they say, "it is easy to generate MD5 collisions"), it is still widely used for data integrity checks and also for storing passwords.
strongm (MIS)
24 Jun 12 13:03
The difference is we know the weaknesses of MD5. And with MD5 we can say with a very high level of confidence that a hash produced from a set of data is not going to be the same as a hash of that data with a few bits changed, where those bits have changed by chance (say due to transmission errors, which is what we are looking at here). If I wanted a cryptographically secure hash, however, I wouldn't recommend it.
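A quick way to see this for a short message is to try every possible single-bit error and count how many change the digest. The sketch below is in Python rather than VB6 so it can be run directly, and the helper name `flip_bit` is made up for the example.

```python
import hashlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)

message = b"Hello"
clean = hashlib.md5(message).hexdigest()

# Try every possible single-bit error in the message.
total_bits = len(message) * 8
changed = sum(
    hashlib.md5(flip_bit(message, i)).hexdigest() != clean
    for i in range(total_bits)
)
print(changed, "of", total_bits, "single-bit errors changed the digest")
```

This only demonstrates behaviour against accidental changes; it says nothing about resistance to deliberately constructed collisions.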

And if you are running on W2K3 or later, and a high level of secure hashing is required, we can easily bump up to SHA-2:

CODE

Public Function HashDigest(strSource As String) As String

With New CAPICOM.HashedData
.Algorithm = CAPICOM_HASH_ALGORITHM_SHA_256 'or 384 or 512
.Hash StrConv(strSource, vbFromUnicode)
MD5HashDigest = .Value
End With

End Function


Hypetia (Programmer)
24 Jun 12 18:18
>CAPICOM_HASH_ALGORITHM_SHA_256

Fair enough. You might want to change

MD5HashDigest = .Value
to
HashDigest = .Value
strongm (MIS)
24 Jun 12 19:00
Yes, my fault for making a quick hack in Notepad.

 
