garbling a name 1

Status
Not open for further replies.

Bryan - Gendev

Programmer
Jan 9, 2011
408
AU
I have been asked to create a program to 'hide' the actual names in a certain field in a VFP table. Thus I would replace the existing string of characters with a new string of random characters.

Thus far I have the code to loop through each character in the string, but I am not able to do the replacement character by character. My first attempt is this function:
Code:
Function garble(oldchar)
LOCAL charnum
LOCAL newchar
LOCAL newcharnum
charnum=INT(ASC(oldchar))
newcharnum = charnum * RAND()
newchar= CHR(newcharnum )
Return newchar
This of course produces a range of characters, not all of which are in the range A-Z.
How can I create replacement characters in the range A-Z?
Many thanks

gendev
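
A minimal sketch of one way to stay within A-Z, not a definitive answer: pick the new character directly from the 26 codes starting at ASC("A") instead of multiplying the old code by RAND(). The MIN() guard is an assumption, in case RAND() ever returns exactly 1. Note that this is one-way garbling; the original name cannot be recovered from the result.

```foxpro
* Sketch only: returns a random uppercase letter, whatever character is passed in
FUNCTION garble(oldchar)
	LOCAL newcharnum
	* 65 = ASC("A"); INT(RAND() * 26) yields 0..25, MIN() guards the upper edge
	newcharnum = 65 + MIN(25, INT(RAND() * 26))
	RETURN CHR(newcharnum)
ENDFUNC
```

Calling RAND(-1) once at program start seeds the generator from the system clock, so each run produces different output.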
 
>I doubt if ...

I am aware of that. I am not suggesting that real encryption would result in a sortable set of results consistent with the original data.

>As Olaf pointed out "Olaf" will always be transformed to "HWÏV"

Not sure what point you are trying to make here.

>Take care: This is NOT a decryption code in any way, it is a 'garbling' code, or as Olaf says an 'encoding' code; it simply replaces a character with another as given in the constant ENCRYPTY and changes it back as given in the constant DECRYPTY. So you can index.

I am afraid that I have to disagree with you (and Olaf). Your code is an implementation of what we call a monoalphabetic substitution cipher. By today's standards a very weak method of encryption (since your method uses a mixed alphabet it is somewhat stronger than a classic Caesar cipher, but weaker than a polyalphabetic cipher such as the Vigenere cipher), but a method of encryption (and decryption) nevertheless.
 
It would only be a real Caesar cipher (and it is one, despite the longer alphabet) if you provided not the constants but the shift (rotation) offset as the "password". With ENCRYPTY being the scrambled letters, it is already a bit stronger than a normal Caesar cipher, but it can only be considered encryption if the algorithm isn't known at all; how the encryption is done has to be secret. That, and the fact that it always results in the same output (unless you change ENCRYPTY), makes it act as an encoding rather than an encryption.

So let's say I apply other (stricter) criteria. I also don't consider the Caesar cipher a "cipher" or encryption anymore. You just need 26 tries to break it if you know the method. That makes it less of a deterministic 1:1 mapping than a normal encoding is, but it can't be considered safe. Every O becomes an H if you don't change ENCRYPTY, and that's the nature of an encoding.

Jockey's code could be taken in the direction of even a rather strong encryption if you do exactly that and let the ENCRYPTY value change perhaps after each single character translation, perhaps controlled by a password as a second input besides the text to be encrypted, perhaps with some random noise. If the mapping is made movable like that, there is no 1:1 mapping of original to encoded character. It could be done in a deterministic way, so it is also reproducible for decrypting.

Bye, Olaf.
 
Hi Strongm,

Sorry, but your "I am afraid that I have to disagree with you" suggests to me that you are missing the essence.
The code I have shown is NOT an encryption, as stated several times. You may call it a "monoalphabetic substitution cipher", which is fine by me; however, it is a simple transformer, meaning it transforms one letter into another. Thus "Olaf" becomes "HWÏV", which is sometimes enough to keep peeking eyes from seeing at a glance that HWÏV is actually Olaf. And since the transformation is applied consistently, you can also index meaningfully, which is a requirement.

Please read the requirements Gendev made in his initial request: nowhere is he asking for an encryption procedure, he wants to 'hide'.

Regards,

Jockey2
 
> it is already a bit stronger than a normal Caesar cipher

I believe I said that

> it can only be considered encryption, if the algorithm isn't known at all
You may want to share that view with the cryptographic community. They may disagree with you. The algorithm for Rijndael/AES, for example, is well known; it is even published as a standard. Modern cryptography is based on the principle that a cryptosystem should be secure even if everything about the system, except the key, is public knowledge.

Knowing that a message is encrypted with a Caesar cipher, and knowing the Caesar cipher algorithm, doesn't render the message immediately readable - since you still need to know the key (i.e. the shift). Sure, this is pretty easy to break (as we've already agreed, Caesar ciphers are weak), but just because encryption can be broken doesn't mean that it is no longer encryption - it just means that you might not want to use it ...

>You just need 26 tries to break
25 (the 26th is the plaintext already). Or, more generically: n - 1, where n is the size of the alphabet

>let the ENCRYPTY value change perhaps after each single character translation
Which would mean you now had a running key polyalphabetic cipher, so stronger than Vigenere - and if the running key were truly random, and at least as long as the plain text, and you only ever used it once, then you'd have a one-time pad ...
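
To make the "n - 1 tries" point concrete, here is a small sketch for a plain 26-letter uppercase alphabet. CaesarShift is a hypothetical helper written for this illustration, not code from the thread; it builds the rotated alphabet and lets CHRTRAN() do the substitution.

```foxpro
* Brute force: try every possible shift of a 26-letter Caesar cipher
LOCAL lcCipher, lnShift
lcCipher = CaesarShift("OLAF", 3)   && ROAI
FOR lnShift = 1 TO 25
	* undoing a shift of lnShift is the same as shifting by 26 - lnShift
	? lnShift, CaesarShift(lcCipher, 26 - lnShift)
ENDFOR

* Hypothetical helper: rotates the letters A-Z by tnShift positions
FUNCTION CaesarShift(tcText, tnShift)
	LOCAL lcAlpha, lcShifted, lnS
	lcAlpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	lnS = MOD(tnShift, 26)
	lcShifted = SUBSTR(lcAlpha, lnS + 1) + LEFT(lcAlpha, lnS)
	RETURN CHRTRAN(tcText, lcAlpha, lcShifted)
ENDFUNC
```

Only one of the 25 printed lines (the one for lnShift = 3) reads as a name, which is all an attacker needs.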
 
>You may want to share that view with the cryptographic community. They may disagree with you.

You misinterpret what I said and twisted my words. I didn't say this is an essential property a good encryption algorithm has to have. I said it to point out that it is a bad property of Jockey's code and disqualifies it as encryption. All of this was just to argue against your categorization of it as such.

Yes, this is a bad property of Jockey's cipher. And that's why your argument about it being an encryption fails. Also, Jockey never intended to write an encryption. Since you are as knowledgeable about encryption as I am, and perhaps even more so, this should have helped you see that your arguing for categorizing Jockey's algorithm as encryption is wrong. Caesar cipher in itself also is not an encryption applying today's criteria, as it should indeed be possible to publish an algorithm and use it as is without compromising encrypted data, i.e. knowledge of the algorithm does not make encrypted data decryptable. If you see the ENCRYPTY value as a key, you might consider it going in that direction, but as that key needs to be composed of all the characters, only the way these characters are permuted is the real key. It's still a vast range of possible keys, but providing it with the code itself makes it no key.

I would have to verify this, but maybe it is even more generally true that any cipher is not an encryption but something in between encryption and a usual, canonically straightforward encoding. By that I mean that the usual intention of an encoding is surely not to obscure data, but to map a more or less large character set to some byte codes.

Bye, Olaf.

 
Strongm,

Did you not read what the explanation / remark on the procedure says, 1st sentence?
Please do not compare / judge this procedure in any way with encryption. It has nothing at all to do with encryption.

Jockey2
 
>The code I have shown is NOT an encryption, as stated several times

Stated by you, yes. But so what? Forgive me, saying so does not make it so. You have simply reinvented a classical substitution cipher, eg
> Also Jockey never intended to write an encryption
Doesn't matter what Jockey intended. See my comment and link above. Sure, the inclusion of the key makes it child's play to break, but it is encryption nonetheless.

And let's go a step further: encryption is simply some process (i.e. an algorithm) to make information hidden or secret. And to make that process useful, you need some code (or key) to make information accessible. It takes no account of how easy it is to determine the key or reverse the process. The Caesar cipher is still encryption (the key being how many characters we shift), even if it is very, very easy to find that key.

The most simple definition of encryption, though, is that it is the process which converts information or data into a code. And perhaps this is the source of the confusion - encryption is indeed a form of encoding. But not all encoding is encryption. The examples used/discussed in this thread, however, are all encryption


>If you see the ENCRYPTY value as a key
It is a key (a slightly broken key, because the alphabet it is derived from is broken - try encrypting and then decrypting "ääää")

> providing it with the code itself that makes it no key
No, it simply makes it easy to retrieve the plaintext - but it is still a key. Note that if I foolishly publish the key I use to encrypt with AES in ECB mode, then it is easy to recover the plaintext (AES is symmetric) - but that doesn't stop the key from being a key, nor does it disqualify AES ECB as encryption

>it can only be considered encryption, if the algorithm isn't known at all
>it is a bad property of Jockey's code
The issue isn't the algorithm, the issue is the inclusion of the key that is in use. Can't argue with that. But, as I said above, if you expose the key then even AES is equally easily compromised - but that doesn't make AES a bad algorithm (or set of algorithms), nor stop it from being considered encryption.

>You misinterpret what I said
Yep, looks like I may have done, so apologies for that. However, I still don't agree with your conclusions even with the misunderstanding cleared up (see points above)

>Caesar cipher in itself also is not an encryption applying today's criteria
Yes, it is. Encryption has a definition in cryptography, and the Caesar cipher meets that definition. It just isn't very secure anymore. You wouldn't go around trying to say that a medieval shield was no longer a shield just because it isn't very useful on a modern battlefield.
 
Strongman,

I noticed your reply above, but sorry, I stopped reading after the first ?.
It seems that you are set on being right although you know you are not.
I have stated the code is not an encryption.
If you want to pull out all kinds of excuses and proofs that my coding is not encryption, then you are correct, although this is useless as I already told you so.
I did not reinvent any wheel; I just showed the OP how to do this classical substitution in VFP. The code is so basic it is not even worth discussing; it works and that's what counts.
So please stop being the wise guy now and stop telling us my coding is not an encryption, as it is not.

Regards,

Jockey2
 
Well, OK, never mind, strongm. Interesting thoughts, but by every definition of encryption a key is part of it, given that the major encryption categories differentiate between symmetric (one secret key) and asymmetric (public/private key pair) algorithms and no other category exists.

We could agree on it being a symmetric cryptographic encryption with its key given within the code, which makes it an open secret. There's no such thing as a public key in a symmetric algorithm. We can stop quarreling about definitions here; I'll simply agree with you. The way it's provided, it's still a lock with the key stuck in it.

You'll also have to agree that Jockey is right in his warning that this is not good to use for encryption, even if it is an encryption, and thereby grant Jockey that he classified the usability limits of this correctly. It is quite nice for garbling names in a recoverable manner; it results in something not really readable, but still at least printable and not cluttered with control codes or anything else that comes out of a normal binary encryption. And it's reversible, which may or may not be intended; we'll most probably never know.

If you make it so that all the replacement characters have a higher byte code than the original characters, it could even be used to keep the sort order of garbled names. For that to work with a range of letters, you would rather end up with a Caesar cipher. By including several printable characters after the letters, you could just leave some gaps if you reduce the original character set, to make it a bit more complicated than just the shifted-up alphabet or alphabets (if you distinguish between small and capital letters). That way it could end up looking quite like base64.

Bye, Olaf.




 
> I stopped reading after the first ?.

A pity, since that presumably means you couldn't be bothered to follow the link to an independent 3rd party cryptography expert (one of many, many) that makes it clear that a substitution cipher, using your exact method, is encryption. Here's another one, from the Computer Science department of the University of Rhode Island. If you looked at them, you wouldn't have to take just my word for it, which is lucky because you've clearly taken umbrage with me at a minor correction in terminology. Not quite sure why, given that at no time have I suggested that your code is an inappropriate solution for the OP's stated requirements.

>it works and that's what counts
Again, never said otherwise (although I grant that I did point out that there is a flaw in your ENCRYPT and DECRYPT strings which breaks the symmetrical encryption/decryption under certain circumstances - but not with the algorithm itself)

>excuses
Excuses? What excuses? Are we suffering from a language barrier here?

>although you know you are not.
Oh dear. No, quite the contrary. I do some of this for a living. But I know you have decided you don't like anything I say, which is why I have provided independent links that confirm the argument I am making.

>I have stated the code is not an encryption
So, if I state that grass is red, that makes it red, does it? The point is that the facts don't support your statement. Let me ask a question - if I used your algorithm to 'scramble' some plain text - but using a different ENCRYPT (something Olaf alluded to earlier) - and gave you that 'scrambled' text, would you easily be able to 'unscramble' to the original plain text? Or would you somehow need to gain access to or figure out what ENCRYPT was? Mind you, you probably have not read this far ...

laËÊRôçqG

 
Strongm,

Thanks, good catch!

find below the corrected strings:
#Define DECRYPTY "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890äáàâÄÁÀÂëéèêËÉÈÊïíìîÏÍÌÎöóòôÖÓÒÔüúùûüÜÚÙÛÇç@.-&#"
#Define ENCRYPTY "Ï3ÍTUVÂë56éWXYêËQÛÇBÉZaÚ4ÙbÌÎöfghijkó7pq8HIâÄÔáAsCÁ90òMwxyNOPÒRSîúùnorstûÜç@.-&#GÀèJKLôÖz12äDEFcdelmuvàÈÊïíìÓü"

Jockey2
 
Olaf, pretty much agree with all your points in your last post.

Jockey2, I think you may need to change DECRYPT as well, to "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890äáàâÄÁÀÂëéèêËÉÈÊïíìîÏÍÌÎöóòôÖÓÒÔüúùûÜÚÙÛÇç@.-&#"
 
Strongm,

I have edited the constants in my message 22.8 20:15
Thanks for the remark about the incorrect constant.
I suppose (hope) it now works as expected.

Regards,

Jockey2
 
Thanks to all and especially jockey2 who got me on track with a scrambling routine that satisfies my client. He created new ENCRYPT and DECRYPT strings to do the job in his particular way.
I am now trying to 'functionise' my long-winded prg which loops through 29 tables in a project.
I am unable to pass through a fieldname as a parameter to get the work done in an efficient prg.
My function works within a Do while !(eof) loop and is called by
thefield = 'firstname'
Do fielding With thefield

Code:
Function fielding
Parameters cmyfield

Go Top
Strtofile(cmyfield+Chr(013)+Chr(10),('garble.log'),.T.)
Do While !Eof()

	fcontent =  Alltrim(cmyfield) && original field contents 
	fncontent = scrambling (fcontent )&& scrambled field contents 

	cMessageText = Str(Recno())+'  ' +fcontent+'  '
	Strtofile(cMessageText+Chr(013)+Chr(10),('garble.log'),.T.)
	cMessageText = Str(Recno())+'  ' +fncontent
	Strtofile(cMessageText+Chr(013)+Chr(10),('garble.log'),.T.)
	If ldoit
		Replace cmyfield With fncontent
	Endif

	Skip

Enddo

How do I use the fieldname held in cmyfield so that I get the contents of each field rather than the fieldname itself?
Thanks
GenDev
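
One possible approach, sketched under the assumption that the table is open in the current work area and that cmyfield holds a valid field name such as 'firstname': EVALUATE() reads the contents of the field named by the string, and a name expression in parentheses lets REPLACE write to it without macro substitution. scrambling() and ldoit are taken from the post above as-is; the logging is left out for brevity.

```foxpro
* Sketch: cmyfield holds the NAME of the field, not its contents
FUNCTION fielding(cmyfield)
	LOCAL fcontent, fncontent
	* SCAN walks all records from the top, replacing Go Top / Do While / Skip
	SCAN
		fcontent = ALLTRIM(EVALUATE(cmyfield))   && contents of the named field
		fncontent = scrambling(fcontent)         && scrambled field contents
		IF ldoit
			REPLACE (cmyfield) WITH fncontent    && name expression selects the field
		ENDIF
	ENDSCAN
ENDFUNC
```

This assumes no index order is set on the field being replaced; otherwise records can move within the order while the loop is running.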
 
You now introduce two ways to decode the name again, once by using the scrambling with scrambling(fcontent,.F.) and once because you write original and scrambled values into a log.

If that's really the need, I go back to my suggestion of using a real strong encryption function instead of anything self-written:
myself said:
If ... you want to encrypt and decrypt names, not just "garble" them, then better make use of crypto API or - what's simpler to use - vfpencryption.fll


Especially, if it's about HIPAA compliance it's not just about "garbling" names somehow, but algorithms to be used are clearly specified.

I have to admit the last sentence was not checked well.

HIPAA Journal said:
One of the reasons why the HIPAA encryption requirements are vague and open to interpretation is that, when the original Security Rule was enacted, it was acknowledged that technology advances. What may be considered appropriate encryption standards one day, may be inappropriate another.

Similar thoughts are valid for other domains, not only patient data. Given the advances in such algorithms, it's a valid idea not to get too specific. But that surely doesn't suggest "rolling your own". I recently had the opposite: very strictly defined specs for the cash register security regulation of Austria, specifying every single step to take to create signatures of receipts and QR codes of those signatures.

Bye, Olaf.
 
Don't use & with file names. Instead use parens. If you now have code like:

Code:
USE &cMyTable

change it to:

Code:
USE (cMyTable)

The & version will fail if cMyTable contains a path with a space in it.

Tamar
 
Hi Gendev,

My scrambling function has two parameters:
Parameters tcIn, tlScramble
tcIn = the word to be scrambled / unscrambled
tlScramble = .T. to scramble , .F. to unscramble

to scramble all the fields of yourtable.field1:

Code:
select yourtable
scan
replace field1 with scrambling(yourtable.field1,.t.)
endscan

As Tamar correctly pointed out: avoid using the & (macro substitution).
The advice given here by Olaf gives you an encrypted value of your field, which is not the same as a scrambled value; you will not be able to index an encrypted value logically, as required.
A scrambled value is not at all an encrypted value and is not to be used for encryption; it is just there so that 'peeking' eyes are not able to read the content without the aid of a tool.

Regards,

Koen
 
Koen Piller said:
tlScramble = .T. to scramble , .F. to unscramble
You can also do it inversely, though, as the algorithm is symmetric.

Code:
lcScrambled = scrambling("Olaf",.F.) && HWÏV
? lcScrambled
? scrambling(lcScrambled,.T.) && Olaf

By the way, your constants are still broken: your latest version of DECRYPTY has 2x "ü", and I just stumbled on this using the "Olaf" example. Every character must be unique in both strings, or you get a non-reversible mapping.

Bye, Olaf.
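
A quick way to check the uniqueness requirement, as a sketch: FindDuplicates is a hypothetical helper, not part of Jockey2's code, and it assumes the constants are stored in a single-byte code page so that one character equals one byte.

```foxpro
* Hypothetical helper: returns every character that occurs more than once in tcKey
FUNCTION FindDuplicates(tcKey)
	LOCAL lnI, lcChar, lcDupes
	lcDupes = ""
	FOR lnI = 1 TO LEN(tcKey)
		lcChar = SUBSTR(tcKey, lnI, 1)
		* OCCURS() counts case-sensitively; report each duplicate only once
		IF OCCURS(lcChar, tcKey) > 1 AND NOT (lcChar $ lcDupes)
			lcDupes = lcDupes + lcChar
		ENDIF
	ENDFOR
	RETURN lcDupes
ENDFUNC
```

Both FindDuplicates(DECRYPTY) and FindDuplicates(ENCRYPTY) should return an empty string, and LEN(DECRYPTY) should equal LEN(ENCRYPTY), before the mapping can be expected to round-trip.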
 
And strongm,

With regard to your example "laËÊRôçqG", which cannot be decrypted with the original or the changed constants: you don't prove much. If I changed the mapping used by base64 encoding, chose any 64 other characters in a scrambled order and gave you a single example of my altered encoding, you couldn't decode it either. base64 encoding would still be just an encoding. And even a weak encryption can pose a practically unsolvable problem for a short ciphertext; you can't attack it statistically, for example. If I told you I Caesar-ciphered a single letter to "k", you also wouldn't know what letter I encoded if I never told you when you guessed right.

Besides, I already said this can be seen as "a symmetric cryptographic encryption with its key given within the code". You have not revealed your changed constant, so you have kept your key secret. If gendev wanted to do the same, he would still need to provide the code to end users and couldn't use a constant. In particular, ENCRYPTY would need to be stored somewhere, itself encrypted, to make this encryption safer, which would lead to the ironic situation of the key being more strongly encrypted than the data.

In essence, anyway: any code mapping set1->set2 and inversely set2->set1 is not an encryption but simply an encoding. All characters are always encoded the same way. Unless you broke this 1:1 mapping property in your choice of constant values, I can say one thing about "laËÊRôçqG": all the original characters differ from each other. If you don't cheat and encrypt "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" for me, the result gives me the decoding constant I need for letters. Anything I can enter and have encrypted gives me the inverse translation of anything written with the same letters, so that's really weak. Say I am a user and enter a record with the name "a bc def ghij" and later get at the encrypted data: I could spot its encrypted form merely by the pattern "* ** *** ****". Do that several times and I have all the necessary info to decode.
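
The weakness described above can be sketched in a few lines. lcSecretKey is a made-up stand-in for an ENCRYPTY value the attacker has never seen, and the alphabet is reduced to uppercase letters for brevity.

```foxpro
* Sketch: recovering the decoding table from one chosen plaintext
LOCAL lcPlainAlpha, lcSecretKey, lcObserved, lcCipher
lcPlainAlpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lcSecretKey  = "QWERTYUIOPASDFGHJKLZXCVBNM"   && hypothetical key, unknown to the attacker

* The attacker has the whole alphabet scrambled once and sees only the result
lcObserved = CHRTRAN(lcPlainAlpha, lcPlainAlpha, lcSecretKey)

* From then on, any other scrambled text can be decoded without the key
lcCipher = CHRTRAN("OLAF", lcPlainAlpha, lcSecretKey)
? CHRTRAN(lcCipher, lcObserved, lcPlainAlpha)   && prints OLAF
```

The observed output of the chosen plaintext is itself the decoding table, which is why a fixed 1:1 mapping offers so little protection once a user can submit their own input.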

So gendev, I stay with my advice not to use that if your customer really needs a strong encryption; this isn't one.

If you have any trouble with VFP encryption, let me know. I have stumbled upon errors it throws when it is used wrongly. For example, using it with AES needs a 32-byte key; any other length needs to be padded to that. The same goes for other parameters. But the FLL works when used correctly; the descriptions of the ENCRYPT and DECRYPT functions explain the necessary sizes.
 