garbling a name 1

Status
Not open for further replies.

Bryan - Gendev

Programmer
Jan 9, 2011
408
AU
I have been asked to create a program to 'hide' the actual names in a certain field in a VFP table. Thus I would replace the existing string of characters by a new string of random chrs.

Thus far I have the code to loop through each character in the string, but I am not able to do the replacement character by character. My first attempt is this function:
Code:
Function garble(oldchar)
LOCAL charnum
LOCAL newchar
LOCAL newcharnum
charnum=INT(ASC(oldchar))
newcharnum = charnum * RAND()
newchar= CHR(newcharnum )
Return newchar
This, of course, produces a range of characters, not all of which are in the range A-Z.
How can I create replacement characters in the range A-Z?
Many thanks

gendev
 
To create a random letter A-Z all you need is CHR(ASC("A")+INT(RAND()*26)) or CHR(65+INT(RAND()*26)).
Just ensure you initialize the random number generator with a different seed at every program start by calling RAND(-1) once.

The idea to input the old character makes no sense to me.

Bye, Olaf.
 
One idea to make sense of the old character: Only return a new random letter, IF ISALPHA(oldchar).

Code:
FUNCTION garble(tcOldChar)
LOCAL lcNewChar
IF ISALPHA(tcOldChar)
   lcNewChar = CHR(65+INT(RAND()*26))
ELSE
   lcNewChar = tcOldChar &&keep the non letter
ENDIF
Return lcNewChar

I would rather do it in one go with the full name instead of each char and include the loop over all name string positions in the function. It's overkill to make a call per character as that has an overhead for running very little net code each time.
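
A minimal sketch of that one-call-per-name idea, assuming the same rule as above (letters are replaced, everything else is kept). The function name GarbleName is hypothetical:

Code:
FUNCTION GarbleName(tcName)
* Hypothetical helper: replaces every letter in tcName with a random letter A-Z
* and keeps non-letters (spaces, hyphens, apostrophes) where they are.
LOCAL lcResult, lnPos, lcChar
lcResult = ""
FOR lnPos = 1 TO LEN(tcName)
   lcChar = SUBSTR(tcName, lnPos, 1)
   IF ISALPHA(lcChar)
      lcChar = CHR(65+INT(RAND()*26))
   ENDIF
   lcResult = lcResult + lcChar
ENDFOR
RETURN lcResult
ENDFUNC

It could then be called once per record, for example REPLACE ALL lastname WITH GarbleName(lastname) on a copy of the table (lastname is an assumed field name), after a single RAND(-1) at program start.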

Bye, Olaf.
 
Why not hash it?
Make it into a 32 character hash.

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.
 
If it is for anonymous display, garbage letters are still better for layout/display and less distracting or annoying than a hash value. For anonymization, I would go a different route and use mockup data, e.g. as you can get from Mockaroo, and then simply fill in random first and last names from the mockup data in place of the real names.

It's true that there are different solutions for different purposes. If a non-technical customer actually meant encrypting data, neither a hash nor random letters would be a good solution.

Bye, Olaf.
 
If the aim is to create test data, that is, data that is completely fictitious but nevertheless looks realistic, this is what I do:

I have a bunch of tables, each containing ten rows. One table contains common first names, another has common family names, another has cities, and so on. I then do a cartesian join of the relevant tables. With four such tables, this gives 10^4 rows, each completely different. I then select the required number of rows at random from that result set.

For example, if I want 100 rows, containing first name, last name, street address and city, I would do this:

Code:
SELECT ;
  FirstName, LastName, Street, City, RAND() AS Selector ;
  FROM FirstNames, LastNames, Streets, Cities ;
  INTO CURSOR csrTemp

SELECT TOP 100 * FROM csrTemp ORDER BY Selector ;
  INTO CURSOR csrFinal

However, now that Olaf has told us about Mockaroo, I probably won't do that any more. Mockaroo seems to offer a great deal more data, with more realistic ways of combining it. For example, not only does it give you cities and countries, but the cities are real cities in the corresponding countries. Even the international phone dialling codes match the countries. I don't mean this to sound like a commercial, but I wish I knew about Mockaroo years ago.

Mike


__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
gendev said:
create a program to 'hide' the actual names in a certain field in a VFP table

Obviously you wouldn't be storing the data in the first place if you didn't need it in its original form at some point in time.

With that in mind, you should probably differentiate between SHOWING 'garbled' data in a user screen and possibly 'garbling' the data in the table field (which would be Encrypting such that it could be Decrypted when needed).

Merely replacing table field values with random characters cannot be undone if the original values are needed later.

Good Luck,
JRB-Bldr




 
jrbbldr, you're right,

but you could fork (if/else) between SELECT name,... FROM table and SELECT garbled(name) as name FROM table, and that will not change any data, just show random letters instead. You could also jumble the data for the use case of handing it out to external developers without giving them real data. It does not necessarily have to be done to the original data itself, but to a developer database copy or extract, perhaps shrunk in size at the same time.
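
A rough sketch of that fork. The table persons, the field id, the flag llAnonymous and the cursor name are assumptions for illustration, and garbled() stands for whatever name-garbling function you use:

Code:
LOCAL llAnonymous
llAnonymous = .T.   && hypothetical switch, e.g. set per user or per screen
IF llAnonymous
   * show random letters; the table itself is not changed
   SELECT id, garbled(name) AS name FROM persons INTO CURSOR csrNames
ELSE
   * show the real names
   SELECT id, name FROM persons INTO CURSOR csrNames
ENDIF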

Bye, Olaf.
 
Hi,

Apart from Olaf's solution you can also make use of this function to scramble the content of your database

Code:
Function scrambling
Parameters tcIn, tlScramble
*!* function to scramble the content of fields into your cursor.
*!* this is NOT a decryption, simply a low-level scrambling.
*!* to scramble:
* Select id, scrambling(name,.T.) as name into myCursor nofilter
*!* to unscramble:
* select id, scrambling(name,.F.) as name into myCursor nofilter

Local lcDecrypt As String, ;
	lcEncrypt As String, ;
	lcLet As String, ;
	lcScram As String, ;
	lnPos As Number, ;
	lnPosition As Number

Local lcIn As String

*!* the constants DECRYPTY and ENCRYPTY are shown here below the code

lcScram = []

Do Case
Case Vartype(m.tcIn) = 'C'
	lcIn = Alltrim(m.tcIn)
Case Inlist(Vartype(m.tcIn),'N','Y')
	lcIn = Alltrim(Transform(m.tcIn))
Case Vartype(m.tcIn)= 'D'
	lcIn = Alltrim(Dtoc(m.tcIn))
Case Vartype(m.tcIn)='T'
	lcIn = Alltrim(Ttoc(m.tcIn,1))
Otherwise
	*!* any other type: nothing to scramble, avoid Len() on a non-string
	lcIn = []
Endcase
If m.tlScramble = .T.
	For lnPos = 1 To Len(m.lcIn)
		lcLet = Substr(m.lcIn,m.lnPos,1)
		lnPosition = At(m.lcLet, DECRYPTY)
		lcScram = m.lcScram+Substr(ENCRYPTY ,m.lnPosition,1)
	Endfor
Else
	For lnPos = 1 To Len(m.lcIn)
		lcLet = Substr(m.lcIn,m.lnPos,1)
		lnPosition = At(m.lcLet,ENCRYPTY )
		lcScram = m.lcScram+Substr(DECRYPTY ,m.lnPosition,1)
	Endfor
Endif

Return m.lcScram

#Define DECRYPTY "abcdefghijklmnopqrstuvwxyzABSCDEFGHIJKLMNOPQRSTUVWXYZ1234567890äáàâÄÁÀÂëéèêËÉÈÊïíìîÏÍÌÎöóòôÖÓÒÔüúùûÜÚÙÛÇç@.-&#"
#Define ENCRYPTY "Ï3ÍTUVÂë56éWXYêËQÛÇBÉZaÚ4ÙbÌÎöfghijkó7pq8HIâÄÔáASCÁ90òMwxyNOPÒRSîúùnorstûÜç@.-&#GÀèJKLôÖz12äDEFcdelmuvàÈÊïíìÓü"

Jockey2
 
Hi Jockey2

Thanks for your code.
I'm trying to use it as a test but I can't get the #DEFINEs to kick in.
I've tried putting them
-at the top of main.prg
-at the top of the prg which uses the function
-within the function declaration.

None work - I haven't used DEFINE before so I am at a loss what to try next.
Can you help?

Gendev
 
#DEFINEs have to come before the usage of the constants; that's the only condition about them.

If you want to use this code, you want to encrypt and decrypt names, not just "garble" them. In that case, better make use of the crypto API or, what's simpler to use, vfpencryption.fll.


Especially, if it's about HIPAA compliance it's not just about "garbling" names somehow, but algorithms to be used are clearly specified.

Bye, Olaf.
 
Olaf,
I was keen to just try the code.
The user has explained he wants a much more complex solution whereby names will be 'garbled' the same way whenever they occur in the field so that they can sort as normal in the genealogy application.
I'm not able to code that so I have bowed out.
Many thanks

Gendev
 
Garbling and keeping sort order is impossible, unless you SELECT garbled(originalname) as garbledname and ORDER BY originalname. It's as simple as that: you still have the original name at hand when you query. But of course the garbled names will not be in alphabetical order. The difference between garbling (creating random names) and encryption/decryption or encoding/decoding is that the latter let you get the original name back, while random new names are the only really good way of anonymizing names.

Jockey's code works as is. I don't know what you did wrong with the constants; maybe just try again to copy and paste.

Code:
? scrambling("Olaf",.T.) && HWÏV
? scrambling(scrambling("Olaf",.T.),.F.) && Olaf

It could be simplified using CHRTRAN(lcText,DECRYPTY,ENCRYPTY) for encrypting and CHRTRAN(lcText,ENCRYPTY,DECRYPTY) for decrypting.

It is a bit better than a Caesar cipher, but you provide the translation table with the code, so by definition this is an encoding rather than an encryption. You'd need to keep the ENCRYPTY constant secret to prevent decoding; only then does it become an encryption method. Even then, hints remain in the encrypted names: as the letter E is the most frequent (also in names), you can find which encryption character corresponds to E, and so on. Since each character has an encryption character independent of any other, you can decrypt letter by letter. With many such "encrypted" names you can also reconstruct the ENCRYPTY constant, even if the code did not reveal it. This is a big weakness.

But Jockey also says so in the comment. I'd specify that this is not an ENcryption. You could also argue it's not a scrambling, as it doesn't shuffle letters: any original "a" is always encoded as "Ï", so it neither changes positions as shuffling would, nor does the encoded letter ever differ, which would require the constants to change. In an encryption, the "scrambling" would also depend on a key pair and/or password. These differences make this a mere encoding, though not an industry-standard encoding such as the ANSI codepages or base64. It's just a mapping between original and encoded characters.
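
A minimal sketch of the CHRTRAN variant, assuming the DECRYPTY and ENCRYPTY #DEFINEs from Jockey's post appear above it in the same file. The function name ScrambleFast is hypothetical, and this is not a drop-in replacement tested against the loop version:

Code:
FUNCTION ScrambleFast(tcIn, tlScramble)
* Assumes tcIn is a character value and that
* DECRYPTY and ENCRYPTY are #DEFINEd earlier in this file.
IF tlScramble
   RETURN CHRTRAN(tcIn, DECRYPTY, ENCRYPTY)
ELSE
   RETURN CHRTRAN(tcIn, ENCRYPTY, DECRYPTY)
ENDIF
ENDFUNC

One difference to be aware of: CHRTRAN leaves characters that are not in the table unchanged, which may not match what the loop version does with them. DECRYPTY as posted also contains "S" twice ("ABSCDEF..."), so it is worth checking that names containing "S" round-trip the same way in both versions.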

Bye, Olaf.
 
Hi Gendev

your question:

I'm trying to use it as a test but I can't get the #DEFINEs to kick in.

Suggest you copy the define's as given in my thread below the code into the code instead of the remark
*!* the constants DECRYPTY and ENCRYPTY are shown here below the code

The only reason I have put them 'outside' the code is that when I put them inside a code block here at Tek-Tips, the content seems to get garbled.

If this is not clear, please report back.

BTW my code will, I suppose, meet your user's requirements as stated in your last message to Olaf.

Take care: this is NOT a decryption code in any way. It is a 'garbling' code or, as Olaf says, an 'encoding' code. It simply replaces a character with another as given in the constant ENCRYPTY and changes it back as given in the constant DECRYPTY. So you can index.

Regards,

Jockey2
 
>garbage letters are still better for layout/display and less distracting or annoying than a hash value

Could always apply a simple Base64 encoding to turn the hash into garbled characters ... in fact just apply the Base64 encoding to the original text - that should provide sufficient garbling AND vaguely maintain sort order, and it is reversible.
 
It's questionable whether it should be reversible at all, and if so, then most probably not through a standardized encoding but through real encryption like AES, which requires knowing the password or holding the key or certificate, with access to it also limited by permissions.

And anonymization would mean, you want to prevent the possibility to know the original name.

As gendev bowed out of this job, I guess we'll never know the exact conditions.


Base64 indeed just regroups bits: 3 original bytes (24 bits) are split into four 6-bit groups, and each group becomes one output character, so 3 bytes become 4. Sort order is only roughly kept, though, because the Base64 alphabet (A-Z, a-z, 0-9, +, /) is not in ASCII order.

It only visually hides names. I couldn't decode just in my head, but take a copy of the "garbled" base64 encoding and put it into STRCONV and you'd have the clear text name. Bad idea.

Code:
? STRCONV("Olaf",13) && T2xhZg==
? STRCONV("T2xhZg==",14) && Olaf

Just because something looks encrypted doesn't mean it is encrypted. Once you see something made up mainly of letters and numbers, you can simply try to base64 decode it and are likely to succeed.

If it was just about visual appearance you could also simply display the hex representation, but I would be able to detect many letters by knowing A=0x41, B=0x42, ... for example. You could also replace all names with * or x. Obfuscation can be much simpler.
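
As a small sketch of those two display-only options (lcName is just an example variable):

Code:
LOCAL lcName
lcName = "Olaf"
? REPLICATE("*", LEN(lcName)) && ****
? STRCONV(lcName, 15) && hexBinary encoding of the same name

Neither changes the stored data, and the hex form is as easy to reverse as base64.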

Bye, Olaf.
 
Sure; I'm not arguing the pros and cons of proper encryption. Nor suggesting that hashing, or Base64 encoding is a substitute for encryption. Just that they *might* be viable for the OP, who has not as yet really stated whether they simply want obfuscation or not (and that they might be a solution that is within their coding skills, so that they could bow back in)
 
Hi Strongm,

I doubt that, if you apply a real encryption, the requirement "whereby names will be 'garbled' the same way whenever they occur in the field so that they can sort as normal in the genealogy application" would be met.
With my code, as Olaf pointed out, "Olaf" will always be transformed to "HWÏV".

Regards,

Jockey2
 