Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Reverse the Caesar Cipher in Excel 2010

Status
Not open for further replies.

luxvivens

Technical User
May 20, 2010
27
CA
Using a formula (see below) identified in the Crystal Reports section of this site, I used that cipher to disguise surgeon name when I upload data to an outside vendor.

Later on, I am able to download all the data sent to the vendor and save it in a CSV for subsequent review. However, now I wish to know the actual name of the surgeon in this file. So my question is, is it possible to reverse the cipher for surgeon name in Excel 2010 so that it appears as the correct name? If yes, any ideas how to do this.

The cipher adds 5 characters to each letter. For example “DOE” becomes “ITJ”. So, if I have ITJ in cell A1, how does one turn it back to Doe. Here is the formula as used in the Crystal Reports 10 if that is helpful:

//The Caesar cipher
//The input string to encrypt
Local StringVar inString :=split({pcmProcedure.mdName} , ",")[1];
Local NumberVar shift := 5;
Local StringVar outString := "";
Local NumberVar i;
For i := 1 To Length(inString) Do
(
Local StringVar inC := inString ;
Local StringVar outC;
Local BooleanVar isChar :=
LowerCase(inC) In "a" To "z";
Local BooleanVar isUCaseChar :=
isChar And (UpperCase (inC) = inC);
inC := LCase(inC);
If isChar Then
(
Local NumberVar offset :=
(Asc(inC) + shift - Asc("a")) Mod
(Asc("z") - Asc("a") + 1);
outC := Chr(offset + Asc("a"));
If isUCaseChar Then outC := UpperCase(outC)
)
Else
outC := inC;
outString := outString + outC
);
outString
 
Gruuuu, thanks. Cute! The last time I successfully guessed an admin password on an unknown machine it took me about 5 minutes including a trip to the lavatory (this site rates the guessed password at 1 minute). An eye-opening calculator...
 
My criticism was "substitution codes using a random number generator with the same seed are really not much more difficult to crack than Caesar cypher", not of one-time pads.

And I think you are confusing some cryptography concepts up. Can you explain why "the alphabet size should be very much smaller than the generator". And there seems to be confusion between integer bit size, cycle length, and internal state. Or how throwing away 56 bits of info is the same as keeping complexity (as a cracker I only have to worry about 256 states for each character; I don't have to worry that there were 72057594037927936 ways of generating each of those states) - I suspect there may be some confusion between block cyphers and stream cyphers.
 
Yes. Let's answer some questions:
(1) Why should the alphabet size be much smaller than the generator. The idea is that if you successfully decode a piece of text, you know the numbers with which I was encoding it. If these are sufficient to tell you the next series of numbers I would use for encoding the next part of the text, you have cracked me. If the alphabet size is the same as the size of the generator, then successfully decoding a single letter will tell you what my next encoding-value would be. That's bad!
(2) I appreciate that in decoding any given letter in my text, you need to worry about only 256 states. In a one-time pad, for each individual letter, there are only 26 states, but the one-time pad is nevertheless a good way to encode things because each individual character is encoded using a value that cannot be deduced from any number of adjacent characters. The code is as good as the one-time pad in its entirety, not just as good as a single letter from the pad. The entire pad has 26^n states, where n is the size of the pad!
(3) Using a random number generator to produce values to xor the text is like using a random number generator to create a one-time pad. You're right to say that each letter has only 256 states. My aim is to ensure that if you successfully decode a few letters, you cannot predict the next part of the "one-time pad" that my random number generator has produced (because the main difference between a one-time pad and a stream of numbers from a generator is that in the stream the numbers are related and not unpredictable). Assuming you know the generator function, that means I have to make sure that you cannot tell what seed the generator currently has.
(4) This is where the extra 56 bits come in. You have one letter. You know 8 bit of my seed. For a 64 bit generator, once you know 8 successive bytes, you know the whole of my seed, and can set the generator running from that point and decode everything else.

I think the confusion between blocks and streams would be cleared up by looking at my original suggestion. I didn't suggest creating a single translation table. I'm sorry, I don't know the terminology of these things very well.

Hey, the best way to clear this up: I am happy to paste pseudo-code for my generator, and paste a piece of encoded text of whatever length you want. I will also disclose a word from the text so you can make a known-text attack. Then you are welcome to have a go. Would that solve the argument? I'll understand if you don't have time for playing about like this, though.

Sorry not to explain myself well.
 
>The idea is that if you successfully decode a piece of text, you know the numbers with which I was encoding it. If these are sufficient to tell you the next series of numbers I would use for encoding the next part of the text, you have cracked me

Indeed ...

>If the alphabet size is the same as the size of the generator, then successfully decoding a single letter will tell you what my next encoding-value would be

... but I continue to fail to understand what you are trying to say here with this bit.

>The code is as good as the one-time pad

No, it is not, because in your original suggestion (the one I disagreed with) you are always using the same seed. Hence you are not using a one time pad. I'm more than happy to agree that if you are using a one time pad then I can't break your encryption.

> I didn't suggest creating a single translation table

That's exactly what you did when talking about using the same seed. At least, as I understood what you were saying.

>a known-text attack

I've been guilty myself in being a little loose with terminology. I was really talking about a chosen-plaintext attack rather than a known plaintext.

>This is where the extra 56 bits come in ...

Ah - seems like you are confusing the internal state with the output of the generator.
 
There are two things going on here.

One is my mistake. In my original post, I suggested starting the generator at a known seed ("possibly derived from a password"). This was mischievious of me, and silly. I wanted simplicity, so I allowed the possibility of using a single seed for ever, which is like using a one-time pad many times! I should have acknowledged that deriving the seed from a password is the ONLY sensible way to do things. This gives the random-number generator a chance to produce a different pad each time.

The second is the internal-state of generator thing. If I use a pseudo-random number generator that has only 8 bits, then it can only generate 256 pads, and each pad will repeat itself every 256 letters. Obviously the 256 pads are merely the same pad shifted, which makes the situation pretty awful.

If I use a 64 bit generator, it can generate 2^64 pads (OK, really it's generating one pad shifted up to 2^64 times, so if we're being strict about things, if our text is 1000 letters long, we should consider that we've got (2^64)/1000 independent pads). If you don't know what pad I'm using, you have a problem. Of course if I keep using the same pad, sooner or later you'll crack it. The critical thing is that I must not let you know the random number seed.

I hope that makes it clearer why I consider that the size of the random number generator really does matter (even in the bad case that I always use the same seed), and the bigger the better.

Just a point of terminology where I think I'm causing confusion. When I said I didn't suggest a single translation table, I meant I didn't suggest a single caesar-style table of just 26 entries, A->B, B->X or whatever, where every single "A" in the original text is encoded to the same letter. This, I think, would have been weak (as proven by the code-word puzzles in newspapers!). I admit I was suggesting a SINGLE pad, but my pad is much longer than 26 entries, and identical letters in the original text do not get encoded as identical letters.

If it's still not clear, what I can do is post a brief example using a stupid 2-bit generator versus a vaguely-sensible 32-bit generator, as extreme cases.
 
>which is like using a one-time pad many times! I should have acknowledged that deriving the seed from a password is the ONLY sensible way to do things

Well, certainly more sensible - but for true secrecy you need a different password (and hence seed) each time. And this of course means that it is difficult to achieve the goal set out in the OP.

>The second is the internal-state of generator thing. If I use a pseudo-random number generator that has only 8 bits, then

Here's where some confusion has crept in, for me. Typically if we describe a PRNG as x-bit, we are talking about the size of the output, not the internal state. You seem to be using 'x-bit' interchangeably, as if they are the same thing (and, as a result, on at least one occassion suggesting that, with a 64bit generator that if I can decode 8 sequential bytes then I've broken your seed, which is just rubbish - but certainly explains why you think that throwing away 56 bits of the output somehow improves things for you)

>Just a point of terminology where I think I'm causing confusion

No, no real confusion on this. The issue with this is if you always use a single seed then your table is always the same - that's the weakness I was getting at.

For this particular case - and the reason I originally commented about this - I don't care how long your pad is. I only need to worry about the first few characters (the OP example only uses 3 characters, presumably the consultant's initials). Your first 3 encryption bytes will always be the same, and if I can insert a known text, then I can determine those three bytes with zero effort (and not care about whether you are using a 64-bit generator with a 72-bit internal state, or an 8-bit generator with a 2-bit internal state; I don't care about your seed, your algorithm or what the next bytes in the sequence might be; I don't even really care whether you are using a 96 character alphabet or one of 256 characters)

Sure, if you use a different seed each time then I'm stuffed. But that isn't the scenario I was commenting on.
 
strongm, hope you don't mind me continuing this thread; I'm learning things and it's making me think.

(1) fully agreed on passwords and using a different seed each time.
(2) Yes, you are quite correct, I am thinking of a random number generator in terms of the size of its internal state, on the grounds that I can always derive smaller values from this, but it's the internal state that defines how long it takes for the sequence of smaller, derivative values to repeat. To me, a 64-bit generator has 64-bits of internal state but can be used to generate anything from 1 bit to 64.

(3) For your other two points, can I contrast two examples, which I'm doing quickly on the back of an envelope, so please forgive mistakes.

Example 1:

My alphabet has only 4 letters, ABCD encoded in binary 0-3. This is obviously 2 bits.

My first pseudo-random generator is a 2-bit generator, and simply adds 1 every time it's called. We'll start with seed "0".

Since we need 2 bits and only have 2 bits, we might as well use the bits we've got without any further ado.

We encode by XORing. The text to be encoded is AAAAAA which means that actually we're XORing a string of zeroes, so the output is exactly the output of the random number generator.

Since this simple generator will output 0,1,2,3,0,1,2,3.... "AAAA" now becomes "ABCDAB"

If you know the first two text characters, you can deduce my seed, and you can decode the rest. In fact you only needed to know the first character.

Example 2:
exactly the same scenario, but I am now using a generator that is internally 4-bits long. It still adds 1 each time. I now have 4 bits but need a 2bit number, so I choose to generate this by xoring bits 0 and 3 for bit 1 of output, and bits 1 and 2 for bit 0 of output. The output numbers are now:
0, 2, 1, 3, 1, 3, 0, 2, 2, 0, 3, 1, 3, 1, 2, 0 (then repeat)

This means that my text is now
ACBDAC
If you know the first two characters should be AA, you know that I encoded with 0, 2. The critical thing is that this sequence actually occurs twice in the possible output of my generator (remember, I have not disclosed my key! You have to find that from your known text attack).

You don't know at which occurrence of 0,2 I started, so my encoding sequence could proceed 0,2,1,3,1,3 or it could be 0,2,2,0,3,1. This means you now have two possible options for how the original text could have continued. My actual key is significant knowledge for you!

Scaled up, if I have a generator using a much larger number of internal states, it will have many more repetitions of a limited number of output-encoding-characters within its total sequence. This means that if you successfully find the first few encoding characters, you will have many more possible continuations to try out. The important thing for me is to make sure that the number of possible continuations from a known sequence of decoding-bytes is very very large. That's why I consider the internal size of the random number generator to be critical.

With the pad analogy, I'm assuming you know the algorithm, but not the seed (the original post didn't intend to disclose either, but in general encoding algorithms can hardly be secret. We all have Word, but we don't have someone else's password). That means that you know the pad my algorithm creates, but you don't know where in the pad I started (that's what the seed is doing).

End of examples!

You raised a very interesting point about whether it is always possible to find the seed from a generator that is 64-bits internally given 8 bytes of encoding-values derived from the generator. Technically, I think you're right: it's quite possible to design a generator such that 8 successive encoding bytes don't allow you to determine the seed. For example, in my 4-bit generator, if I had derived the 2-bit output simply by taking the lowest two internal bits, a string of output-values from the generator will never tell me its internal state (since the upper two bits are never used anyway).

I need time to think about this. My feeling is that if I rig a generator that internally has 8 bytes such that two different internal states can lead to the same 8 successive bytes of output, then I am obliged to introduce some element of predictable periodicity into the output, which I don't want to do. It's certainly true that if two different internal states can lead to the same successive 8 encoding bytes, then there will be other sets of 8 encoding bytes that can never be generated, and that's a bad thing in any encoding system. We really want generators whose range of generated sequences is unlimited, and which show as little pattern as possible. So I sort-of doubt that the situation you describe would occur in the sort of algorithm I'd like to choose to make a real encoding system.
 
got it: forgetting the issue of 8 bytes, what I'm really claiming is that 64 successive bits of output from a random generator should be enough to reveal its seed if it's internally got 64 bits determining its state.

There are obviously 2^64 possible sequences of 64 bits, and a 64-bit generator has 2^64 possible seeds, so you can devise a generator with 64 bits that is just good enough to produce any possible 64-bit sequence merely by choosing the right seed.

There is a perfect 1:1 map of seed-values to 64-bit sequences.

If you state that a 64 bit sequence is NOT enough to determine the seed, then there must be at least one 64-bit sequence that can be derived from at least two seeds (meaning that after I've seen this sequence I still don't know which seed is correct).

If this is the case, then there must be at least one corresponding 64-bit sequence that can't be generated by any seed.

Imagine now that the generator has been seeded to produce nearly one of the sequences that can't be generated (say a sequence that differs from the impossible sequence in only one bit). As this sequence is generated, we are walking down a binary tree, turning left or right according to whether the generator produces 1 or 0. There must come a point where the forbidden twig is one of the choices. At this stage, we know which way we cannot go. We know the value of the next "random" bit the generator will generate! (I think we even know it without knowing the seed value!).

This means that we can only get a generator whose seed is not determined by 64 bits if we sacrifice its ability to produce an unpredictable result at every turn.

I'd argue that if we can guess one of the 64 bits of our sequence without actually asking the generator, then I am ethically entitled to ask for another unpredictable bit at the end of the 64-bit sequence! So can I re-phrase what I wrote? Given 8-bytes worth of independent information that I didn't know anyway, it should be possible to deduce the seed of any 64-bit generator.
 
>it's the internal state that defines how long it takes for the sequence [] to repeat

Yup. More precisely it is sets a minimum guaranteed limit for the period of the number sequence. Depending on the seed and the algorithm the period may be higher, perhaps significantly higher.

>My first pseudo-random generator is a 2-bit generator, and simply adds 1 every time it's called

This is not a random number generator in any sense. So you can't really use it as a counter-example

>we might as well use the bits we've got

Again, a decent random number generator will not use it's current state as its output. If you insist on doing so then you have dramatically weakened your encryption (as you so ably demonstrate)

>Example 2
Much better in principal, as you are beginning to seperate state from output (but again not an actual random number generator)

> in general encoding algorithms can hardly be secret

Quite so. Most in use, such as AES, are well documented. We know exactly how they work.

>My feeling is that if I rig a generator that internally has 8 bytes such that two different internal states can lead to the same 8 successive bytes of output, then I am obliged to introduce some element of predictable periodicity into the output

It is exacty this reason that makes many home grown random number generators and encryption methods fairly useless - they are indeed predictable and hence easily crackable. Most of the algorithms in real use have been very carefully designed to avoid such shortcomings whilst retaining statistically valid random output
 
by the way, entirely agreed that we would never use all the bits we have! I only chose the very small generator to illustrate a point. Using all the bits would make the next number totally predictable.

I always feel that pseudorandom numbers are a weird concept. I know people have put lots of time into developing statistical tests, but really, how can one, with a straight face, define randomness in anything that is entirely non-random?
 
Before commenting further (no, really), I need to establish something that is still unclear to me - what you mean by a 64-bit generator.

I'm happy to accept that you mean that it has an internal 64 bit state - but what do you thin you mean by that.

In other words, if my number generator (I won't call this particualr example a random number generator, since it is not) generates the next number in the sequence by adding the current number to an internal constant (or, say, to the previous number generated), and each of those numbers is 64-bit, then is that a 64bit state or a 128bit state? Or, if I am adding an internal constant, the previous and the current number to generate the next number and these are all 64bit, is this a 64bit state or a 196bit state (or perhaps even 128bit)?
 
I mean that if I look at that generator at some particular moment when it's not being executed, it can exist in 2^64 states. I don't comment on any transitory arithmetic being carried out with higher precision because this can't improve on the 2^64 maximum period of repeat.
 
I rather suspected that you thought this. and unfortunately you have an ongoing misunderrstanding that is influencing your conclusions. i'm in a pub on a mobile devive at the moment so unable to comment in more detail; I'll be back ...
 
>I don't comment on any transitory arithmetic being carried out with higher precision because this can't improve on the 2^64 maximum period of repeat

Well, yes it can. Absolutely it can. And I've tried to point this out to you several times in this thread. The Mersenne Twister algorithm (MT19937), for example, is a 32-bit algorithm with a period of 219937 ? 1 cycles. It uses no higher accuracy than 32bits

Here's the code for a (pretty poor since I have not selected optimum values) 8-bit accuracy algorithm - a lagged Fibonacci generator - which in theory has a period of at least 896 (rather than the anticipated 256 you might be expect of an 8bit (as defined and understood by you) generator.

Private fibk As New Collection
'Private thingy(0 To 255) As Integer

Public Sub Example()
Dim MyNumber As Long
Dim lp As Long
'Dim thingy(0 To 255) As Integer 'Uncomment this if you want to see how various values are duplicated

' Ok, seed our For lp = 1 To 4
For lp = 1 To 4 ' holding more info than we need to
fibk.Add CLng(Rnd() * 2 ^ 8)
Next

MyNumber = Fib(23575) '17)
For lp = 1 To 896
Debug.Print MyNumber
MyNumber = Fib(MyNumber)
'thingy(MyNumber) = thingy(MyNumber) + 1 ' see comment above for purpose of thingy
Next
' Stop 'so you can examine thingy if you want
End Sub

' Simple unoptimised Lagged Fibonacci generator
' Using j=1 and k=3
Public Function Fib(LastNumber As Long) As Long
Static OriginalNumber As Long

Fib = (fibk(3) + fibk(1)) Mod 2 ^ 8

fibk.Remove (1)
fibk.Add LastNumber Mod 2 ^ 8
OriginalNumber = LastNumber
End Function

 
I think you're cheating (grin)! You're calling a generator which has 8 bits of internal state but also giving it a parameter containing another 8 bits, and it won't work if you call it without the extra parameter. That means that it has more than 256 states.

As for the Mersenne Twister, looking at its wikipedia entry:
it starts off by creating a vast array (0..623) of 32 bit numbers, which means theoretically it can have more than 2^32 states. It can actually have (2^32)^624) states, which is phenomenal.

I think you've made a typo on the period length, which is (2^19937)-1, which is just slightly smaller than (2^32)^624.

Also, please note that wikipedia maintains that 624 successive 32 bit values from the 32-bit Mersenne twister are sufficient to predict all future results. This is just what I said earlier: once you have seen as many independent bits from a generator as it has in its internal memory, you know everything about it for ever.

In state terms, each state can only lead to one other state (computers being thoroughly deterministic organisms), so if you have only 2^32 states, you cannot produce a cycle of states longer than 2^32.

If you define the bittedness of a generator by the number of bits used in parallel in the largest arithmetic or logical operation in the algorithm, you have an utterly meaningless definition. I could process a 64-bit number (stored as 8 bytes) in 8 one-byte operations, just like we used to 25 years ago with 6502 processors, and my generator would be an 8-bit generator. I could implement exactly the same algorithm today, with exactly the same output, but on a 64-bit processor, and it would suddenly be a 64-bit generator. By defining the bittedness of a generator in terms of the amount of states it can have, given the static memory it needs (including any parameters the caller is expected to remember until next time the generator is called) I have a definition that means something: it tells me the maximum cycle time (not the actual cycle time, you're quite right!), and allows me to do some arithmetic on how it behaves.

(but thanks for the ongoing debate!)
 
>624 successive 32 bit values from the 32-bit Mersenne twister are sufficient to predict all future results

Yes, that is a weakness of the original basic Mersenne Twister (as the article sates "in its native form is not suitable for cryptography"). But note that the important thing here is that it is NOT simply 32 bits that are required (as per your argument). It is somewhat more. And there are more cryptographically secure versions of Mersenne that require much bigger samples before they are predictable.

> a typo on the period

Yep. Blame the pub.

>cheating

Not at all. It is why I asked you the question(s) that I did. Let's look at your criticism: "giving it a parameter containing another 8 bits"[sup]*[/sup]. You said it didn't matter if we introduced additional parameters by saying that the transitory maths didn't matter. And if that is the case how and where I introduce the parameters is irrelavant. On the other hand, if you are now saying they are important, then your are granting that to switch between states in your own so-called 64-bit generator involves dome additional bits from somewhere - and is therefore NOT a 64-bit generator ...



[sup]*[/sup]Note, by the way, that if your objection to the LFG is simply the use of an external parameter (presumably becasue it looks like I'm introducing an extra number) then I can make a miniscule chamge that eliminates that parameter, but maintains exactly the same algorithm.

Private fibk As New Collection
'Private thingy(0 To 255) As Integer

Public Sub Example()
Dim MyNumber As Long
Dim lp As Long
'Dim thingy(0 To 255) As Integer 'Uncomment this if you want to see how various values are duplicated

' Ok, seed our For lp = 1 To 4
For lp = 1 To 4 ' holding more info than we need to
fibk.Add CLng(Rnd() * 2 ^ 8)
Next

MyNumber = Fib
For lp = 1 To 896
Debug.Print MyNumber
MyNumber = Fib
'thingy(MyNumber) = thingy(MyNumber) + 1 ' see comment above for purpose of thingy
Next
' Stop 'so you can examine thingy if you want
End Sub

' Simple unoptimised Lagged Fibonacci generator
' Using j=1 and k=3
Public Function Fib() As Long
Fib = (fibk(3) + fibk(1)) Mod 2 ^ 8
fibk.Remove (1)
fibk.Add Fib
End Function


 
sorrreeee! Hope I'm not causing too much stress down the pub.

OK, but honestly I think this is largely a matter of terminology, and I'll admit I've been loose with terminology. Of course a Mersenne-style thing can be expanded to any size you want for better cryptography, provided someone is clever enough to design it. There will always be the limitation, though, that with n bits of storage being used by an algorithm, however that storage is arranged, the cycle can't be longer than 2^n, and my real point was that the Mersenne example conforms to this limitation.

My 64-bit generator honestly needed no further storage than its 64 bits. I assumed a generator similar to the old ANSI 32-bit-internally, 16-bit-value-returning example:
a = a*1103515245+12345 <- a is 32 bits long, this is the generator
return (a/65536)%32768 <- this is the part that cuts 32 bits to 16 bit so-called random number

but hey, I'm never going to go into cryptography. Your pub sounds more fun.
 
>I could process a 64-bit number (stored as 8 bytes) in 8 one-byte operations, just like we used to 25 years ago with 6502 processors, and my generator would be an 8-bit generator

This is precisely why I asked what you meant by a 64bit generator.

What you are describing is simply a state machine that outputs each and every possible state once and once only during it's cycle), not a random number generator, and - assuming I know the state machine's algorithm - once I know the current state is entirely predictable. You yourself make this case. It is just that it has taken me this long to realise (even though I hinted at it earlier) that, of course, you are not describing a random number generator.

Here's what NIST has to say in their "A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications" document:
NIST said:
Random and pseudorandom numbers generated for cryptographic applications should be unpredictable.
In the case of PRNGs, if the seed is unknown, the next output number in the sequence should be
unpredictable in spite of any knowledge of previous random numbers in the sequence
. This property is
known as forward unpredictability. It should also not be feasible to determine the seed from knowledge
of any generated values
(i.e., backward unpredictability is also required). No correlation between a seed
and any value generated from that seed should be evident
; each element of the sequence should appear to
be the outcome of an independent random event whose probability is 1/2
.
To ensure forward unpredictability, care must be exercised in obtaining seeds. The values produced by a
PRNG are completely predictable if the seed and generation algorithm are known. Since in many cases
the generation algorithm is publicly available, the seed must be kept secret and should not be derivable
from the pseudorandom sequence that it produces. In addition, the seed itself must be unpredictable
 
Two questions:

(1) if the next number generated by a pseudo random number generator doesn't depend on the generator's current state, what does it depend on?

(2) How do NIST propose to use a genuine random number generator (e.g. dice) to encrypt and later decrypt something?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top