PKZip Std Encryption Vulnerability

Status
Not open for further replies.
Apr 13, 2001
4,475
US
I hope I chose the right forum.

I am trying to determine whether or not I can use Zip encryption for an application I have. My data is sensitive, but hardly at the "military secret" level. Please don't just blow this off right away, I realize what the "common sense" feeling is about Zip encryption.

I've done a lot of research on this (the digging through books, newsgroups, and the web type research, i.e. reading) and I have found a lot of bluster, some rumor, and a few facts.

Bluster:

It all comes down to this I'm afraid.

B1.) Standard Zip archive (PKZip) encryption is insecure. Use PGP (etc., etc.) instead.

Rumor (Fact?):

R1.) Some common file formats such as Word and WordPerfect documents contain fairly predictable header information. This can be used to create useful snippets of "plain compressed text." This generic compressed text is good enough to permit technique (F4.) below to be used if an archive contains documents of those types even without an unencrypted copy of the archived documents to work with.

Facts:

F1.) There are dictionary attacks against Zip passwords that are effective in cases of poorly-chosen (short, simple words, simple words concatenated) passwords.

F2.) There are brute-force attacks against Zip passwords that are effective in cases where passwords are short, were created from small alphabets (e.g. only lowercase letters), or fall within a known small range of lengths (e.g. password is known to be 7 to 10 characters long).

F3.) There is a special attack against WinZip-created encrypted Zip archives containing 4 or more files.

F4.) There are two (possibly really just one) "known plain-text" attacks against encrypted Zip archives that require somewhere from 10 to 20 bytes of the compressed, not yet encrypted text from a file in the archive.

F5.) There are several non-commercial Zip crackers floating around. Most specialize in technique (F1.) or (F2.) while a few implement (F3.) or (F4.) above.

F6.) There are several commercially available Zip crackers on the market. Some of these start by trying (F3.), then possibly move on to (F4.) assisted by (R1.). These may perhaps employ straight (F4.) if provided with appropriate plain-text file input. They can also use (F1.), often accepting custom dictionaries or multiple dictionaries from the vendor. I believe the last resort is (F2.), sometimes allowing password-length ranges or alphabets to be specified.


In general the (F6.) products seem to have easy to use "auto" modes that take stabs at breaking the Zip file starting with quicker-when-they-work methods and then trying the progressively slower techniques. They probably all have "manual mode" operation allowing more tailoring of parameters and more input to the process.

I personally have no experience with these products, but have done what research I can by reading product info and reviews, as well as scouring the sci.crypt newsgroup and others. I did download and test a demo version of one product, but it was pretty sad and got nowhere after 72 hours on a PII 350 machine running Win2K.

My Scenario:

So here is what I want to do. I have about 100 data files. These are ASCII text data, fixed-field and CSV type stuff. They compress pretty well, but they are large. All 100 after Zip compression are about 450MB in total, though individual files range from 200KB to 100MB each, compressed. No special document format, just text.

Each file is to be Zipped with encryption separately. WinZip will not be used, probably InfoZip's stuff based upon PKZip 2.04G and later but nothing "recent" from PKZip. Each archive will use a unique password. The passwords will be formed using ASCII characters from the standard printable set: 95 different symbols. Passwords will be generated randomly in lengths varying from 10 to 20 characters each.
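
As a rough sanity check on that password scheme, the size of the password space can be worked out directly. This is only a minimal Python sketch, assuming every password is drawn uniformly at random; the function name is made up for illustration.

```python
import math

ALPHABET_SIZE = 95          # standard printable ASCII symbols
MIN_LEN, MAX_LEN = 10, 20   # password lengths in the proposed scheme


def keyspace_bits(alphabet_size: int, min_len: int, max_len: int) -> float:
    """Return log2 of the number of passwords with a length in [min_len, max_len]."""
    total = sum(alphabet_size ** n for n in range(min_len, max_len + 1))
    return math.log2(total)


if __name__ == "__main__":
    print(f"10 chars only : {keyspace_bits(ALPHABET_SIZE, 10, 10):.1f} bits")
    print(f"10 to 20 chars: {keyspace_bits(ALPHABET_SIZE, MIN_LEN, MAX_LEN):.1f} bits")
```

Even the shortest 10-character passwords come out at roughly 65 bits, which is well outside what (F1.) and (F2.) are aimed at, but only if the generator behind the passwords really is random.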

The encrypted archives will be placed on an Internet-visible web site. The web site would require a user logon with user ID and password (not the Zip password). SSL would be used to cover the Basic Authentication of the logon. Each authorized user will be permitted to download only the archives they are supposed to have (generally only 1 archive). Downloading will also be via SSL.

Passwords (logon and encryption) will not be on the site, but distributed separately via snail mail or fax. No plain-text of the data will be on the web server or on other machines in the DMZ.

The user community will be comprised of anything from knowledgeable computer users to clerical staff. The target platforms may vary from Macs or Windows PCs to Unix servers and mainframes. They are all outside of my organization. These are the reasons to choose Zip: ease of use, ubiquity, universality.

My Question:

Just how vulnerable is this data to interception and decryption?

I'm not concerned here with somebody getting access to machines on my internal network that might have some of the plain-text. No, I'm not blowing this off, I realize internal physical and logical security is very relevant to this application. I'm asking about Internet users getting at the stuff here though.

Clearly the web server security comes into play, but that was intended as simply another easy-to-add layer of security. For all intents and purposes I am asking about the Zip encryption itself.

I'd welcome any informed opinions. I might even consider sending someone a sample of dummy data "protected" as described if they want to invest the effort to show me how vulnerable the scheme is.

I'd rather not wade through a lot of additional FUD as in (B1.) up above though. My own opinion right now is that commercial products rely on the WinZip vulnerability, possession of plain-text, and dict/brute attacks against poor passwords. A lot has been made of the fact that standard PKZip encryption was not export-restricted by the U.S. government. I'm not even certain that was true. In any case I assume that the reason it might not have been seen as a "threat" has more to do with typically poor password choices that can leave it vulnerable, combined with the "common knowledge" that it is insecure.
 
Something you may not have read

A couple of basic questions which you need to ask yourself
- who is the enemy, and what resources do they have?
- what is the cost to you if they manage to break the encryption?

Based on the available evidence, you seem to have most of the bases covered.

Your password scheme defeats dictionary attacks, and makes brute force attacks very difficult. My only concern would be to make sure that the passwords really are random, just in case a poor pseudo-random password generator leaks information (or occasionally produces a very non-random password).
This addresses F1 and F2.

> WinZip will not be used
This addresses F3

> F4.) There are two (possibly really just one) "known plain-text" attacks against encrypted Zip archives
As far as I know, this works best (or only works at all) if the known plaintext is at the start of the archive. I would counter this as follows:
Code:
   pkzip file.zip random.dat datafile.txt
random.dat is a random-length file (say up to 1K in length) containing random data (called a 'salt'). It changes for every single ZIP file you create (both in content and length). This should ensure that any "known text" which might be guessed in your text file is nowhere near the start of the archive, nor at a guessable offset into it.
You would just instruct your clients to ignore that file.
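
For what it's worth, producing such a file takes only a few lines. A minimal Python sketch, assuming the operating system's random source (reached through the standard `secrets` module) is good enough; the function name and the 1024-byte cap are just illustrative choices, and whether the extra file really hinders the known-plaintext attack is still only my guess.

```python
import secrets
from pathlib import Path

MAX_SALT_BYTES = 1024  # "up to 1K", as suggested above


def write_salt_file(path: str, max_len: int = MAX_SALT_BYTES) -> int:
    """Write between 1 and max_len random bytes to path and return the length used."""
    length = secrets.randbelow(max_len) + 1   # uniform in 1..max_len
    Path(path).write_bytes(secrets.token_bytes(length))
    return length


if __name__ == "__main__":
    n = write_salt_file("random.dat")
    print(f"wrote {n} random bytes to random.dat")
```

Run it once per archive, just before the pkzip step, so that no two archives share the same random.dat.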

One more step I would introduce is periodically changing all the passwords. In the extreme, this would mean using each password only once.

> A lot has been made of the fact that standard PKZip encryption was not export-restricted by the U.S. government.
Without more research, my guess would be because it is vulnerable to the "known plaintext" attack, and is therefore considered a weak algorithm.

> Just how vulnerable is this data to interception and decryption?
This part of the process seems pretty good. The weak link for me is the password management at the client sites. Careless management of the passwords, or the files after they have been extracted from the ZIP will undo everything here.

Do you have a good source of random data from which to generate passwords and random 'salt' files?

Just my opinions...
 
Thank you for taking the time to write such a thoughtful response.


> A couple of basic questions which you need to ask yourself
> - who is the enemy, and what resources do they have?
> - what is the cost to you if they manage to break the encryption?

These are excellent points.

The answer to the first is "less than a national government, but potentially as much as a large corporation." Scary, hmm? Since a legitimate organization should not be appropriating and using the information in these files I suppose one possibility would be to salt the data with "fingerprints" (bogus data made for the purpose). This might bolster legal recourse should the need arise.

The second is tougher to answer. There is no financial information in these files, nor any trade secrets to be lost. There is no question somebody's head would roll though if there were a major exposure. Therein lies the advantage of using another protection system "blessed" by some recognized authority for use with data of this particular sensitivity - which standard Zip encryption almost certainly is not.

One thing weighing in favor of Zip encryption though is its simplicity of use at the client end. More complex-to-operate schemes might prove unusable considering the audience, leading to the use of no encryption at all. At best something more complex to use might lead to more parties being given access to the data in attempts to get it decrypted for use.


> One more step I would introduce is periodically changing all the passwords. In the extreme, this would mean using each password only once.

These files are created for distribution once per year. Each year new passwords would be created, and a unique password used for each archive.


> Just how vulnerable is this data to interception and decryption?
> This part of the process seems pretty good. The weak link for me is the password management at the client sites. Careless management of the passwords, or the files after they have been extracted from the ZIP will undo everything here.

I agree. The client organizations are however bound by regulations similar (for the most part identical) to those my own organization is bound by. Once the data is in their hands information security becomes their responsibility. Not a cop-out, just a fact. In reality I am just concerned about the "transport" of the data from here to there, a luxury those requiring encryption don't always have.


> F4.) There are two (possibly really just one) "known plain-text" attacks against encrypted Zip archives
> As far as I know, this works best (or only works at all) if the known plaintext is at the start of the archive. I would counter this as follows:
> Code:
>    pkzip file.zip random.dat datafile.txt
> random.dat is a random-length file (say up to 1K in length) containing random data (called a 'salt') ...

This seems to make sense intuitively. I will strongly consider it.


> Do you have a good source of random data from which to generate passwords and random 'salt' files?

No. I have a prototype password generation algorithm to generate "random" strings of "randomly varying" length, based upon common pseudo-random-number intrinsic functions found in most programming environments. They don't "look bad" but I understand how little inspecting a few hundred output samples by eye means. Any suggestions would be appreciated.


While the business owner at my end is dictating web-based distribution, I have strong reservations about it. My own feeling is that distributing the data on CDs after compression/encryption makes much more sense. The data requires timely delivery in most cases, but it only changes once a year. In most cases there is a two-week delivery window available. In a few cases a large number of the files are required later in the year.

My own feeling would be to eliminate the web distribution and use only CDs. Barring that, keep the encrypted data on the server only for a month or less and use CDs for any subsequent distribution required. As it is we will have to track web logons and downloads in order to detect unauthorized access - additional overhead incurred with web distribution.


I appreciate the links. I've seen several on the "snake oil" cryptography merchants. ;-)

Thank you once again for your time and effort.
 
> The data requires timely delivery in most cases, but it only changes once a year.
> In most cases there is a two-week delivery window available.
I'd strongly consider the distribution on CD option if these are the time frames involved. Producing a CD and posting it is only slightly more complex than posting the password.
Just don't send the password in the same envelope :)

At this point, it becomes the same problem as distributing a credit card and its PIN.

Plus you get to wrap your CD in tamper-evident packaging for added security. At least then you know if someone has peeked at the data, something which will always remain in doubt for electronic distribution.

Depends on the number of clients you need to send to
<100 - CDs looking like a better option
>1000 - Web looking like a better option

> As it is we will have to track web logons and downloads in order to detect unauthorized access
Another line of defence would be to set up your server to only accept connections from the pre-determined IPs of your clients.
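
The check itself is simple. A minimal Python sketch of the idea, assuming your clients connect from fixed addresses; the networks listed are documentation-only placeholders, not real clients, and the function name is made up.

```python
import ipaddress

# Hypothetical client networks (documentation-only ranges, not real clients).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside one of the allowed networks."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: reject
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "203.0.113.9"):
        print(ip, "allowed" if is_allowed(ip) else "rejected")
```

In practice you would more likely express the same list in the web server or firewall configuration than in application code, and it only helps where clients do not sit behind changing or shared addresses.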

> based upon common pseudo-random-number intrinsic functions found in most programming environments
These are usually very weak from a cryptographic point of view. The algorithm for generating the pseudo-random numbers is very simple. Guessing other people's passwords from looking at your own becomes distinctly feasible.
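
If you end up generating the passwords in software, draw them from the operating system's cryptographic random source rather than a language's ordinary random function. A minimal sketch in Python, assuming its standard `secrets` module is available to you; the function name is illustrative, and the alphabet and length range are the ones from your description.

```python
import secrets
import string

# 95 printable ASCII symbols: letters, digits, punctuation, and the space.
ALPHABET = string.ascii_letters + string.digits + string.punctuation + " "


def generate_password(min_len: int = 10, max_len: int = 20) -> str:
    """Return a password of random length in [min_len, max_len] drawn from ALPHABET."""
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    for _ in range(3):
        print(repr(generate_password()))
```

You may want to drop the space and a few look-alike characters from the alphabet if the passwords are going out by fax or post, at the cost of a slightly smaller password space.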

Ideally, you would want to use something which is based in hardware. An example would be to use the Windows Sound Recorder and make a recording WITHOUT a microphone plugged in (thus recording noise). This is then compressed and the first 100 bytes or so discarded (the ZIP and WAV file headers). The result should be a nice source of random data.
This site lists some places to get hardware random number generators from
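
The sound-recorder idea could be sketched along these lines in Python. Note that this swaps the compress-and-discard step for a SHA-256 hash of the audio samples, which is a different (and more usual) way of condensing noisy input; the function name and the crude "is it just silence" check are assumptions of mine, and how much real randomness a given sound card delivers is unknown.

```python
import hashlib
import wave


def random_bytes_from_wav(path: str) -> bytes:
    """Condense the audio samples of a WAV recording into 32 bytes via SHA-256."""
    with wave.open(path, "rb") as wav:
        # The wave module skips the WAV header and returns only sample data.
        frames = wav.readframes(wav.getnframes())
    if len(set(frames)) < 16:
        # A muted input often records pure digital silence, which carries no randomness.
        raise ValueError("recording looks constant; no usable noise")
    return hashlib.sha256(frames).digest()
```

Each recording yields only 32 bytes this way, so make a fresh recording for every batch of passwords rather than reusing one file.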

If you need a software generator, then one of the ones listed at the end of this page should be considered
 