File Size on Disk

Status
Not open for further replies.

pinnochio

Technical User
Nov 15, 2002
US
More and more I notice a big difference between the
size of files and their size on disk (usually bigger). Why
this ever-present discrepancy (on both 98 and ME), and
which file size should I trust?

-P
 

Files are stored on the disk in what are called clusters (groups of disk sectors). Size on disk refers to the amount of cluster space allocated to a file, whereas file size is the actual byte count. The smallest unit that can be allocated to a file is one cluster, so if the file only needs a small portion of a cluster, the size on disk for that file will still count the entire cluster as used. This is why, when you check the properties tab of a file in Microsoft Windows, the size on disk is usually larger than the file size.
 
The size of the file (in bytes) is the true size of the file itself. The size on disk depends on the cluster size (2 KB, 4 KB, etc.). For instance, with 4 KB clusters, a 512-byte file will "take up" a full 4 KB cluster. Fragmentation is a separate matter: a larger file that needs many clusters may have them scattered across the disk, because DOS places data wherever free clusters are available. "Defragging" puts each file's clusters back together in one place; it does not change the size on disk. (This is a simplified explanation of what goes on in the background of a hard drive; in reality it's much more complex!)
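To make the rounding concrete, here is a minimal Python sketch of the calculation. The function name `size_on_disk` is made up for illustration, and it assumes an empty file is given no clusters at all.

```python
def size_on_disk(file_size: int, cluster_size: int) -> int:
    """Return the bytes allocated to a file: whole clusters, rounded up."""
    if file_size == 0:
        return 0  # assumption: an empty file is allocated no clusters
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


# The 512-byte file on 4 KB clusters from the post above:
print(size_on_disk(512, 4096))   # 4096
# One byte over a cluster boundary costs a whole extra cluster:
print(size_on_disk(4097, 4096))  # 8192
```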
 
Or you can liken file storage on the hard drive to a wall of small lockers. If you save a large file, it will occupy a certain number of lockers, filling several plus a bit of the last one.

If, however, you save a very small file, such as an internet cookie, you only use a very small bit of one locker, but the whole locker has to be recorded as used, as no two files can share the same locker. (Two files sharing a locker would be cross-linked files, which leads to corruption.)

This difference between the size of the actual file - 100 bytes, say, and the size of the locker it has used up - 4,096 bytes for example, is known as 'slack space'. The smaller the file, the worse the waste becomes.

The best way to minimise this slack space is to choose a cluster size (locker) as small as possible if the files to be saved are generally small in size.
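As a rough illustration of how the cluster (locker) size drives the waste, the sketch below totals the slack for a handful of files at three cluster sizes. The file sizes in `sample_sizes` are invented for the example, and the function names are hypothetical.

```python
def slack(file_size: int, cluster_size: int) -> int:
    """Unused bytes in the last cluster occupied by a file."""
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder


def total_slack(file_sizes, cluster_size: int) -> int:
    """Sum of slack space over a collection of file sizes."""
    return sum(slack(size, cluster_size) for size in file_sizes)


# Hypothetical mix of small files, sizes in bytes.
sample_sizes = [100, 350, 1200, 5000, 70000]

for cluster_size in (512, 4096, 32768):
    wasted = total_slack(sample_sizes, cluster_size)
    print(f"{cluster_size:>6}-byte clusters: {wasted} bytes of slack")
```

With mostly small files, the total slack grows quickly as the cluster size goes up, which is the point made above.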

Just for interest's sake, right-click your COOKIES folder and look at the difference between the total size of the files themselves and the room on the hard drive they have actually taken up.
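The same comparison can be approximated in a script. This is only a sketch: the function name `folder_totals` is hypothetical, the cluster size is passed in rather than read from the drive (4096 bytes is an assumed default), and the on-disk figure is an estimate that ignores the space used by directory entries.

```python
import os


def folder_totals(folder: str, cluster_size: int = 4096):
    """Return (total file size, estimated size on disk) for a folder tree."""
    total_size = 0
    total_on_disk = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            try:
                size = os.path.getsize(os.path.join(root, name))
            except OSError:
                continue  # skip files that vanish or cannot be read
            total_size += size
            # Round each file up to a whole number of clusters.
            total_on_disk += -(-size // cluster_size) * cluster_size
    return total_size, total_on_disk
```

Calling `folder_totals` on a folder full of cookies with the drive's real cluster size should give figures close to what the Properties dialog reports.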

Andy.

 
As a worst-case example, FAT16 on a 2.1 GB partition uses 32 KB clusters. Batch files, for example, are mostly small text files, so each one probably wastes about 31 KB.

This is a limitation of the cluster addressing scheme, which was partially resolved by FAT32 (available from Windows 95 OSR2 and 98 onward) and by NTFS (on the Windows NT family).
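The addressing limit can be sketched numerically. The code below is a simplification: it treats FAT16 as able to address 2**16 clusters (the real limit is slightly lower because some values are reserved), assumes cluster sizes are powers of two starting at one 512-byte sector, and uses a hypothetical function name.

```python
MAX_CLUSTERS_16BIT = 2 ** 16  # simplification of the FAT16 cluster limit


def min_cluster_size(partition_bytes: int,
                     max_clusters: int = MAX_CLUSTERS_16BIT,
                     sector_size: int = 512) -> int:
    """Smallest power-of-two cluster size that covers the whole partition."""
    cluster = sector_size
    while partition_bytes > cluster * max_clusters:
        cluster *= 2
    return cluster


partition = 2_100_000_000           # the 2.1 GB partition from the post
cluster = min_cluster_size(partition)
print(cluster)                      # 32768
print(cluster - 1024)               # 31744 bytes wasted by a 1 KB batch file
```

With only so many cluster numbers to go around, a bigger partition forces bigger clusters, which is why the small batch file ends up wasting about 31 KB.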

Ed Fair
Give the wrong symptoms, get the wrong solutions.
 