Optimum File Buffer Size / Performance


Glenn9999 (Programmer)
One thing I've recently come across while studying algorithms and running them against large files is that my standing assumption of a 64 KB buffer is probably inconsistent and outdated (it came from testing on an older system a few years ago).

Different devices are capable of different transfer rates, which makes the problem more complex. On top of that, I suspect today's hard drives (as well as some other kinds of drives) can be driven a lot harder than 64 KB at a time, with a performance advantage for doing so. Remember that you can take a performance hit if you read too little of a file at a time (for example, reading a byte at a time is VERY slow).
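To make that last point concrete, here is a minimal sketch in plain C that times a sequential read of the same file with a 1-byte request size and then with a 64 KB request size. The file name test.bin is just a placeholder for any large local file, and clock() is coarse, but the gap should be obvious.

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BLOCK_SIZE (64 * 1024)

/* time a full sequential read of the file using the given request size */
static double time_read(const char *path, size_t chunk)
{
    FILE *f = fopen(path, "rb");
    if (!f) { perror(path); exit(1); }
    setvbuf(f, NULL, _IONBF, 0);   /* turn off stdio buffering so the request
                                      size is what actually hits the OS */
    char *buf = malloc(chunk);
    clock_t start = clock();
    while (fread(buf, 1, chunk, f) == chunk)
        ;                          /* discard the data; we only time the reads */
    clock_t end = clock();

    free(buf);
    fclose(f);
    return (double)(end - start) / CLOCKS_PER_SEC;
}

int main(void)
{
    printf("1 byte per read : %.3f s\n", time_read("test.bin", 1));
    printf("64 KB per read  : %.3f s\n", time_read("test.bin", BLOCK_SIZE));
    return 0;
}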

Add to that the fact that Windows offers memory-mapped files as well as overlapped I/O (async I/O: read a block, process it in a separate thread, and leave the main thread free to read the next block), and it makes me wonder whether it's time to rethink the whole performance strategy.
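To make the overlapped-I/O part concrete, here's a rough Win32 sketch done as single-threaded double buffering rather than a separate worker thread: issue the read for the next block, process the previous block while that read is in flight, then wait for it. The file name and the process() routine are just placeholders.

Code:
#include <windows.h>

#define BLOCK_SIZE (64 * 1024)

/* placeholder for whatever work is done on each block */
static void process(const char *data, DWORD len)
{
    (void)data; (void)len;
}

int main(void)
{
    HANDLE h = CreateFileA("test.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    static char buf[2][BLOCK_SIZE];  /* one buffer in flight, one being processed */
    OVERLAPPED ov = {0};
    ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);

    ULONGLONG offset = 0;
    DWORD prev_len = 0;
    int cur = 0, have_prev = 0;

    for (;;) {
        /* start the read for the current buffer at the current file offset */
        ov.Offset     = (DWORD)(offset & 0xFFFFFFFFu);
        ov.OffsetHigh = (DWORD)(offset >> 32);
        BOOL ok = ReadFile(h, buf[cur], BLOCK_SIZE, NULL, &ov);
        if (!ok && GetLastError() != ERROR_IO_PENDING)
            break;                   /* end of file or error */

        /* while that read is in flight, process the block from the last pass */
        if (have_prev) {
            process(buf[cur ^ 1], prev_len);
            have_prev = 0;
        }

        /* wait for the outstanding read to complete */
        DWORD got = 0;
        if (!GetOverlappedResult(h, &ov, &got, TRUE) || got == 0)
            break;

        prev_len = got;
        have_prev = 1;
        offset += got;
        cur ^= 1;
    }
    if (have_prev)                   /* last block read but not yet processed */
        process(buf[cur ^ 1], prev_len);

    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
}

The same pattern extends naturally to a real worker thread or to more than two outstanding buffers; the point is just that the disk and the CPU get to work at the same time.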

Any thoughts on how to approach the problem so you get optimum performance out of a device, no matter what it is?

I've always had the assumption that you can't go wrong if you have a very large buffer. It may use more memory (not an issue these days), but all the different layers from Windows to the physical device itself will simply break a requested buffer read into multiple smaller reads that are optimal for the layer it's requesting from.
 
I've always had the assumption that you can't go wrong if you have a very large buffer.

...physical device itself will simply break a requested buffer read into multiple smaller reads that are optimal for the layer it's requesting from.

Actually, that's a faulty assumption I've found from testing. If you graph buffer size against performance on a particular drive with file caching enabled, you get a steep parabolic curve. Which means, generally, that finding the bottom of that curve is pretty important for good read performance from the device, even though there does seem to be a slight bit of flexibility.

In testing some more, it seems that sticking with 64 KB for a buffer size will serve me well, though you can go higher (512 KB in some cases) without a penalty.
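If anyone wants to reproduce that curve on their own hardware, something like this sweep (Win32, plain synchronous ReadFile) will do it. Again test.bin is a placeholder, and after the first pass the data will mostly be coming out of the Windows file cache, which is the cached case being discussed.

Code:
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

/* read the whole file with the given buffer size and report throughput */
static double read_mb_per_sec(const char *path, DWORD bufsize)
{
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (h == INVALID_HANDLE_VALUE) { fprintf(stderr, "open failed\n"); exit(1); }

    char *buf = malloc(bufsize);
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);

    unsigned long long total = 0;
    DWORD got;
    while (ReadFile(h, buf, bufsize, &got, NULL) && got > 0)
        total += got;

    QueryPerformanceCounter(&t1);
    free(buf);
    CloseHandle(h);

    double secs = (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart;
    return (total / (1024.0 * 1024.0)) / secs;
}

int main(void)
{
    /* sweep buffer sizes from 4 KB to 4 MB, doubling each time */
    for (DWORD size = 4 * 1024; size <= 4 * 1024 * 1024; size *= 2)
        printf("%7lu KB buffer: %8.1f MB/s\n",
               (unsigned long)(size / 1024), read_mb_per_sec("test.bin", size));
    return 0;
}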

File mapping seems to help the most with random-access files, but not much for whole-file sequential access.
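For completeness, a bare-bones sketch of the file-mapping side on Win32: once the view is mapped, random access is just pointer arithmetic, and each touch faults in a page rather than a caller-sized buffer. The offsets probed here are arbitrary examples, and mapping the whole file in one view assumes it fits in the address space.

Code:
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileA("test.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_RANDOM_ACCESS, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(h, &size);

    /* map the whole file read-only; assumes it fits in the address space */
    HANDLE map = CreateFileMappingA(h, NULL, PAGE_READONLY, 0, 0, NULL);
    const unsigned char *view =
        (const unsigned char *)MapViewOfFile(map, FILE_MAP_READ, 0, 0, 0);
    if (view == NULL) return 1;

    /* random access is now plain pointer arithmetic; each touch faults in
       only the page it lands on.  These offsets are arbitrary examples. */
    long long offsets[] = { 0, size.QuadPart / 2, size.QuadPart - 1 };
    for (int i = 0; i < 3; i++)
        printf("byte at offset %lld = 0x%02X\n", offsets[i], view[offsets[i]]);

    UnmapViewOfFile(view);
    CloseHandle(map);
    CloseHandle(h);
    return 0;
}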

As always, any constructive thoughts are welcome. Thanks!

Glenn9999 said:
Actually, that's a faulty assumption I've found from testing.

Good to know. Thanks for taking the time to test these old myths.
 