Good code. You could try to optimize it a little more. The first places to look are jumps and calls and, above all, inner loops. I bet you can speed them up if you read some optimization articles, such as Agner Fog's Pentium Optimisation manuals.
You'll find them very useful...
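
To give a rough idea of what I mean by calls and jumps inside inner loops, here is a small sketch in C. It is not taken from your code; the function names `count_spaces_naive` and `count_spaces_tuned` are made up for illustration. The first version calls `strlen()` in the loop condition and takes a conditional jump for every character; the second hoists the call out of the loop and adds the comparison result directly, which a compiler can often turn into branch-free code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: a call and a conditional jump inside the inner loop. */
size_t count_spaces_naive(const char *s) {
    size_t count = 0;
    /* strlen() appears in the loop condition, so it may run on every pass. */
    for (size_t i = 0; i < strlen(s); i++) {
        if (s[i] == ' ') {   /* one conditional jump per character */
            count++;
        }
    }
    return count;
}

/* Same result, with the call hoisted and the branch folded into arithmetic. */
size_t count_spaces_tuned(const char *s) {
    size_t count = 0;
    size_t len = strlen(s);          /* called once, outside the loop */
    for (size_t i = 0; i < len; i++) {
        count += (s[i] == ' ');      /* comparison yields 0 or 1 */
    }
    return count;
}
```

Both functions return the same count for the same string. A modern compiler may already do some of this for you, so measure before and after rather than assuming the second version is faster.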