...however when i have 50K files in the input, it takes a very very long time to process. the files are all 4KB size...
"
I noted that on our machine (1 CPU at 3.4 GHz, 4 GB RAM), processing 1 KB files took longer multi-threaded than it did sequentially.
I used a grinder script and processed 25 cycles with 100 files per cycle, 1 second between cycles. I set it up to run multi-threaded and then sequentially. The system processed the sequential series faster.
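If you want to run the same kind of comparison on your own box, here is a rough Python sketch of the measurement. It is not the grinder script I used: `process` is a hypothetical stand-in for whatever your map does per file, the payloads are fake in-memory "files", and `run_cycles` and its parameters are names I made up for illustration. The numbers it gives will not match your tool, it only shows how to time the two modes the same way.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def process(payload: bytes) -> str:
    # Hypothetical stand-in for the real per-file work (the actual map is not shown).
    return hashlib.sha256(payload).hexdigest()


def run_cycles(payloads, cycles=25, pause=1.0, workers=None):
    """Process every payload once per cycle.

    workers=None runs sequentially; an integer runs a thread pool of that size.
    Returns (busy_seconds, files_processed); the pause between cycles is not
    counted as busy time.
    """
    busy = 0.0
    done = 0
    for cycle in range(cycles):
        start = time.perf_counter()
        if workers is None:
            results = [process(p) for p in payloads]
        else:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                results = list(pool.map(process, payloads))
        busy += time.perf_counter() - start
        done += len(results)
        if cycle < cycles - 1:
            time.sleep(pause)  # gap between cycles, as in the test described above
    return busy, done


def make_fake_files(count=100, size=1024):
    # Assumption: 100 files of 1 KB each, held in memory instead of read from disk.
    return [bytes([i % 256]) * size for i in range(count)]
```

Call `run_cycles(make_fake_files(), workers=None)` and then `run_cycles(make_fake_files(), workers=4)` and compare the busy times. The worker count of 4 is just an example value.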
The problem is shared CPU and memory. The fact that you are using memory for workspace should help, but it still takes resources to swap between processes.
Probably preaching to the choir, so bear with me. As you know, on a single CPU Windows doesn't really process two items at exactly the same time. It swaps back and forth between files/processes. This happens fast enough that you can't really tell much difference when multi-threading several small files, but at the scale you're talking about the additional millisecond here and there adds up. On top of that, if you consume all of your physical memory, the machine starts using virtual memory (disk), and things will slow down even more.
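To put a number on "adds up", here is a back-of-the-envelope calculation. The 1 ms of switching overhead per file is purely an assumed figure for illustration, not something I measured, and `extra_seconds` is a made-up helper name.

```python
def extra_seconds(file_count: int, overhead_ms_per_file: float) -> float:
    # Total added time if every file costs a fixed amount of extra overhead.
    return file_count * overhead_ms_per_file / 1000.0


# Assumed: 50,000 files, 1 ms of extra overhead each.
print(extra_seconds(50_000, 1.0))  # 50.0 seconds of pure overhead
```

So even one extra millisecond per file is close to a minute over the whole run, before any paging to disk.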
Also, keep in mind that your input files are only 4 KB each, but depending on map design you could be consuming a much larger chunk of memory. I've seen maps that consumed more than 4 times the input file size in resources. Are you reading other inputs? Connecting to DBs? And so on.
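A quick way to sanity-check the memory side is to multiply it out. This sketch assumes the 4x figure as a floor and, as a worst case, that every file's working set is held in memory at the same time. Whether your setup actually does that I can't say, and `footprint_bytes` is a hypothetical helper.

```python
def footprint_bytes(file_size: int, multiplier: float, files_in_flight: int) -> float:
    # Rough working-set estimate: per-file size, scaled up, times files held at once.
    return file_size * multiplier * files_in_flight


# Assumed worst case: all 50,000 files of 4 KB in flight, each needing 4x its size.
total = footprint_bytes(4 * 1024, 4, 50_000)
print(total / (1024 * 1024))  # 781.25 (MiB)
```

If only a handful of files are in flight at once the number is tiny, so how many the system keeps open together matters a lot more than the 4 KB file size.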
Is 50K files a realistic test for your environment?
eyetry,
ps: sorry, don't really feel I'm being much help here.