
How to process file in parallel in a script?

Status
Not open for further replies.

mackpei

Programmer
Jun 6, 2000
27
DE
Hi there,

I have a script processing a lot of files in a directory.

Processing the files in sequence takes a long time, so one idea is to create several child processes, each of which handles a share of the files.

My question is: is there an easy way to group the files (their sizes vary widely, and their format is binary) so that each child process gets roughly the same amount of data to process? That should maximize system utilization and data throughput.

Thanks!
Mack
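
One rough way to get the even grouping asked about here is to sort the files by size, largest first, and deal the names out round-robin into N lists. This is only a sketch: the worker count [tt]N[/tt], the [tt]list.*[/tt] names, and the demo files are illustrative, not from the thread.

```shell
#!/bin/sh
# Sketch: split *.dat files into N lists of roughly equal total size
# by sorting largest-first and dealing the names out round-robin.
# N and the list.* names are illustrative assumptions.
N=4
dir=$(mktemp -d)
cd "$dir" || exit 1

# demo input: eight sample files of varying sizes
for i in 1 2 3 4 5 6 7 8; do
    dd if=/dev/zero of=part$i.dat bs=512 count=$i 2>/dev/null
done

rm -f list.*
i=0
for f in $(ls -l *.dat | sort -rn -k 5,5 | awk '{print $NF}'); do
    echo "$f" >> list.$(( i % N ))   # deal to lists in rotation
    i=$(( i + 1 ))
done
wc -l list.*    # each of the 4 lists gets 2 of the 8 files
```

Largest-first round-robin is a cheap approximation of bin packing; with widely varying sizes it keeps the list totals reasonably close without any real scheduling work. (As with the [tt]ls -l[/tt] pipeline elsewhere in this thread, it assumes file names without whitespace.)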
 
Sort them by file size and process the largest first to get the most efficient throughput, e.g.

[tt]for file in `ls -l *.dat | sort -rn -k 5,5 | awk '{print $NF}'`
do
...[/tt]
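
Once the files are divided up, one way to fan the work out is a background subshell per list, with [tt]wait[/tt] to block until every child finishes. A sketch, assuming lists named [tt]list.*[/tt] from some earlier grouping step; [tt]process_one[/tt] is a placeholder for the real per-file work:

```shell
#!/bin/sh
# Sketch: one background child per list, then wait for all of them.
# list.* and process_one are assumptions standing in for the real job.
dir=$(mktemp -d)
cd "$dir" || exit 1

# demo input: two lists naming four small files
printf 'a.dat\nb.dat\n' > list.0
printf 'c.dat\nd.dat\n' > list.1
for f in a.dat b.dat c.dat d.dat; do echo data > "$f"; done

process_one() {          # placeholder for the real per-file work
    wc -c < "$1"         # here: just report the file's byte count
}

for list in list.*; do
    (
        while read f; do
            process_one "$f"
        done < "$list" > "out.$list"
    ) &                  # one child per list
done
wait                     # block until every child has finished
cat out.*                # collected per-file results
```

Each child writes to its own [tt]out.*[/tt] file, so the children never contend for one output stream; the parent collects results only after [tt]wait[/tt] returns.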

Annihilannic.
 
Before you go "optimising", do you know where the bottleneck really is?

If you've already maxed out the disk transfer rate and the handling process is pretty much idle, then spawning multiple processes isn't going to help that much.

Likewise, if the process is at 100% CPU most of the time and the disk is doing nothing, that isn't going to benefit from multiple instances either.

--
 