BrianAtWork
Programmer
I have an array of files (about 10,000), and I need to do something with each file (search it, count its lines, etc.).
I want it to run in parallel so it runs a bit faster.
I have made an array (@search_array) where each element in the array contains a portion of the 10,000 files, separated by spaces. Example of the array:
Code:
$search_array[0] would be "file1 file2 file3 file4 file5"
$search_array[1] would be "file6 file7 file8 file9 file10"
$search_array[$#search_array] would be something like "file9996 file9997 file9998 file9999 file10000"
I have a quick function to build this array - the array is 20 elements long with 500 files listed in each element.
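The builder itself is nothing fancy - something like this, where @files is assumed to hold the 10,000 filenames (a sketch, not my exact code):
Code:
# Sketch of the builder - @files (the list of 10,000 filenames) is assumed
my @search_array;
my $chunk_size = 500;
for (my $i = 0; $i < @files; $i += $chunk_size) {
    my $end = $i + $chunk_size - 1;
    $end = $#files if $end > $#files;
    # each element is one space-separated chunk of up to 500 filenames
    push @search_array, join(' ', @files[$i .. $end]);
}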
I am currently doing this for my parallel piece:
Code:
$SIG{CHLD} = 'IGNORE';   # ask the OS to auto-reap exited children

RUNSEARCH:
foreach my $searchstring (@search_array) {
    my $kidpid = fork;
    defined($kidpid) or die "cannot fork: $!";
    if ($kidpid) {
        next RUNSEARCH;      # parent: go fork the next search
    }
    &search($searchstring);  # child: run the search on this chunk
    exit;                    # child must exit, or it would re-enter the loop
}
wait();   # parent: pause for the children (auto-reaped while CHLD is ignored)
exit;
I am calling a "search" subroutine to do my search on the list of files - let's say, for example, that the &search function runs "grep perl $searchstring". What I have above currently works great and runs in a fraction of the time it takes serially (it can grep 10,000 files in less than a second). However, when I looked at my processes (in Unix), I happened to see about 20 <defunct> processes (I am doing 20 greps in parallel) - they are cleaned up by the time I am able to look at my running processes again.
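For illustration, a minimal version of that subroutine could be as simple as this (the plain system() call and the hard-coded pattern are just stand-ins for my real code):
Code:
sub search {
    my ($filelist) = @_;
    # $filelist is one space-separated chunk, e.g. "file1 file2 ... file500";
    # the shell splits it into separate arguments for grep
    system("grep perl $filelist");
}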
I just wanted to make sure that I am handling my child processes okay and not leaving zombies behind for the OS to clean up - I'm not exactly sure what a <defunct> process is, but I have a feeling it means my children are not being cleaned up properly.
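One alternative I've seen suggested is to skip the $SIG{CHLD} handler and reap each child explicitly with waitpid - an untested sketch:
Code:
my @kids;
foreach my $searchstring (@search_array) {
    my $pid = fork;
    defined($pid) or die "cannot fork: $!";
    if ($pid) {
        push @kids, $pid;    # parent: remember this child's pid
        next;
    }
    &search($searchstring);  # child: do the work
    exit;
}
# reap every child explicitly so no zombie is left behind
waitpid($_, 0) for @kids;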
Any suggestions? I've never experimented with forking before, so hopefully I won't embarrass myself with this post. I have used Parallel::ForkManager, which is pretty nice, but it is not available on our machines.
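For reference, the Parallel::ForkManager version of the same loop would be roughly this (from memory, since the module isn't installed here):
Code:
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(20);   # at most 20 children at once
foreach my $searchstring (@search_array) {
    $pm->start and next;     # parent: fork a child, then keep looping
    &search($searchstring);  # child: search this chunk of files
    $pm->finish;             # child exits; the module reaps it
}
$pm->wait_all_children;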
Thanks in advance!