
Forking - run in parallel

Status
Not open for further replies.

BrianAtWork

Programmer
Apr 12, 2004
148
US
I have an array of files (about 10,000), and I need to do something with each file (search, count the lines, etc).

I want it to run in parallel so it runs a bit faster.

I have made an array (@search_array) where each element in the array contains a portion of the 10,000 files, separated by spaces. Example of the array:
Code:
$search_array[0] would be "file1 file2 file3 file4 file5"
$search_array[1] would be "file6 file7 file8 file9 file10"
$search_array[$#search_array] would be something like "file9996 file9997 file9998 file9999 file10000"

I have a quick function to build this array - the array is 20 elements long with 500 files listed in each element.
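The chunk-building function isn't shown in the post, but a helper along these lines would produce that layout (the function name and the sample file names here are illustrative, not the poster's code):

```perl
use strict;
use warnings;

# Split a list of filenames into space-separated chunks of $chunk_size
# names each, matching the @search_array layout described above.
sub build_search_array {
    my ($chunk_size, @files) = @_;
    my @chunks;
    while (@files) {
        push @chunks, join ' ', splice(@files, 0, $chunk_size);
    }
    return @chunks;
}

my @files        = map { "file$_" } 1 .. 10_000;
my @search_array = build_search_array(500, @files);
# @search_array now has 20 elements, each listing 500 files
```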

I am currently doing this for my parallel piece:

Code:
$SIG{CHLD} = 'IGNORE';              # let the OS auto-reap exited children
RUNSEARCH:
foreach my $searchstring (@search_array){
    if ($kidpid = fork) {           # parent: move on to the next chunk
        next RUNSEARCH;
    }
    defined($kidpid) or die "cannot fork: $!";
    &search($searchstring);         # child: do the work, then exit
    exit;
}
wait();
exit;

I am calling a "search" subroutine to do my search on the list of files - let's say, for example, the &search function runs a "grep perl $searchstring". What I have above currently works great, and runs in a fraction of the time it takes to run serially (it can grep 10,000 files in less than a second). However, when I looked at my processes (in Unix), I happened to see about 20 <defunct> processes (I am doing 20 greps in parallel) - which were cleaned up by the time I was able to look at my running processes again.
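The &search sub itself isn't posted; a minimal stand-in consistent with the description might look like the following. The body is an assumption (a pure-Perl match count rather than shelling out to grep), kept only so the surrounding fork loop has something concrete to call:

```perl
use strict;
use warnings;

# Stand-in for the poster's &search: count lines matching /perl/
# across each file in the space-separated list $searchstring.
sub search {
    my ($searchstring) = @_;
    my $matches = 0;
    for my $file (split ' ', $searchstring) {
        open my $fh, '<', $file or next;   # skip unreadable files
        while (<$fh>) {
            $matches++ if /perl/;
        }
        close $fh;
    }
    return $matches;
}
```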

I just wanted to make sure that I am handling my child processes okay, and not leaving zombies for the OS to clean up - I'm not exactly sure what a <defunct> process is, but I have a feeling it means that my children are not being reaped properly.

Any suggestions? I've never experimented with forking, etc., so hopefully I won't embarrass myself with this post. I have used Parallel::ForkManager, which is pretty nice, but it is not available on our machines.

Thanks in advance!
 
See KevinADC response.

Also, I think there was a waitall or something similar that might help.


Michael Libeson
 
Thank you both - I'm not exactly sure if my system supports $SIG{CHLD} = 'IGNORE'; - I think it does because there is a difference in the behavior of the script when I comment that line out.

I guess it made me a little wary to see all of the <defunct> processes when I looked at my running processes, but because this script runs so fast, I think I am just catching each spawned process in the moment before it goes away.

Thanks again for your help!
 
