BrianAtWork
Programmer
I have an array of files (about 10,000), and I need to do something with each file (search it, count its lines, etc.).
I want it to run in parallel so it runs a bit faster.
I have made an array (@search_array) where each element in the array contains a portion of the 10,000 files, separated by spaces. Example of the array:
Code:
$search_array[0] would be "file1 file2 file3 file4 file5"
$search_array[1] would be "file6 file7 file8 file9 file10"
$search_array[$#search_array] would be something like "file9996 file9997 file9998 file9999 file10000"
I have a quick function to build this array - the array is 20 elements long with 500 files listed in each element.
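The builder itself is nothing fancy - something like this, where @files is assumed to hold the 10,000 filenames (a sketch, not my exact code):
Code:
# Sketch of the builder - @files (the list of 10,000 filenames) is assumed
my @search_array;
my $chunk_size = 500;
for (my $i = 0; $i < @files; $i += $chunk_size) {
    my $end = $i + $chunk_size - 1;
    $end = $#files if $end > $#files;
    # each element is one space-separated chunk of up to 500 filenames
    push @search_array, join(' ', @files[$i .. $end]);
}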
I am currently doing this for my parallel piece:
Code:
$SIG{CHLD} = 'IGNORE';   # ask the OS to auto-reap exited children

RUNSEARCH:
foreach my $searchstring (@search_array) {
    my $kidpid = fork;
    defined($kidpid) or die "cannot fork: $!";
    if ($kidpid) {
        next RUNSEARCH;      # parent: go fork the next search
    }
    &search($searchstring);  # child: run the search on this chunk
    exit;                    # child must exit, or it would re-enter the loop
}
wait();   # parent: pause for the children (auto-reaped while CHLD is ignored)
exit;
I am calling a "search" subroutine to do my search on the list of files - let's say, for example, that the &search function runs "grep perl $searchstring". What I have above currently works great and runs in a fraction of the time it takes serially (it can grep 10,000 files in less than a second). However, when I looked at my processes (in Unix), I happened to see about 20 <defunct> processes (I am doing 20 greps in parallel) - they are cleaned up by the time I am able to look at my running processes again.
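For illustration, a minimal version of that subroutine could be as simple as this (the plain system() call and the hard-coded pattern are just stand-ins for my real code):
Code:
sub search {
    my ($filelist) = @_;
    # $filelist is one space-separated chunk, e.g. "file1 file2 ... file500";
    # the shell splits it into separate arguments for grep
    system("grep perl $filelist");
}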
I just wanted to make sure that I am handling my child processes okay and not leaving zombies behind for the OS to clean up - I'm not exactly sure what a <defunct> process is, but I have a feeling it means my children are not being cleaned up properly.
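One alternative I've seen suggested is to skip the $SIG{CHLD} handler and reap each child explicitly with waitpid - an untested sketch:
Code:
my @kids;
foreach my $searchstring (@search_array) {
    my $pid = fork;
    defined($pid) or die "cannot fork: $!";
    if ($pid) {
        push @kids, $pid;    # parent: remember this child's pid
        next;
    }
    &search($searchstring);  # child: do the work
    exit;
}
# reap every child explicitly so no zombie is left behind
waitpid($_, 0) for @kids;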
Any suggestions? I've never experimented with forking before, so hopefully I won't embarrass myself with this post. I have used Parallel::ForkManager, which is pretty nice, but it is not available on our machines.
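For reference, the Parallel::ForkManager version of the same loop would be roughly this (from memory, since the module isn't installed here):
Code:
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(20);   # at most 20 children at once
foreach my $searchstring (@search_array) {
    $pm->start and next;     # parent: fork a child, then keep looping
    &search($searchstring);  # child: search this chunk of files
    $pm->finish;             # child exits; the module reaps it
}
$pm->wait_all_children;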
Thanks in advance!