
calling XS sub in parallel

Status
Not open for further replies.

eve25

Programmer
Feb 9, 2004
32
US
Hi guys,

I have a program using an XS package (to use a C library with a driver). The package has 3 subroutines to:
- load the library
- run the C code on the specified file
- close the library

I am calling this package from another Perl script. I need to load and close the library once and run the C code about 3000 times, which takes a while, so I want to run it in parallel (one running instance uses about 25% of the CPU, so I should be able to run 3 of them at the same time, right?).
I don't need any variable from the main script except the name of the file to run on (P.1.txt to P.3000.txt), and the results from the C code are printed to one and only one file (stats.txt)...

I have read that fork has issues with XS subs. I tried the Parallel::ForkManager package anyway, but it is not working properly either... Moreover, I don't need a copy of every variable from the main script in each of my processes, so I don't think fork is ideal anyway. (Maybe I should mention that the main script is using a huge chunk of memory (1.5GB), so I really don't want to duplicate that.)
Code:
.....
XSMat3::load();
my $pm = Parallel::ForkManager->new(3);    # at most 3 children at a time
foreach my $file (@filesToProcess) {
    $pm->start and next;                   # parent moves on to the next file
    print "file being processed=$file\n";
    XSMat3::driver("/swan/output/spectra/$file");
    $pm->finish;                           # child exits here
}
$pm->wait_all_children;
XSMat3::terminate();
...

I tried to use threads, but I got a segmentation fault (surely because of that single file the different threads are printing to; I did see you can share variables but not files, and the printing is done inside the compiled C code).
Code:
$t = threads->create(\&XSMat3::driver,"/swan/output/spectra/$file");

I tried each one of these options with a basic subroutine and it worked fine...
Would anyone have an idea of the possibilities? (I searched this forum and Google Groups for parallel and XS but didn't get any good results...)

Any help would be greatly appreciated...
Thanks a lot!

Eve
 
I can tell you that the seg fault would NOT be due to the same file being written to by multiple threads. Why would there be a problem with that? Debug the rest of the code.
 
Well... I am pretty new to this parallel thing, and I thought that was the reason because if I run it sequentially I have no problem, and if I run it in parallel I do, and not on the same file each time (if I try many times). Since I don't specify in the code how to access the file, I had guessed 2 threads were trying to write to it at the same time...
But you're right... checking the segmentation fault report (the C code and library actually come from building a Matlab stand-alone application), for some reason there is a function it doesn't like after processing some files. Maybe it happens when 2 threads try to access the library simultaneously... I am going to try to contact MathWorks support...
So you think threads are the right way to go (assuming I can solve that Matlab error)?

Thanks,
Eve
 
Hi,

Just another note (sorry about that) to tell you that in fact the error is different each time (that's why I hadn't seen it at first), so I really think it is because 2 threads make the driver try to access the same 'function' in the library at the same time... I sent a note to Matlab support, but I guess it comes from the XS package for the reason mentioned above... Has anyone been successful running XS subs in parallel?

Thanks a lot!
Eve
 
I think that in this case there is almost no difference between forking and threading, and you can check that by timing a fork version against the thread version. This is because, amongst other things, Perl threads share NO data by default. Variables must be explicitly declared as shared. Basically, that's what fork does too. Creating a thread is almost as expensive as forking, in my opinion.
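To make the "that's what fork does too" point concrete, here is a minimal pure-Perl sketch (no XS involved; the variable name $value is made up for illustration). The child gets its own copy of the parent's data, so a change made in the child never shows up in the parent:

```perl
use strict;
use warnings;

# Illustration only: $value stands in for any variable of the main script.
my $value = "parent";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child process: this changes the child's private copy only.
    $value = "child";
    exit 0;
}

# Parent process: wait for the child, then look at our own copy.
waitpid($pid, 0);
print "value=$value\n";    # prints "value=parent"
```

An unshared variable in a Perl thread behaves the same way: each thread works on its own copy unless the variable is declared shared.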
That said, regarding the segmentation fault, there is nothing wrong in itself with calling the same function at the same time from 2 threads. The problem comes if that library (or anything else) updates some shared state such as a counter. Then you have a classic race condition. Use threads::shared and lock a variable whenever you think you are calling into a library or piece of code that may not be thread-safe. Read more about it here

 
Thanks for your replies azzazzello!
Though I am confused....:-(
First of all, fork doesn't seem to work with XS subs (I read about it and tested it, and it's true, it doesn't).
Second, the access to the library is done in the C driver inside the XS sub, so I don't think I can use any Perl feature to share or lock anything. Moreover, it's the library I would have to share, and you can only share data structures with threads::shared (I had checked that already, but thanks though...).
Also, as I said, my program is using a huge chunk of memory and I don't want each thread to have its own copy of it. Should I lock or share everything?
Besides, I am loading and unloading my library from the main program, so I can't use exec() or system() to run my driver.
Basically I just want my threads (or processes) to share the C library and to know the name of the file they have to act on. That's all I want them to know....

Does anyone have an idea on how to do that?
Thanks a lot,

Eve
 
Disregard my comments about testing your code with fork then :). All I wanted to say was that Perl threading isn't as lightweight as it is in other languages.

Regarding the lock, here is what I mean.

Code:
   use threads;
   use threads::shared;

   use vars qw($THREADLOCK);
   share($THREADLOCK);
    
   #definitions
   .........


   my $thread1 = threads->create(\&foo_sub,$param);
   my $thread2 = threads->create(\&foo_sub,$param2);
   ...

   sub foo_sub
   {
        my $val = shift;
        ........
        
        {
               lock($THREADLOCK);
               &some_non_thread_safe_sub_or_lib_call($val);
        }
   }


You lock a variable while you are running the sub. When the sub is done, the end of the block releases the lock on $THREADLOCK, and another thread will be able to use the library again.
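For anyone who wants to try the scoped lock on its own, here is a small self-contained sketch. It assumes a perl built with ithreads, and unsafe_call() with its $counter is a made-up stand-in for a library call that updates internal state; it is not the XS driver from this thread.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $LOCK :shared;            # exists only to be locked
my $counter :shared = 0;     # stand-in for state hidden inside a library

# Hypothetical non-thread-safe call: a read followed by a separate write.
sub unsafe_call {
    my $seen = $counter;
    $counter = $seen + 1;
}

sub worker {
    my ($n) = @_;
    for (1 .. $n) {
        lock($LOCK);         # held until the end of this loop body
        unsafe_call();
    }                        # lock released here, once per iteration
}

# Three workers, as in the original question.
my @workers = map { threads->create(\&worker, 1000) } 1 .. 3;
$_->join for @workers;

print "counter=$counter\n";  # 3000, since every update ran under the lock
```

Without the lock() line, two threads can read the same old value of $counter and one update gets lost. With it, the calls are fully serialized, so wrapping the entire driver call this way would also remove the parallelism.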
 
Hmm, I just read your example more closely. You are actually passing the XS function itself to the thread... if you were to lock around that, there would be no point in threading in the first place. Well, in that case it seems that the XS function itself is NOT thread-safe. Are there people who can confirm that they got it working using threads?
 
hi again!

I keep looking for information on the web...and I am more and more confused.
I am not even sure it is possible to get my XS sub running in parallel, since it is a mix of Perl and C (and Matlab in my case, since the C library comes from Matlab...).
Has anyone managed to run their own XS package in parallel? If so, how?

Please help...
Thanks!
 