problems with gcc on multi-proc sparc?

Status
Not open for further replies.

eidsness

Programmer
Aug 29, 2001
CA
I'm using gcc 2.95.2 on a multi-processor Solaris box and am encountering a transient segmentation fault.

I'm creating an element in a thread and pushing a pointer to it onto a std::queue (and signalling a semaphore). After that, this thread has no more access to the element (it drops the pointer, and it never takes things out of the queue).

A second thread waits on the semaphore and then takes an element out of the queue. After processing the element, it deletes it. The element contains a std::list, which is deleted along with it and in turn deletes all of its contents. Sometimes (maybe once in 10,000 times) there is a segmentation fault while deleting a list entry. This only happens on our multi-processor box; the code runs fine on all the single-processor machines I've tried, and sometimes it runs fine on the multi-processor machine too.

Since it's a seg fault, I guessed that perhaps I was deleting an element twice (although I would expect that to cause an error every time), and I confirmed that all copy constructors perform fully deep copies, etc. No effect.

Now, if I take out the thread, so that I add to the std::queue and then immediately remove the element to perform the processing, the problem seems to go away (I tried 4 times as many iterations with no error).

Does anyone know if the std::list implementation has problems threading on multi-processor machines? If for some reason the std::list dtor were running past the end of its array it might explain the seg fault...

Any ideas?
 
I cannot figure out your exact situation because
I think your problem is not simple. But please
consider this scenario.

You said an element is added and signaled by one thread
(I'll call it thread 1). When it is signaled, another
thread (I'll call it thread 2) uses the queue.

At this time, if thread 1 adds an element to the queue,
the two threads use the queue at the same time. This
can cause problems such as a segmentation fault. Please
check whether this scenario can occur in your
implementation.

If it is possible, it is preferable to use a mutex so
that only one thread can access the queue at a time.
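
As a rough illustration of the suggestion above, here is a minimal sketch of a queue whose every access is guarded by one mutex. It is not the poster's code: it assumes C++11 std::thread, std::mutex and std::condition_variable (which did not exist for gcc 2.95.2; the original would have used Solaris or POSIX threads), the condition variable stands in for the semaphore, and the names Element, LockedQueue and run_pipeline are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <list>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical element type: it owns a std::list, like the element in the question.
struct Element {
    std::list<int> items;
};

// Queue of raw owning pointers; every access to queue_ happens under mutex_.
class LockedQueue {
public:
    void push(Element* e) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(e);
        }
        ready_.notify_one();  // stands in for signalling the semaphore
    }

    // Blocks until an element is available, then removes and returns it.
    Element* pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        Element* e = queue_.front();
        queue_.pop();
        return e;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Element*> queue_;
};

// One producer creates `count` elements; one consumer processes and deletes them.
// Returns the total number of list entries the consumer saw.
std::size_t run_pipeline(int count) {
    LockedQueue q;
    std::size_t seen = 0;

    std::thread producer([&q, count] {
        for (int i = 0; i < count; ++i) {
            Element* e = new Element;
            e->items.assign(std::size_t(3), i);  // three list entries per element
            q.push(e);  // the producer never touches e after this point
        }
    });

    std::thread consumer([&q, &seen, count] {
        for (int i = 0; i < count; ++i) {
            Element* e = q.pop();
            seen += e->items.size();  // "processing"
            delete e;                 // destroys the list and its entries
        }
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling run_pipeline(10000) exercises the same hand-off as in the question. If a design like this still crashes, the locking is probably not the culprit.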
Hee S. Chung
heesc@netian.com
 
That's a good idea -- however, access to the queue is properly wrapped with a critical section. Originally I thought that the problem might be an incorrect implementation of this utility class, but after a lot of testing, it seems pretty solid.

Anyhow, I was able to reproduce a variant of the problem in a much simpler application. This simpler application launches two threads, each of which increments its own counter variable and uses cout to display it. This application still produced segmentation faults deep in the std lib code.

Now I know that cout has no synchronization mechanisms, so the output from the two threads may trample each other, but I wouldn't expect crashes due to this.
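
For anyone who wants to try a similar test, a minimal sketch of such a program might look like the following. This is a reconstruction, not the actual test application: it assumes C++11 std::thread instead of the native Solaris threading API, and the names count_and_print and run_counters are made up for illustration. Under C++11, unsynchronized writes to cout from several threads may interleave but are not a data race, so with a correctly matched toolchain this should run to completion.

```cpp
#include <functional>
#include <iostream>
#include <thread>
#include <utility>

// Each thread increments its own counter and prints it with cout.
// There is deliberately no locking, so output from the two threads may interleave.
void count_and_print(const char* name, long iterations, long& counter) {
    for (long i = 0; i < iterations; ++i) {
        ++counter;
        std::cout << name << ' ' << counter << '\n';
    }
}

// Runs two counting threads and returns their final counter values.
std::pair<long, long> run_counters(long iterations) {
    long a = 0;
    long b = 0;
    std::thread t1(count_and_print, "A", iterations, std::ref(a));
    std::thread t2(count_and_print, "B", iterations, std::ref(b));
    t1.join();
    t2.join();
    return std::make_pair(a, b);
}
```

Calling run_counters with a large iteration count and watching how far the counters get before a crash (if any) mirrors the experiment described here.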

So I started investigating our tool chain. It turns out that our libraries were built for Solaris 7, and when I run on a single-processor Solaris 7 box everything is fine (the counters got to over 10,000,000). However, when I run on a single-processor Solaris 8 box, they only got to around 1,000,000 and then crashed. Finally, on our multi-processor Solaris 8 box they crash at around 1,000. I guess the lesson here (which should have been obvious...) is that you should make sure your tools match your environment.

I've just started looking through the libstdc++ docs, but if anyone has pointers on building the library for a multi-processor, thread-aware environment, please pass them on.
 
Maybe you think the standard C++ library has some problem
in a multi-threaded environment, and that is certainly possible.

Then what do you think about this test?
The idea is to lock a mutex around every doubtful place.
For example, in pseudocode:

mutex_lock
cout << "my message"
mutex_unlock

If there is no problem when this strict synchronization
is in place, then it is time to remove the mutex
locks one by one and check whether the problem arises again.
It's my foolish notion. :)
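
The pseudocode above could be sketched in C++ roughly as follows. This assumes C++11 std::mutex and std::thread (on the original toolchain the lock would be a POSIX or Solaris mutex), writes to a std::ostringstream instead of cout so the result can be inspected, and uses the hypothetical names locked_print and run_locked_writers.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Write one complete line while holding the mutex (mutex_lock / cout / mutex_unlock).
void locked_print(std::ostream& out, std::mutex& m, const std::string& msg) {
    std::lock_guard<std::mutex> lock(m);  // unlocks automatically on return
    out << msg << '\n';
}

// Two threads each write `lines` messages to one shared stream under the same mutex.
// Returns everything that was written.
std::string run_locked_writers(int lines) {
    std::ostringstream out;
    std::mutex m;
    auto writer = [&out, &m, lines](const std::string& tag) {
        for (int i = 0; i < lines; ++i) {
            locked_print(out, m, tag);
        }
    };
    std::thread t1(writer, std::string("first"));
    std::thread t2(writer, std::string("second"));
    t1.join();
    t2.join();
    return out.str();
}
```

With the lock in place, every line in the result is a complete "first" or "second"; removing the lock_guard is the "delete the mutex one by one" step.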

Good luck!

Hee S. Chung
heesc@netian.com
 
Thanks, that's certainly an idea...I would expect it to give the same results as the single-pro sol 5.8 box.

I really think that the problem comes from using an incorrect version of the library...
 