
Force Completion of function without thread swapping?


Insider1984

Technical User
Feb 15, 2002
We have a manufacturing machine with multiple threads that perform very repetitive work. We would like to keep a thread doing its work until a function is done. In Java I believe this is called a "critical section" (I could be wrong), where you can force the function to finish before the processor is allowed to work on something else.
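For reference, this is the kind of construct I mean, as far as I understand it in C# (just a rough sketch; the class and method names are made up):

[code]
using System;
using System.Threading;

class Worker
{
    private readonly object _gate = new object();

    public void DoUnit()
    {
        // A critical section: only one thread at a time can run this block,
        // but the OS is still free to preempt the thread while it holds the lock.
        lock (_gate)
        {
            // ... the small unit of repetitive work ...
        }
    }
}
[/code]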

Thanks for your help.

=====================
Insider
4 year 'on the fly' programmer
C, C++, C#, MFC, Basic, Java ASP.NET
 
I think there might be some misunderstanding.

Here we are not concerned with thread safety; what we are worried about is a function that (because Windows is oh so smart) gets interrupted before the work is done and only comes back to finish later.

Our issue is that we have a separation of concerns, with many little tools doing repetitive work on separate threads. They are not trying to access the same function; they are just trying to run through to completion and give up the CPU at that point.



=====================
Insider
4 year 'on the fly' programmer
C, C++, C#, MFC, Basic, Java ASP.NET
 
Isn't that the way threads were supposed to work (in a round-robin or similar scheduling kind of way)? Since in the early days you only had one processor, you had to simulate multithreading. With multiprocessor systems you will have less of this problem, but you can't be sure. I don't think it would be smart to give one thread the upper hand (setting a higher priority) because it could freeze the computer, and that is just what threading is there to avoid.

Christiaan Baes
Belgium

"My new site" - Me
 
Setting a higher priority will take you nowhere. Unless you get a customized kernel for your OS, I see no way for you to force a thread not to be re-scheduled in favor of another thread.

About the "(because windows is oh so smart)" part, I suggest you try it on another multi-tasking operating system and then come back to us and show how you did it. We are eager to hear about that [bigears].

Our issue is that we have a separation of concerns, with many little tools doing repetitive work on separate threads. They are not trying to access the same function; they are just trying to run through to completion and give up the CPU at that point.
If what you want were possible, and you had, let's say, 3 threads, it would work like this:
* Thread1 runs to completion, and since there is no multi-tasking (that's what you want, basically), Thread2 and Thread3 wait for processor time because Thread1 cannot be re-scheduled;
* Thread1 has finished running;
* Thread2 runs to completion while Thread3 waits for the above specified reasons;
* Thread2 has finished running;
* Thread3 runs freely


It looks like single-threaded programming to me... So please answer these:
1. Have you tried running all those little tools in sequence?
2. You realize, of course, that by *NOT* letting the kernel scheduler re-schedule your thread and make it wait its turn, the system will appear to freeze until your thread completes? (No async calls, and by that I mean no I/O, no GUI response, nothing...)
3. Do you realize that even on a single-threaded operating system, your thread gets somehow "re-scheduled" when a low-level hardware interrupt is issued by, let's say, a device/controller? The OS (e.g. MS-DOS) would save the state of the currently executing "thread" and execute the specified interrupt handler...
4. Have you considered re-designing your code? Something is pretty fishy if you require complete, total, unrestricted and exclusive access to the processor...

Maybe more information would help us help you. Why don't you elaborate a little bit?
 
Thanks for the responses. There is something that I left out, and just as bad, the responders are assuming that we are running on a single CPU with a single core.

We currently have 16GB of memory, are running Windows x64, and have 4x dual-core Opterons running the system.

The application isn't end-user-friendly event-based controls or some simple form or flashy DB show, but hard-core data-processing algorithms utilizing the ACML (low-level functions produced by AMD for math-intensive, processor-direct computation).

The problem we have is that we are currently under-utilizing the CPU and saturating the memory, since the data is very large and, in comparison, only a small amount of work is being done on it.

A comparison would be the exact opposite of the fun distributed-computing applications like SETI@home, etc.

After profiling the software, it seems the processor is frequently leaving functions and returning to them later, giving us little hope of keeping commonly used data and instructions in cache.

I'll respond to the 1-4 list above as well:

1. Yes (we started out that way, actually); unfortunately that isn't feasible because we need multiple cores to do in 1 hour the work most applications see in 1 week.

2. That would be making a very large assumption that:
- we are sitting on a single-core system
- the little units of code take more than 1 ms to process

3. Yes, I do, and I would happily let Windows (and our simple GUI) have an entire core (1/8 of the CPUs) to keep doing all that stuff, if possible.

4. While of course I'll be a bit defensive in this statement, I have no doubt that the current design is needed to meet the requirements of a flexible component design while attempting to optimize processor-intensive code by working directly with AMD. The pure incoming data is a constant stream of 64512MB per minute, which we have already split across 6 systems (10752MB per system). This is not a typical application, and the team working on this is extremely knowledgeable. Working with Microsoft, it seems they have only a half dozen people who truly have a good grasp of multi-core programming, so I think we are doing pretty well.

Perhaps it is our knowledge of processor optimization that brings us to where we are today... seemingly memory limited. Unfortunately, while we are currently working hard to reduce the amount of memory allocation throughout the program, it seems the large amount of "chatter" between the threads (through a universal interface) is creating issues.


I should have also mentioned that Windows' thread management is smart when it comes to multiple applications, trying to give everyone equal access to the CPU. Basically a one-size-fits-all tactic.

Unfortunately our application is specialized enough to require a little more "smart management".


As for ideas... I know there are many things you can do with a thread, but maybe some questions to answer:

1. Would it benefit us to take the threads that chatter a lot with each other and force them onto the same core?

2. Even if it seems like a bad idea to everyone... is there a way to allow a function (remember, a very tiny one) to complete before moving on?

Well, thanks for the help and responses. I do not want to sound ungrateful by any means, and I realize I didn't give enough information to explain why I want to do this (some things are better left unsaid). I also do not want to sound arrogant about the "quality" of the code, because obviously if I knew everything I wouldn't be here.

=====================
Insider
4 year 'on the fly' programmer
C, C++, C#, MFC, Basic, Java ASP.NET
 
1. Threads that chatter with one another? Sounds like a granularity problem to me. Maybe there is too much lock overhead and too much contention: the first means there are too many locks, and the second means a thread waits too long for a lock to be released by another thread. In the first case, you should definitely see high kernel time for your application. Concerning the part where you ask how to make a thread run on a specified core, I must remind you that this operation is called "setting a thread's affinity". Note the term affinity. It means that when you set a thread's affinity, the operating system's thread-scheduling algorithm is altered (read "curbed") so as to favor running the thread on the specified core. Microsoft states that the operating system is not required to honor the affinity/priority of a thread. The result? Under certain circumstances, your thread might not run on the desired core.
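A rough sketch of what setting a thread's affinity could look like from managed code (untested; this is the P/Invoke route, called from inside the worker thread itself, since managed threads don't necessarily map one-to-one to OS threads):

[code]
using System;
using System.Runtime.InteropServices;
using System.Threading;

static class ThreadAffinity
{
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll")]
    static extern UIntPtr SetThreadAffinityMask(IntPtr hThread, UIntPtr dwThreadAffinityMask);

    // Call this from inside the worker thread you want to pin.
    public static void PinToCore(int core)
    {
        // Tell the CLR/host this code depends on staying on the current OS thread.
        Thread.BeginThreadAffinity();
        // Restrict the OS thread to a single core via its affinity mask.
        SetThreadAffinityMask(GetCurrentThread(), (UIntPtr)(1UL << core));
    }
}
[/code]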
Or maybe a higher level of parallelism would help. If the data processing is not sequential and you can split the data into more chunks, it would probably be better to use (for example) 4 threads to process it instead of 2 (again, watch the contention and lock overhead).
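For illustration, splitting the data could look something like this (a toy sketch; the chunk-processing delegate stands in for your real math):

[code]
using System;
using System.Threading;

class ChunkedRun
{
    public delegate void ChunkProcessor(double[] data, int start, int count);

    // Split one large buffer into contiguous chunks and hand each to its own thread.
    public static void Run(double[] data, int workers, ChunkProcessor process)
    {
        int chunk = data.Length / workers;
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++)
        {
            int start = i * chunk;
            int count = (i == workers - 1) ? data.Length - start : chunk;
            threads[i] = new Thread(delegate() { process(data, start, count); });
            threads[i].Start();
        }
        foreach (Thread t in threads) t.Join();
    }
}
[/code]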

2. The tiniest "function" that I know of that can execute without being re-scheduled is an interrupt handler. There could be a bunch of them in the kernel and drivers too, but I am not sure. So the answer to your question is (as far as I know) no.

A simple question: in your application, do you have threads with different priorities which also (at some point in their execution) require exclusive access to a common resource?

P.S. Sounds like a challenging problem; too bad I'm not there [pipe]. Since you are nowhere near getting a highly customized version of the Windows kernel, which would probably fit your needs, you will need to fine-tune the code until you reach a reasonable compromise.
P.S.2: Anything you can share could spark new ideas :)
 
How many threads do you have doing work? It sounds like you have too many. 4x dual-core == at most 8 threads running simultaneously.

The normal advice for increasing performance would be to have 2x the number of threads as processors (cores) to allow for I/O blocking on some of the threads, and to use async I/O methods. But in your case I would suggest dropping the number of worker threads to 7 (leaving one core for running the OS and other tasks). By doing this you'll reduce the chance of the OS swapping your threads out. Sounds counter-intuitive, I know, but give it a try.
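Something along these lines (a rough sketch; Environment.ProcessorCount counts logical processors, so adjust to taste):

[code]
using System;
using System.Threading;

class WorkerPool
{
    static void Crunch() { /* tight compute loop on one slice of the data */ }

    static void Main()
    {
        // Leave one core free for the OS, GUI and I/O; use the rest for compute.
        int workers = Math.Max(1, Environment.ProcessorCount - 1);

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++)
        {
            threads[i] = new Thread(Crunch);
            threads[i].Start();
        }
        foreach (Thread t in threads) t.Join();
    }
}
[/code]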

Of course, as soon as you issue a blocking call, your timeslice ends and the OS will queue up someone else's thread. If you use up your entire time quantum, the OS will attempt to swap in another runnable thread... but if there aren't any other eligible runnable threads, yours stays in memory and will (presumably -- I haven't tested this myself) retain its cache locality.

Regarding having a way to force the CPU to let a function complete -- this is really the antithesis of a multi-tasking OS. It is specifically designed to not let low-priority threads starve. You can do tricks like boosting the priority of your worker threads to minimize the number of context switches, but they'll still occur.
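The priority trick is just this sort of thing (sketch only):

[code]
using System.Threading;

class PrioritizedWorker
{
    static void Crunch() { /* compute loop */ }

    static void Main()
    {
        Thread worker = new Thread(Crunch);
        // Fewer preemptions by normal-priority threads, but context switches
        // still happen, and Highest can starve the GUI and housekeeping work.
        worker.Priority = ThreadPriority.AboveNormal;
        worker.Start();
        worker.Join();
    }
}
[/code]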

You know, thinking about this, what you really want is a version of DOS that can address 16GB. No multi-tasking -- just a single processor that does nothing but run your user-mode code. Just pure computation. :)

Chip H.


____________________________________________________________________
If you want to get the best response to a question, please read FAQ222-2244 first
 
The other thing that just occurred to me is that you're running on a quad Opteron system. The AMD HyperTransport architecture is a NUMA system (Non-Uniform Memory Access -- each CPU has its own local memory). So if your thread gets re-scheduled onto a different CPU, to get at its data again it has to make an expensive remote memory access across the transport bus to the memory attached to its original CPU.

So B00gyeMan's idea of setting the thread affinity is a good one.

Chip H.


____________________________________________________________________
If you want to get the best response to a question, please read FAQ222-2244 first
 
Hi chiph. You just gave me an incredible idea for all this!!!!!

It would be possible for us to query the number of processors and spawn the lowest-level classes on the same core, which would actually help us in many ways.
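Roughly what I have in mind (an untested sketch; IdealProcessor is only a hint to the scheduler, and the kernel32 call is just there to match the managed thread to its OS thread):

[code]
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

static class CorePlan
{
    [DllImport("kernel32.dll")]
    static extern uint GetCurrentThreadId();

    // Called from inside a worker thread: suggest which core it should prefer.
    static void PreferCore(int core)
    {
        Thread.BeginThreadAffinity();
        uint osId = GetCurrentThreadId();
        foreach (ProcessThread pt in Process.GetCurrentProcess().Threads)
        {
            if (pt.Id == osId) { pt.IdealProcessor = core; break; }
        }
    }

    static void Main()
    {
        int cores = Environment.ProcessorCount;
        // Keep each group of chattering low-level components together on one core.
        for (int c = 0; c < cores; c++)
        {
            int core = c;
            new Thread(delegate() { PreferCore(core); /* run that group's work loop */ }).Start();
        }
    }
}
[/code]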

Once I have the results, I'll post back here with how it works.


=====================
Insider
4 year 'on the fly' programmer
C, C++, C#, MFC, Basic, Java ASP.NET
 
If you REALLY need real-time behaviour you MUST use a real-time operating system. There are companies that have created modified NT/2K kernels that offer real-time facilities.

Unmodified Microsoft Windows cannot guarantee timely execution of any instruction or procedure. Don't be fooled by the 'realtime' process priority setting. Even a low-priority process can hog the CPU, and even on multiprocessor machines disk I/O can slow down practically everything.
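To be clear, the setting I mean is the process priority class, e.g. (and again, this buys you no hard guarantees):

[code]
using System.Diagnostics;

class NotReallyRealtime
{
    static void Main()
    {
        // The "realtime" priority class. Windows will still preempt the process
        // for interrupts, DPCs and paging, so this is not a hard real-time
        // guarantee; without admin rights it silently falls back to High.
        Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;
    }
}
[/code]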

This is especially important if you are controlling fast physical machinery that needs attention at specific times.

I googled for:
real time windows
and came up with this interesting hit:

You might be able to refactor your code so that real-time execution is not required. Or you could write a device driver that responds to interrupts; however, writing device drivers is difficult and debugging them is worse.

Lastly, C#/.NET might not be the best choice for real-time programming (what happens when the garbage collector starts scavenging?).

Good luck!
 
kwhitefoot, your post is correct; for speed alone, Windows and .NET are certainly not the best choices. Unfortunately we find ourselves in a sort of twisted dance between quick to market, as fast as possible, and as object-oriented as possible. We felt .NET 2.0 with generics, queues and more offered the best option for everything but speed, and AMD has been helping us with optimization on the speed side of things.

=====================
Insider
4 year 'on the fly' programmer
C, C++, C#, MFC, Basic, Java ASP.NET
 
I'm not sure whether this will be of any help, but in .NET 2.0 you should be able to change thread affinity, thereby assigning each thread to a CPU. Well, at least I use that in Task Manager sometimes for different applications.
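In code that would be roughly (a sketch; note this is process-level affinity, the same thing Task Manager's "Set Affinity" changes, not per-thread):

[code]
using System;
using System.Diagnostics;

class ProcessAffinityDemo
{
    static void Main()
    {
        // Same effect as Task Manager's "Set Affinity": restrict the whole
        // process to the first four cores (bits 0-3 of the mask).
        Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)0x0F;
    }
}
[/code]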

________
George, M
Searches(faq333-4906),Carts(faq333-4911)
 