Scheduled work to do

pagy · Jul 4, 2006

For the last 3 days, at about 10:00 in the morning, one server has been giving this error:

A scheduled work to do took more than 1 minute to complete

The alarm type on the Zen console says:
Work to do took too much CPU.
Alarm summary -
A kernel work to do on server Trinity took more than 1 minute to run

All PCs attached to this server then stop responding, and the server has to be restarted before they can connect again. I can still use the server console to load and unload modules without a problem, though.

Now, I've had this before and it's always been a hardware problem (a disk on its way out, a dying RAID array, etc.), but the H/W seems fine. It's a Dell PowerEdge 2600, and I've checked the array with the OpenManage array console; that's not showing anything wrong with the array or any of the disks. The alert light on the front of the server hasn't gone red either, as it usually does when some H/W is goosed.
There's nothing special that runs at 10:00 in the morning, so I'm a little confused. Any suggestions gratefully received.

Thanks

All you need in this life is ignorance and confidence; then success is sure.
- Mark Twain

pagy · Jul 4, 2006

Actually, it looks like the DHCP service may be the problem: the abend log shows a couple of abends from about 2 weeks ago where dhcpsrvr.nlm caused a CPU hog abend.

I've unloaded it for now to see whether the server still has problems with dhcpsrvr unloaded. Most of the PCs have static addresses anyway, so that's no problem.

pagy · Jul 26, 2006

OK, it wasn't the DHCP service. We've noticed that the Service Processes count maxes out: it's set at the default of 500, and yesterday morning the number of service processes reached 500. At that point the server becomes unresponsive to any further requests. I upped the limit to 1000 and all seemed OK until this afternoon, when the server used all 1000 of those as well.
If I then look under Kernel - Busiest Threads, MakeWorker Thread is at the top, but I don't know whether this is cause or effect.

Also, looking under Kernel - Processors, Time spent in interrupts is always above 22 milliseconds.

Any ideas anyone?

pagy · Jul 26, 2006

And under Kernel - Interrupts, OS Allocated Bus Interrupt is the interrupt showing the 22 millisecond execution time.

marvhuffaker · Jul 26, 2006

What OS and service pack are you on? Seems like I saw a bug report on this once.

Marvin Huffaker, MCNE

http://www.redjuju.com

pagy · Jul 27, 2006

Sorry, it's NetWare 6.5 SP1. Plans to roll out SP5 are in the making at the moment.

The server has been running for the best part of 18 months. This problem only started happening in the last 4 weeks and seems to be steadily getting worse: initially it would fall over once a week, and now it seems to be at least once a day.

I saw TID 10021510, which suggested checking the H/W, although that TID was for NetWare 4.

pagy · Jul 30, 2006

Looks like it was hardware related. We swapped the box out and everything appears OK now.
