Scheduled work to do

Status
Not open for further replies.

pagy

Technical User
Sep 23, 2002
1,162
GB
For the last three days, at about 10:00 in the morning, one server has started giving this error:

A scheduled work to do took more than 1 minute to complete

The alarm type on the Zen console says:
Work to do took too much CPU.
Alarm summary -
A kernel work to do on server Trinity took more than 1 minute to run

All PCs attached to this server then stop responding, and the server has to be restarted before they can connect again. I can still use the server console to load and unload modules without a problem, though.

I've had this before and it's always been a hardware problem (a disk on its way out, a dying RAID array, etc.), but the hardware seems fine this time. It's a Dell PowerEdge 2600, and I've checked the array with the OpenManage array console, which isn't showing anything wrong with the array or any of the disks. The alert light on the front of the server hasn't gone red either, as it usually does when some hardware is goosed.
There's nothing special scheduled to run at 10:00 in the morning, so I'm a little confused. Any suggestions gratefully received.

Thanks




All you need in this life is ignorance and confidence; then success is sure.
- Mark Twain
 
Actually, it looks like the DHCP service may be the problem: the abend log shows a couple of abends about two weeks ago where dhcpsrvr.nlm caused a CPU hog abend.

I've unloaded it for now to see whether the server still has problems with dhcpsrvr unloaded. Most of the PCs have static addresses anyway, so that's no problem.
 
OK, it wasn't the DHCP service. We've noticed that the service processes get maxed out: the limit is set at the default of 500, and yesterday morning the number of service processes reached 500. At that point the server stops responding to any further requests. I raised the limit to 1000 and all seemed OK until this afternoon, when the server used all 1000 of those as well.
If I then look under Kernel - Busiest Threads, MakeWorker Thread is at the top, but I don't know whether this is cause or effect.

Also, under Kernel - Processors, Time spent in interrupts is always above 22 milliseconds.

Any ideas anyone?

 
And under Kernel - Interrupts, OS Allocated Bus Interrupt is the interrupt showing the 22 millisecond execution time.
 
Sorry, it's NetWare 6.5 SP1. Plans to roll out SP5 are in the works at the moment.

The server has been running for the best part of 18 months, and this problem only started in the last 4 weeks. It seems to be steadily getting worse: initially it would fall over once a week, and now it's at least once a day.

I saw TID 10021510, which suggested checking the hardware, although that TID was for NetWare 4.
 

Looks like it was hardware-related. We swapped the box out and everything appears OK now.
 