Server hangs every now and then at 19% CPU Utilization

Status
Not open for further replies.

pilonbrad

Technical User
Nov 19, 2005
24
CA
Hi,

At the high school where I work we have NetWare 6 SP5 and ZENworks 4.01 ir6 running on the student server. We also run McAfee NetShield 4.62 with the updated 4.4 engine. The server is a Dell PowerEdge 2800 with 1 GB of DDR2 dual-channel PC 3200 RAM.

Every Friday at 8:00pm I have McAfee set up to do a full scan of all the volumes. The rest of the time I have on-access scan set up to scan all volumes whenever files are written to or read from the server.

Two Thursdays ago I noticed a problem. During period 4 the students were logged in and saving files was extremely slow. I went to the server and I was able to use ALT+ESC to browse the server screens. I noticed that the CPU utilization on the McAfee screen was constant at 74%, and on the monitor screen the CPU utilization was stuck at 19%. I was able to unload McAfee from the McAfee screen by pressing F10. I then checked the monitor screen and the CPU utilization was still hung at 19%. I went to the server console and there were no abends <1>. I tried running dsrepair and the command appeared to hang. I then tried ALT+ESC+CTRL to down the server and I couldn't. I had to down the server manually by pressing the power button. When I re-booted the server everything worked fine again.

Two days went by and it did the same thing over the weekend. The server was re-booted on Monday and worked fine, and then on Wednesday morning the same problem came back. On Friday at 5:00pm, after school, it happened again. It seems that about every 2 days the server hangs at the server console (running dsrepair, nwconfig or any other command hangs) and the CPU utilization stays stuck at around 19%. There's no specific time when this happens.

I have file caching turned off on the server as well as on the client workstations.

I then changed McAfee so that it doesn't on-access scan the user volumes because, especially with the yearbook going on, there are lots of large files being written to the server.

The only change I made recently was setting up all the NDPS printer agents to retain jobs for 3 days. The NDPS volume is not on volume SYS but on our volume APPS, and there's lots of storage space. I verified that there's only 42.5 MB of retained jobs in the NDPS database.

I can't think of why this problem is happening. The server has worked well for almost a year, and suddenly in the last 2 weeks it hangs as described above, and I can't enter anything at the server console because any command I type hangs as well.

When this hang occurs students can still log in and I can still ping the server, but it's really slow until I re-boot the server. I checked the switches for improper activity and everything seemed fine at the switch end; the lights blinked as they should. Someone suggested re-installing Support Pack 5 for NetWare 6.

Has anyone ever encountered a similar problem, where the CPU utilization stays hung at a particular percentage such as 19% and no commands can be entered at the server console (such as dsrepair or nwconfig) without hanging? What can I do to fix it?

Thanks!

Brad
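
The symptom described above (a utilization reading that sits at one value, such as 19%) could in principle be spotted automatically if the readings were logged somewhere. The following is only a minimal sketch under that assumption: the function name `find_stuck_reading`, the `min_run` threshold, and the sample readings are all hypothetical and were not taken from the server.

```python
from typing import List, Optional


def find_stuck_reading(samples: List[int], min_run: int = 5) -> Optional[int]:
    """Return the index where a run of at least min_run identical
    readings starts, or None if no such run exists."""
    run_start = 0
    for i in range(1, len(samples) + 1):
        # A run ends at the end of the list or when the value changes.
        if i == len(samples) or samples[i] != samples[run_start]:
            if i - run_start >= min_run:
                return run_start
            run_start = i
    return None


# Hypothetical utilization readings (percent), invented for illustration.
readings = [12, 31, 8, 19, 19, 19, 19, 19, 19]
print(find_stuck_reading(readings))  # index where the flat run begins
```

A healthy server's utilization normally moves from sample to sample, so a long flat run is a hint that the reading (or the server) has stopped updating. The right `min_run` depends on how often samples are taken.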
 
Hi,

Well, there could be a number of things going on here. Do you have Traditional volumes or NSS? Whichever it is, I would be looking at the IO from your hard drives in case one is failing slowly. It may also be that rebooting without a proper shutdown has caused some corruption. There is also a post-SP5 fix for NSS volumes that should be applied if you are using NSS. If the error appears at the same time/state regularly then check the server processes, if you can, in Monitor to see what is happening. McAfee can put a strain on the hard drives that causes NetWare to freeze. With the amount of RAM you have the server should be able to cope, so I'm thinking hard drives here. I may be wrong, but start with a server health check; there are settings you can tweak in Monitor that may improve your disk IO if there are no physical problems with the drives. Let us know and I will try to help further.
 
Hi,

Thanks for the information!

I downloaded the NSS Modules Post NetWare 6 SP5 and am going to apply it to the server after school through NWConfig. We are using NSS for our volume SYS as well as a few other volumes; some volumes are still traditional. Because we have NSS volumes, I'm going to apply the patch you recommended.

You said "Whichever it is, I would be looking at the IO from your hard drives in case one is failing slowly."

How do I check the IO from my hard drive to see if it is slowly failing? Which settings should I tweak in Monitor to help improve the IO?

Unfortunately, the problem has been occurring at different times. You also mentioned "If the error appears at the same time/state regularly then check the server processes if you can in Monitor to see what is happening."

Do you mean when I'm at the monitor screen, check the general information to see the server processes or is it at another location?

Thanks for all your help!!!
 
you want to be checking in monitor to see what is utilising the resources
you need to drill down further than the monitor general information screen

or alternatively you can use the browser - it will possibly distort the figures a wee bit but you will get the gist


a dsrepair will not help you here - this should be run when all ok to verify issues - i'm certainly not a fan of the unattended repair here
 
Hi, sorry for the delay in getting back to you.

With your NSS volumes there are similar utils for checking and verifying them as with traditional volumes. The good thing about NSS is you can schedule disk checks at convenient times. At a quiet time do a check on the volumes, you will find lots of info about NSS and the utils on the Novell website. Use this guide - or start by typing NSS on the console.

On server optimisation see the recent post on this forum titled "unable to save files" as there was a full and helpful discussion going on there.

The best way to monitor your hard drives' IO is to use something like AdRem Server Manager to do baseline monitoring of your Novell server. You can download it from AdRem on a 30 day trial. It is extremely useful for diagnosing problems and gives a much better all-in-one admin tool than anything Novell supply. goto and follow the links for AdRem Server Manager 5; if you don't want to fill in the reg info use "download" as the password.
Hope this helps you.
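
To give a rough idea of what baseline monitoring means in practice: you record disk figures while the server is healthy, then compare later readings against that baseline. The sketch below is not how AdRem Server Manager works internally; it is just an illustration, and the function name `flag_outliers`, the threshold factor `k`, and all of the latency numbers are made up for the example.

```python
from statistics import mean, stdev
from typing import List


def flag_outliers(baseline: List[float], new_samples: List[float],
                  k: float = 3.0) -> List[float]:
    """Return the new samples that lie more than k standard deviations
    above the baseline mean. The baseline needs at least two values."""
    threshold = mean(baseline) + k * stdev(baseline)
    return [s for s in new_samples if s > threshold]


# Hypothetical disk latency samples in milliseconds, for illustration only.
baseline_ms = [8.0, 9.0, 10.0, 11.0, 12.0]   # recorded while healthy
later_ms = [9.5, 10.5, 45.0]                 # recorded later
print(flag_outliers(baseline_ms, later_ms))  # [45.0]
```

Readings that keep landing well above the baseline would be a reason to look harder at the drives or the controller.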
 
Datong,

Thanks so much for all your help! I appreciate it!

Brad
 
Let us know if any of this solves your problems, as it's sometimes difficult to give an exact answer to a problem. There are some really helpful people on here, so don't be afraid to ask, no matter how bad it sounds.
 
Hey Datong,

The server seems to be running a lot better. I ran a rebuild on the SYS volume, which is NSS, and so far, knock on wood, it's been over 2 weeks since the server lagged.

It's good to hear that there are helpful people in this forum.

Thanks again!
 