SERVER keeps hanging at least once a day

Status
Not open for further replies.

StressedTechie

Technical User
Jul 13, 2001
367
GB
I have a problem with our main file server which has me, Microsoft and my local IT firm confused.

At least once a day the main server hangs. The first I know of it is usually Outlook throwing up a message saying it can't contact the Exchange server. When I check the server, there is usually a blank blue screen displayed with the mouse pointer. I can move the mouse pointer around, but that's all I can do. The only way out of this situation is to throw the power switch and pick up the pieces when it comes back up.

Twice I have been to the server console after a hang and been presented with the ctrl alt del screen. I can then type in the username and password but when I click OK it goes no further. Again I have to cycle the power.

This started happening back in October, but then the hang only appeared to happen every three or four weeks. It has now suddenly jumped to every day. There is no pattern to the hangs. Occasionally it will happen at 9am, other times at 11 or even during the lunch hour. It hasn't happened in the afternoon yet.

So what have we done? Well, first things first: we ripped out the memory and replaced it with brand new modules. Next we phoned Microsoft with a Business Critical call. They connected via VNC and checked the server out. Nothing looked irregular and they were happy with the general setup of the machine and the software installed. They installed a logging app that could be invoked after a crash to create a log that they could scan through.

When they looked through the log they again found nothing apparently wrong. They reckoned it could be a virus protection software problem. I reckon this is a cop-out: we removed the virus software (Sophos Enterprise edition) but the server is still hanging.

I do occasionally connect to the server via RDP. This has never caused problems, but on Friday last week the server hung as soon as I came out of the session. I searched Microsoft's site and found an article that describes some of the symptoms, but a hang has only occurred once as a direct result of connecting via RDP. I am not convinced this is the problem.

So basically I am at a dead end. The IT firm wants to pull the server out and put in a brand new one after ghosting the live server. In theory this should work if the problem is hardware related, but if it is software then it is just going to transfer itself to the new server. One other thing to note: there are no relevant events being logged in the Event Viewer. Perf Mon also displays normal results, with no excessive hikes in memory usage etc.

The server spec is as follows.

Operating System: Windows Server 2003 5.2 - Small Business Server Domain Controller
Build: Build 3790
Update or Release: Service Pack 1
SMBIOS level: 2:3 D19
Manufacturer: HP
Model: ProLiant ML350 G4p
BIOS date: Unknown
Asset ID: GB861223CX/GB861223CX
Hyperthreading: is supported
Processor #1: Intel(R) Xeon(TM) CPU 3.00GHz
Processor details: 15:4:3
Processor #2: Intel(R) Xeon(TM) CPU 3.00GHz (HT)
Processor details: 15:4:3
Actual memory: 2000 Mb
Memory chips: 512Mb+512Mb+1024Mb+1024Mb
Memory slots: 4 of 6 used
Maximum memory: 12 Mb
Video adapter: RAGE XL PCI Family (Microsoft Corporation)
Monitor: Plug and Play Monitor
Screen resolution: 1024 x 768
Screen color depth: True Colour (32 bit)
Network adapter: HP NC7761 Gigabit Server Adapter - Network Load Balancing Filter Device

If you can shed any light on any possible causes, I think you can appreciate I will be extremely grateful.
 
How about running NETDIAG and DCDIAG and seeing if they report any problems?
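Both tools print a lot of text, so a small script can help pick out the failures. The sketch below assumes the output was saved to a text file first (for example by redirecting it with `>`), and that failures show up on lines containing "failed test" (DCDIAG) or ending in ": Failed" (NETDIAG). The sample text and the server name in it are made up for illustration, and the exact wording may vary between versions.

```python
import re

# Assumed failure markers, based on the usual wording of the two tools.
DCDIAG_FAIL = re.compile(r"\bfailed test\s+(\S+)", re.IGNORECASE)
NETDIAG_FAIL = re.compile(r"^(.*?)[\s.]*:\s*Failed$", re.IGNORECASE)


def find_failures(report_text):
    """Return the names of the tests reported as failed, in order."""
    failures = []
    for line in report_text.splitlines():
        line = line.strip()
        match = DCDIAG_FAIL.search(line) or NETDIAG_FAIL.match(line)
        if match:
            failures.append(match.group(1))
    return failures


# Illustrative sample only, not real output from the server in this thread.
SAMPLE = """\
   ......................... SERVER1 passed test Connectivity
   ......................... SERVER1 failed test frssysvol
   DNS test . . . . . . . . . . . . . : Failed
   Redir and Browser test . . . . . . : Passed
"""

print(find_failures(SAMPLE))  # ['frssysvol', 'DNS test']
```

Anything the script lists is only a pointer to where to start reading the full report.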

The typical way to troubleshoot this kind of issue is to run MSCONFIG, disable all non-Microsoft services and reboot.

Run it that way for a day and see if you still lock up. If not, run MSCONFIG again and add 2 services back in. Then wait to see what happens. Continue this process until you get a lockup again. Then you know it is a problem with 1 of the 2 services.
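The elimination process above can be sketched as a short loop. This is only an illustration of the logic: the service names are hypothetical, and `hangs_with` stands in for the manual step of running the server for a day with a given set of services enabled and seeing whether it locks up.

```python
def find_suspects(services, hangs_with, batch_size=2):
    """Re-enable services in small batches until the hang returns.

    services   -- names of the non-Microsoft services that were disabled
    hangs_with -- callable taking the set of enabled services and returning
                  True if the server hung during that trial period
    Returns the batch added just before the hang came back, an empty list
    if the server stayed stable throughout, or None if it still hangs with
    every listed service disabled (so these services are not the cause).
    """
    if hangs_with(set()):
        return None
    enabled = set()
    for start in range(0, len(services), batch_size):
        batch = services[start:start + batch_size]
        enabled.update(batch)
        if hangs_with(enabled):
            return batch
    return []


# Hypothetical service names; the "culprit" simulates the observed hangs.
services = ["AntiVirus", "NicTeaming", "UpsMonitor", "BackupAgent"]
culprit = "UpsMonitor"
suspects = find_suspects(services, lambda enabled: culprit in enabled)
print(suspects)  # ['UpsMonitor', 'BackupAgent']
```

Adding two at a time halves the number of trial days compared with one at a time, at the cost of one extra day at the end to tell the two suspects apart.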

I hope you find this post helpful.

Regards,

Mark

Check out my scripting solutions at
 
I'd go with Mark's suggestion. The only other thing I could suggest is disabling hyperthreading in the BIOS (if it's enabled). I've seen hyperthreading cause 'weird' problems before. I doubt it's that, but it may be worth a go.


Paul

MCSE 2003

"Two things are infinite: the universe and human stupidity; and I'm not sure about the universe."
Albert Einstein
 
Hello StressedTechie. Your post jumped out at me when I saw the words "Outlook" and "cannot connect to the Exchange Server". I recently fixed a problem where we would frequently get error messages from Outlook saying that it could not connect to the Exchange server, and at the same time we could not access \\ServerName\SYSVOL. We would get Userenv 1058 and 1030 events all over the place. I wanted to pull out what hair I have left. I did every update. Still no joy. Then I read an article saying that SBS in some cases has problems with NIC teaming software. I uninstalled the manufacturer's NIC software, installed just the base drivers and disabled the options for the NIC in the BIOS. This was for a Broadcom NetXtreme internal NIC. I see you are using "Network Load Balancing Filter Device". This may be your problem. I hope this helps.
 
Thanks guys, all your advice has been taken on board and I will gradually work my way through it.

Without cursing things, I found an article about APC UPS software occasionally causing system hangs and all sorts of oddities. I have uninstalled the software for the time being and (touch wood) the server didn't crash at all yesterday. There was at least one crash per day last week, and Monday this week resulted in two!! :(

We are also using Symantec Backup Exec 10d and the Continuous Protection Server (CPS) module. We had this configured to take a snapshot every 3 hours, Monday to Friday. We are thinking this may be a possible cause too, so this service has been put on hold for the time being as well. Either the APC removal or the CPS suspension appears to be working for the moment!!
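One way to test the CPS theory against a list of hang times is to check whether each hang falls shortly after a scheduled snapshot. A minimal sketch, assuming the snapshots start on the hour from 06:00 and that "shortly after" means within 20 minutes (neither value is known here, both are placeholders), and using made-up hang times:

```python
from datetime import datetime, timedelta


def snapshot_times(day, first_hour=6, interval_hours=3):
    """Snapshot start times for the given day (first_hour is an assumed value)."""
    times = []
    t = day.replace(hour=first_hour, minute=0, second=0, microsecond=0)
    while t.date() == day.date():
        times.append(t)
        t += timedelta(hours=interval_hours)
    return times


def near_snapshot(hang, window_minutes=20, **schedule):
    """True if the hang is on a weekday and within window_minutes after a snapshot start."""
    if hang.weekday() >= 5:  # Saturday or Sunday: no snapshots scheduled
        return False
    window = timedelta(minutes=window_minutes)
    return any(
        timedelta(0) <= hang - start <= window
        for start in snapshot_times(hang, **schedule)
    )


# Made-up hang times for illustration (Monday, Tuesday, Saturday).
hangs = [
    datetime(2006, 3, 6, 9, 5),
    datetime(2006, 3, 7, 11, 0),
    datetime(2006, 3, 11, 9, 5),
]
print([near_snapshot(h) for h in hangs])  # [True, False, False]
```

If most real hang times came back True with the actual schedule plugged in, that would point towards CPS rather than the APC software.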

I will keep you guys informed and document all findings here for future reference. Thanks for your advice so far. If the crashes resurrect themselves I will try your suggestions out. I am hoping I am on to something now!!
Thanks again
Not so Stressed Techie, at the moment anyway!!! ;)

I will post an update later today!

 
Sorry I should have kept you guys up to date.

Well, I am still running the server without the CPS services started and with the APC software removed, and the server has not crashed at all. I have narrowed my sights to Symantec CPS but want to run the server in its current state for at least one more week, just to eliminate any one-offs. But it does seem to have settled down. My users are happy, so I am too. I have just installed an Ultrium drive as well, which has reduced the backup time by nearly 8 hours!! So all in all things are looking good.
 