Windows2003 Server Just Stops Responding

Status
Not open for further replies.

markpirvine

Programmer
Nov 7, 2003
26
0
0
GB
Hi,

We have a network consisting of three Windows 2003 servers (single domain): two servers are at our main site, while the other is located remotely and connected via a broadband line. The servers run the following:

Server 1 (local): Active Directory, Global Catalog, Exchange, Trend AV, Veritas
Server 2 (local): SQL Server, Trend AV, Veritas, Blackberry Server
Server 3 (remote): Active Directory, Global Catalog, Exchange, SQL Server, Trend AV, Veritas

The network was originally based on Windows 2000. Over time all Windows 2000 servers have been removed, but the domain still operates in Mixed Mode. All servers have the latest Microsoft updates and service packs installed. This problem has been ongoing for many months and first started when we introduced the first Windows 2003-based server.

The problem is that every week one or more of the servers simply stops responding. The server does not blue screen or log anything in the Event Log. The general pattern is that Remote Desktop stops responding, followed by IIS, then mail destined for local delivery queues on each Exchange server (the message in the Exchange System Manager is that the remote computer dropped the connection). Eventually the server stops responding to any request. When we attempt to log in, the admin password is accepted (incorrect attempts are detected), however the desktop is never built. The only way to restart the server is to turn off the power supply.

Another problem is that the POP3 service, which is configured to start automatically, fails to start during boot. The Services snap-in says that it is running, however the Exchange System Manager shows that it is stopped. Right-clicking and starting it works fine, however no logs are generated.
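
To see which view of the POP3 service is right, and to timestamp the order in which Remote Desktop, IIS and mail stop answering, the relevant ports can be polled from a machine that is not affected. The following is only a sketch: it assumes the services listen on their default TCP ports (3389, 80, 25, 110), and the helper names are made up for illustration.

```python
import socket
from datetime import datetime, timezone

# Default TCP ports for the services mentioned above (assumed unchanged).
SERVICES = {"RDP": 3389, "HTTP": 80, "SMTP": 25, "POP3": 110}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host, services=SERVICES, check=port_open):
    """Return {service name: True/False} for one host."""
    return {name: check(host, port) for name, port in services.items()}


def format_line(host, results, now=None):
    """Build one log line: UTC timestamp, host, then up/DOWN per service."""
    now = now or datetime.now(timezone.utc)
    status = " ".join(
        f"{name}={'up' if ok else 'DOWN'}" for name, ok in results.items()
    )
    return f"{now.isoformat(timespec='seconds')} {host} {status}"
```

Calling `print(format_line("server1", probe("server1")))` every minute or so from a workstation (`server1` is a hypothetical host name) leaves a record that survives a hard power-off of the server. A successful connect only shows that something is listening on the port, not that the service behind it is healthy.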

We have completely run out of ideas. Has anyone else had a similar problem?

Any help would be appreciated,

Mark
 
Firstly, ensure that the servers are up to date with SP1 and all the other patches.

-------------------------------

If it doesn't leak oil it must be empty!!
 
Hi,

Many thanks for your reply.

All service packs and security patches have been installed, including SP2 for Exchange.

Mark
 
This sounds similar to a problem I have - have you found a solution yet? The only one I have come across is to do with APC and JRE certs expiring....
 
We're having a similar issue on a W2K3 SP1 TS (hardware HP Proliant ML350 G4p). All patches, service releases, etc. for all apps, etc. are up to date.

We thought it was hardware related (bad onboard NIC) as it started happening after we had the system board replaced shortly before the TS went into production in July and an external NIC appears to resolve the issue. Now that I see this thread I'm not so sure any more.

We'd been getting an occasional BSOD and/or reboot which the LEDs indicated was a system board failure. HP replaced the system board and all was well until about 10 days later. Now every once in a while (anywhere between 36 hrs and 20 days with no rhyme or reason) the server drops all the TS connections. All the LEDs (including the one indicating network activity) indicate all is well.

I can log on as a local admin at the console and shut the server down gracefully most of the time (once in a while it hangs on shut down and needs to be powered off). Only a hard boot corrects the error, a reboot doesn't solve the problem.

Also, and I don't know if this has anything to do with the issue at hand, we have an older (December 03) Compaq (HP) Proliant ML350 G3 running W2K3 Standard WITHOUT SP1 that hasn't given us any trouble since we brought it on line. It is NOT a TS and acts as our DC, WINS and DNS server, and a file/print server. We have to update this server to SP1 in a couple of days to satisfy some techs I'm dealing with re a software issue, so it will be interesting to see if the dropped connection happens on it after upgrading to SP1.

We've got an open incident with HP on the ML350 G4 and have just run diagnostics for them so if I get any resolution from them I'll post it here. My issue may very well be a hardware issue but, as I stated above, after seeing this thread I'm not so sure.

Cheers.
 
I have a similar problem... having new memory sent... do a memory test?

The most overlooked advantage to owning a computer is that if they foul up there's no law against wacking them around a little.
 
Hi,

Many thanks to everyone for your posts. Unfortunately there is no solution yet :( It is comforting to know that others are having similar issues.

ArcUser: We only have APC software installed on Server 2. I had heard about the issue, but don't see how it could affect the other servers. The software is scheduled to be removed this weekend (19 Nov 2005).

cmeagan656: We first thought it was a hardware issue, but when the second server started doing the same we became unsure. Interesting that your stable W2K3 server does not have SP1 installed. Our first W2K3 server came with SP1 already applied. It would be interesting to see if your Compaq G3 starts misbehaving after SP1.

schtek: We have carried out memory tests on all three servers and all appears well.
 
cmeagan656,

Just wondering what AV software you're using?

Mark
 
Have you tried disconnecting the network cables when the problem is occurring?

Could be an LSASS resource issue. That would make some sense if it is happening to several servers at random. I've seen things like this in the past. If it is LSASS, then disconnecting the network cable will restore the system without rebooting. Just disconnect and leave it alone for a minute or two. If it doesn't come back, then LSASS isn't the issue...

When you say the computer isn't responsive, do you have console access or is that locked too? You might want to get a full physical memory dump of the system and check resources. Load the dump in a debugger and run !vm. See if you have a paged or non-paged pool issue.

Other things you can do...

Run a perfmon log against a group of the servers (if not all of them). Check all counters and instances for Object, Memory, Processor, Process, Thread, Server, and Redirector. You might find something there. Do the perfmon remotely so the .blg file won't get corrupted when you hard boot...
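
Once a log has been collected, a quick way to spot a leaking resource is to look for counters that climb steadily between reboots. The sketch below is an illustration only: it assumes the log has been converted to CSV (for example with `relog`), with the timestamp in the first column and one column per counter, sampled at a fixed interval. The function names are made up.

```python
import csv
import io


def slope(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def counter_trends(csv_file):
    """Return {counter name: growth per sample} for every column after the first."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = {name: [] for name in header[1:]}
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            try:
                columns[name].append(float(cell))
            except ValueError:
                pass  # blank or non-numeric sample: skipped, so the slope is approximate
    return {name: slope(vals) for name, vals in columns.items()}
```

Open the file with `open("server1.csv", newline="")` (a hypothetical file name), pass it to `counter_trends`, and sort the result by value. Counters such as Pool Nonpaged Bytes or a process's Handle Count that show a large positive slope right up to the hang are the ones worth a closer look.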

 
SgtBeavis,

Thanks for your reply. Will try your suggestion regarding the network cable the next time one of the local servers goes down.

Sometimes we have access to the console and sometimes we just get the background image.

Your post has been very informative, thanks

Mark
 
markpirvine:

We're using CA eTrust InoculateIT 7.1 build 502 with the latest drivers. Given CA's track record that was one of the 1st things I checked. I can't match any InocIT event (sig updates, polling from the InocIT admin server, purging of expired logs, etc.) to when the server crashes.

Cheers.
 
SgtBeavis:

I'll try disconnecting the network cable next time the server crashes.

I'll try your other suggestions as well.

Thanks!

Cheers.
 
Sorry SgtBeavis, none of your suggestions worked.

However I did find this thread, thread931-1165725, referencing a M$ KB article which I'm going to try. In the meantime I'm using an external NIC instead of the on board one and that seems much more stable.

Cheers.
 
cmeagan656,

Many thanks for the info, the hotfix does sound promising - will download and try it. Maybe this will solve our Exchange connectivity problem. I tried disconnecting the network cable, but this did not seem to have any effect.

Since moving from Trend to Sophos, both servers 1 & 2 seem a bit more stable, however both have been rebooted a few times this week because of new application installs. Server 3 is still experiencing the problems.

Will apply the hotfix and post my results.

Mark
 
Hi,

Just a quick update before I leave the office for the holidays. I downloaded and applied the patch and so far have not had any server become unreachable or had any Exchange connectivity problems. I don't want to say that the issue is resolved, but if I get through the holidays without incident then I will be very happy.

cmeagan656, did you have any success?

Mark
 
Sadly, no. I guess our issue really was a hardware issue. I put in an external NIC and haven't had any problems since.

We have a 24/7 4 hr response service contract with HP but after 3 system board replacements there comes a point where you just can't afford any more down time, even after hours, and have to cut your losses. We're an accounting office in Canada so from now until the end of April we're super busy and have users working remotely from home after our normal office hours.

Maybe, after the end of April, I'll revisit the issue. For now I'll just keep my eyes open in case HP quietly comes out with a firmware update that resolves the issue.

Cheers.
 
cmeagan656,

Thanks for replying, I hope you get to the bottom of the problem. I will update the thread in the new year.

Happy Holidays,

Mark
 