
Exchange 2000 Server randomly crashing

Status
Not open for further replies.

Haleon

IS-IT--Management
Feb 2, 2004
80
US
Hello everyone, I'm posting here because I'm up against a wall at this point. I've been managing an Exchange 2000 Server SP3 running on Windows 2000 Server SP4 for the last year and a half. For the most part, it's been running fine that entire time. On Friday I had a strange problem. One user complained that she was unable to log into her email inbox, so I took a look at the Exchange Server, noticed a few errors in the event viewer that I had never seen before, and rebooted the machine to see if it was a fluke. When I rebooted, the machine came back up, Exchange started fine, and everything seemed good. I didn't think much of it. About 4 hours later, the exact same thing happened, and I got the exact same error messages. It's been happening at random intervals since Friday, and I can't seem to pin down exactly what's happening.

When the computer stops responding, I get the following error messages in Event Viewer, in this order. These are all from the application log:

Process MAD.EXE (PID=1340). All Domain Controller Servers in use are not responding:
dc.domain.com

-----

DSACCESS returned an error '0x80004005' on DS notification. Microsoft Exchange System Attendant will re-set DS notification later.

-----

Process MAD.EXE (PID=1340). All Domain Controller Servers in use are not responding:
dc.domain.com

-----

Microsoft Exchange System Attendant reported an error '0x80004005' when setting DS notification.

-----

Background thread FDoUpdateCatalog halted on database "First Storage Group\Mailbox Store (ALACEXCH)" due to error code 0x80004005.

-----

Background thread FDoUpdateCatalog halted on database "First Storage Group\Public Folder Store (ALACEXCH)" due to error code 0x80004005.

-----

NSPI Proxy can contact Global Catalog dc.domain.com but it does not support the NSPI service. After a Domain Controller is promoted to a Global Catalog, the Global Catalog must be rebooted to support MAPI Clients. Reboot dc.domain.com as soon as possible.

-----

And then the Exchange server becomes unresponsive. A reboot fixes it for a couple of hours, but then the same thing happens again.
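For anyone reading these events later: several of them share the same code, 0x80004005, which is the generic E_FAIL HRESULT ("unspecified error") and by itself says nothing about the cause. Here is a minimal Python sketch of how that value splits into the standard HRESULT bit fields. The function name decode_hresult and the small lookup table are made up for illustration and are not part of any Exchange tooling.

```python
# Illustrative sketch only: decode_hresult and KNOWN_CODES are hypothetical
# names, not part of Exchange or Windows tooling.

KNOWN_CODES = {
    0x80004005: "E_FAIL (unspecified error)",
}


def decode_hresult(value):
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),       # bit 31: 1 means failure
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: originating facility
        "code": value & 0xFFFF,             # bits 0-15: status code
    }


def describe(hex_text):
    """Return a one-line description for a code as written in an event log."""
    value = int(hex_text, 16)
    fields = decode_hresult(value)
    name = KNOWN_CODES.get(value, "not in the lookup table")
    return "{}: {} (failure={}, facility={}, code=0x{:04X})".format(
        hex_text, name, fields["failure"], fields["facility"], fields["code"]
    )


if __name__ == "__main__":
    print(describe("0x80004005"))
```

Facility 0 is the general-purpose facility, so the code only confirms that something failed. The useful clues are in the event text itself (the domain controller not responding, the NSPI complaint), not in the number.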

It "seems" to be pointing at a DNS issue with my domain controller, but I see no errors on the DC at all, and no changes have been made on it. As for the Exchange Server, I installed Symantec AV/F about 3 weeks ago, but haven't had any problems with it. I disabled it just to make sure it wasn't causing the problem, and I'm still getting these error messages.

So like I said, it just seemed to pop out of the blue. Anybody got any insight as to what could be causing it? Thanks!

Jon
 
As an interesting addendum to this post, I found out that the Active Directory Connector was not being started on the Exchange Server. It was set to use the network administrator account under the format of

DOMAIN\Administrator
Password

Instead of the System account. That just seems odd to me because every other service runs under the system account.

When I tried to start the service manually with the above logon credentials, I received a 1069 error saying the service could not start due to a logon failure. I then set the service to use the local system account and it started with no problems at all. Is this something to be concerned about?
 
Alright, might as well update again for the hell of it.

The server was on and off all weekend. All weekend a simple reboot would "fix" the problem for a couple of hours. Finally, at 6:30 this morning, it went down and wouldn't come back up. A reboot didn't work, so I attempted to log onto some of the other servers to see if they were ok. Everything was down.

At this point I knew it pretty much couldn't be the Exchange server causing the problem; it had to be something else. And since all the servers are tied to a single domain controller, I pretty much knew where to begin looking.

So in the DC event log, I noticed a few sporadic errors. One about licensing manager not starting, didn't care about that. The other one was about some IRP heap stack something or other that was too small. I did a little research, found it was a common problem, did the fix suggested. That error stopped coming up, but I still couldn't access any of the other servers on the network. When I attempted to go to the file server, I got a message saying that the network resource was unavailable, even though I knew it wasn't.

Now the internet connection for our office is served through our domain controller, and all computers had internet access, so I knew it wasn't a hardware problem. I narrowed it down to some sort of DNS issue. I reinstalled and reconfigured the DNS service (which is what the original errors were pointing to), and everything came back up. So things are working for the most part around here. I've still got some random errors that I'm trying to nail down, so the problem isn't over yet. Just updating in case anyone is reading.
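Since this came down to DNS on the domain controller, here is a rough sketch of the kind of check that can narrow it down. Active Directory clients (Exchange included) locate domain controllers and global catalogs through SRV records in DNS. The sketch lists the usual locator record names and tests whether a host name resolves at all. Assumptions: domain.com stands in for the real domain as in the logs above, the forest has a single domain (so the forest root is the same name), and the helper names are hypothetical.

```python
import socket


def dc_locator_records(domain):
    """Return the SRV record names normally registered by domain controllers.

    Assumes a single-domain forest, so the global catalog record is looked
    up under the same DNS name as the domain.
    """
    domain = domain.strip(".").lower()
    return [
        "_ldap._tcp.dc._msdcs." + domain,      # domain controllers
        "_kerberos._tcp.dc._msdcs." + domain,  # Kerberos KDCs
        "_ldap._tcp.gc._msdcs." + domain,      # global catalogs
    ]


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the host name resolves to at least one address.

    The resolver argument exists so the check can be exercised without
    touching the network.
    """
    try:
        return len(resolver(hostname, None)) > 0
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    for record in dc_locator_records("domain.com"):
        print("check SRV record:", record)
    print("dc.domain.com resolves:", resolves("dc.domain.com"))
```

The Python standard library cannot query SRV records directly, so the record names are only printed here. They can be looked up by hand from the Exchange server, for example with nslookup and the query type set to SRV. If those lookups fail on the Exchange box but the DC looks healthy, that matches the symptoms in this thread.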
 
We're experiencing much the same thing right now. Was your DNS on the Exchange Server or on your Domain Controller that you had to reconfigure?

Also, were you getting the error that you couldn't log in even directly on the Exchange server itself?

Thanks!
 
Our DNS is hosted on our Primary Domain Controller, not the Exchange Server.

And yes, I was getting the error message that prevented me from logging on even at the console itself this morning. Initially it was just through Terminal Services, but as of this morning, I was unable to log on to Exchange or our file server even at the console until I had resolved the DNS issue on the domain controller.
 
Thank you!

I don't know if this is good news or bad news since I don't know anything about setting up DNS. But I do have a consultant who set it up a long time ago that I can call. We installed SP4 on Friday in hopes it was file corruption in lsass.exe, but since it was down with the same problem this morning we know THAT's not the case!

We'll give the DNS a try like you did.

Thanks for sharing!
Teresa
 
No problem at all. Good luck with your problem, and let me know if there's anything I can do to help. haleon@gmail.com

See ya!
 
I too have been experiencing your exact symptoms. I had a real IT guy come in (I kinda suck) and we did a couple of things that seemed to work; however, it's only been a few hours so I'm not totally sure.

Basically our Active Directories were out of sync, so far out of sync that they were causing the problems you mentioned earlier (we also had several users who couldn't log on to the domain because they couldn't contact the domain controller). We had 2 Windows 2000 machines acting as primary domain controllers and DNS. Turns out that the times weren't synced. We resynced the times and set them to use the naval atomic clock. This caused even more problems, as the Active Directories were out of sync, so we set the Exchange server to be the secondary domain controller and used the other server as the primary domain controller and DNS, removing all DNS functions from the Exchange server. The reason this seems kind of hinky to me is that the times appear to have been out of sync for quite a while.
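On the time question: Kerberos authentication in a Windows 2000 domain tolerates a clock difference of 5 minutes by default, so clocks further apart than that can produce exactly this kind of logon failure. Below is a small sketch of the comparison, assuming you have already collected a clock reading from each server by some other means. The server names and readings are made up, and skew_report is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Default Kerberos clock tolerance in a Windows 2000 domain.
MAX_SKEW = timedelta(minutes=5)


def skew_report(reference, readings, max_skew=MAX_SKEW):
    """Compare each server's clock reading against a reference time.

    Returns a dict mapping server name to its absolute skew and whether
    that skew is within the allowed tolerance.
    """
    report = {}
    for server, reading in readings.items():
        skew = abs(reading - reference)
        report[server] = {"skew": skew, "within_tolerance": skew <= max_skew}
    return report


if __name__ == "__main__":
    # Made-up readings for illustration only.
    reference = datetime(2004, 1, 1, 12, 0, 0)
    readings = {
        "dc1": datetime(2004, 1, 1, 12, 0, 20),
        "exchange": datetime(2004, 1, 1, 12, 9, 0),
    }
    for server, result in skew_report(reference, readings).items():
        print(server, result["skew"], result["within_tolerance"])
```

A server flagged as outside the tolerance is worth fixing before chasing anything else, since authentication against the domain controller can fail until the clocks agree again.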

I apologize for my lack of thoroughness here, but the IT guy was moving so fast it was hard to keep up.

I guess the questions I'd have to ask are:
1) Are your servers in sync (time-wise)?
2) Have you recently been hit by any viruses (cuz we were)?
3) Have you recently done any Windows or Exchange updates (cuz we did)?
4) Have you recently gotten any error logs reporting that two or more objects have the same account name?

Oh and by the way, when I couldn't log on, I unplugged the network cable from the server, restarted, and then was able to log in. Then I reconnected the cable. That might at least let you in.
 
1.) Yes, the servers are in sync as far as time goes. Keep in mind though that I only have one domain controller.

2.) Well, this is iffy. I had a few lsass.exe reboots which is indicative of the sasser virus. We were never hit by sasser when that was big because our firewall kept it out. Subsequently (and this is my fault) I never installed the security update that fixes the lsass.exe. So once I saw the lsass.exe reboots (which were sporadic, and that's not indicative of sasser) I ran a virus scan and the sasser specific scan. Both scans came up negative. But I installed the security update anyway, so I'm not sure. I probably didn't have sasser, but if I did, I removed it. Might be something you want to check out.

3.) No. When the problems started happening, we hadn't made any recent system updates. As I said in my original post, the closest thing was an install of Symantec AV/F about three weeks earlier. Knowing what I know now, I doubt this had anything at all to do with the problem though.

4.) No. None that I've noticed anyway. I've posted pretty much every error message I've received on the Exchange machine.
 