Exchange Freezes Server

Status
Not open for further replies.

Designware

Technical User
Sep 24, 2002
202
We have an Exchange 2003 Server that freezes up periodically. It can freeze up 6 times a day, or once in two weeks. It's very sporadic.

We have tried running isinteg on the database, and also ran defrag on it, but it still freezes.

If we dismount the information stores, the server never freezes. The whole server freezes only when the store is mounted. It can freeze at 2 AM, when no one is using email, or during the day.

How can I trace the issue?

Thanks.
 
Please define freezing.
Your users are disconnected from their mailboxes?
You can't use the GUI on Exchange server?
CPU shoots up to 100%?
 
ntinlin,

Thanks for the reply.

Freezing encompasses:

1. Users are disconnected from their mailboxes
2. Cannot use the GUI on the Exchange Server (cannot get it to ever go off of screen saver ... have tried turning screen saver off also)
3. I don't believe the CPU is at 100% ... because if it were ... the system would eventually respond ... albeit slowly. I cannot get back to the GUI no matter how long I try.

I do not see errors in the System or Application logs prior to the shutdown. There are some errors during the subsequent reboot saying it's trying to clean up the dirty Exchange shutdown.
 
What's the underlying storage like? Any System log errors from the source "Disk"?

I doubt Exchange is corrupt, but it's just the thing that creates the most I/O intensive activity on the server, so you may have an underlying disk issue that is triggered by Exchange activity.

Dave Shackelford
ThirdTier.net
TrainSignal.com
 
Hi ShackDaddy,

Thanks for the reply.

Storage on the server is two sets of RAID 1 SATA drives. We did run a chkdsk on both and told it to repair the errors. There are no system log errors from the source "Disk". The edb files are on one RAID set, the stm files on the other.

The only errors (other than the 6008 unexpected shutdown errors) are event ID: 8032 "The browser service has failed to retrieve the backup list too many times." Other than that, all the entries in the system log are of the blue "informational" variety.

Also, we did run extensive hardware diagnostics software that we received from Dell that included tests on the hard drive, memory, and other hardware devices. The results from that showed no hardware issues discovered.
 
Have you done any performance monitoring, specifically looking at disk access? Does the OS have a separate drive, or is it one of the two mirrors that you mentioned?

I would look at Physical/Virtual disk counters, specifically at the read/write queues. If they are high, that means that there are a lot of writes/reads waiting. The number shouldn't be above 3 or 4 normally.
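As a rough illustration of that rule of thumb, here is a minimal Python sketch that checks whether a series of queue-length samples stays above the threshold for several readings in a row. The function name, the default threshold of 4 and the `min_run` of 3 consecutive samples are assumptions made for the example, not values from any tool.

```python
from typing import Sequence


def sustained_high_queue(samples: Sequence[float],
                         threshold: float = 4.0,
                         min_run: int = 3) -> bool:
    """Return True if at least `min_run` consecutive samples exceed `threshold`.

    A single spike is ignored; only a sustained run counts.
    Both defaults are illustrative assumptions.
    """
    run = 0
    for value in samples:
        # Extend the current run on a high sample, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_run:
            return True
    return False


if __name__ == "__main__":
    print(sustained_high_queue([1, 5, 6, 7, 2]))   # three high samples in a row
    print(sustained_high_queue([1, 9, 1, 9, 1]))   # isolated spikes only
```

Requiring a run rather than a single reading is a design choice: short bursts are common on a busy mail server and are usually harmless.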

Dave Shackelford
ThirdTier.net
TrainSignal.com
 
No, we have not done any performance monitoring. Do you have suggestions on the tool to do the monitoring? I found one for Windows Server 2003 from Microsoft here:


If this will work we'll d/l and install it. If there's another preferred, let me know.

The OS is one of the two mirrors mentioned. The stm files are on the same mirrored drive as the OS.

Thanks.
 
You could use the Performance Advisor in that link or you could set up your own counters using the native Performance Monitor tool. The key thing would be to set up a collection set that regularly saves to disk and grabs stats around every 30-60 seconds. That way after a freeze you could restart the server and look at the logs to see what the resource situation looked like. What I think you will find is that in general, you have disk I/O problems: not due to disk errors or any corruption, but simply due to the underlying disk subsystem not being able to keep up with the demand. The VirtualDisk read/queue and write/queue counters are probably going to be the most informative.
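One way to get a log that survives a freeze is the built-in `typeperf` command-line tool, which can sample counters at a fixed interval and write them to a CSV file. The Python sketch below only assembles such a command line. The helper name, the output path and the choice of counters are assumptions for illustration. The counter paths are the standard Windows `PhysicalDisk` and `Memory` ones, which may not match the names used above.

```python
from typing import List, Optional, Sequence

# Assumed counter selection; adjust to whatever you actually want to log.
DEFAULT_COUNTERS = (
    r"\PhysicalDisk(*)\Avg. Disk Read Queue Length",
    r"\PhysicalDisk(*)\Avg. Disk Write Queue Length",
    r"\Memory\Available MBytes",
)


def build_typeperf_command(output_csv: str,
                           interval_seconds: int = 30,
                           counters: Optional[Sequence[str]] = None) -> List[str]:
    """Build a typeperf argument list that logs counters to a CSV file.

    -si sets the sample interval, -f the file format, -o the output file.
    """
    if interval_seconds < 1:
        raise ValueError("interval_seconds must be at least 1")
    chosen = list(counters) if counters is not None else list(DEFAULT_COUNTERS)
    return ["typeperf", *chosen,
            "-si", str(interval_seconds), "-f", "CSV", "-o", output_csv]


if __name__ == "__main__":
    # Hypothetical output path; the command is printed, not executed.
    print(build_typeperf_command(r"C:\PerfLogs\exchange.csv", 30))
```

On the server itself the list could be passed to `subprocess.run`. Keeping the log on a volume other than the one under suspicion is probably wise.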

The Exchange data is in three parts: .edb, .stm and .log files. The first two are usually together, and the .edb holds the actual mail. The .stm file holds a copy of all attachments that have been converted to a format that POP/IMAP and some other clients would need them to be before downloading. And then every transaction that happens gets written to a .log file before it gets committed into the .edb file. If your backups are working correctly, every time a full backup happens, all the .log files get deleted automatically, but you wouldn't ever want to delete them yourself unless you knew exactly what you were doing.

Dave Shackelford
ThirdTier.net
TrainSignal.com
 
ShackDaddy,

FYI, I haven't forgotten about this thread. Our Exchange server happens to be in a good period right now. As soon as it freezes again, I'll post back here.

Thanks again for your help.

Dale
 
Well, as soon as I say we're going through a good period, the server freezes. We had the performance monitoring tool up and monitoring when it froze.

You are exactly right: just prior to the freeze, the read and write queues jump significantly. (There were many similar spikes before that with no freezing of the server.) If the disk subsystem can't keep up, what are my options short of a new server?

During the time we've had these issues, I have reduced the size of the edb data file (by having people mass archive emails and doing defrags) from almost 20 GB down to about 7 GB. I would think we should have less disk activity simply by not having to access and show as many emails as we did before. Of course, new activity is probably increasing.

Should I place the stm files on the same drive as the edb files? I separated them for better performance. Also, our backup software must not be working properly. We get a full backup of Exchange (brick level) every Friday. (We have done restores with it, so the actual backup is working.) However, none of the logs are deleted. We utilize ArcServe backup. If we don't delete them they completely fill the drive. So, once every 4 months or so we delete the old logs. Could this be causing the issues?

Finally, as I watched the store.exe process in Task Manager, the amount of memory it uses keeps growing. After I reboot the server, it starts out at about 30,000 KB and gradually grows to above 200,000 KB. Is this normal?

Thanks! I really appreciate your expertise.

Dale
 
The sheer number of logs is certainly causing fragmentation, which increases the IO required to read/write to disk.

The .stm file probably generates the least IO load. It's the .edb and .log writing that generate the load, and they are on the same disk.

First, by looking at the queues, which of your volumes is getting most hammered? That's the one that's causing the freeze. Is the freeze affecting the entire server, or just Exchange? If it's the entire server, then it's probably your OS volume that's having the issues. If it's just Exchange, then maybe it's the .edb and .log volume.
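To answer the "which volume" question from a saved counter log, something like the following Python sketch could rank columns by their average queue length. It assumes a CSV whose first column is a timestamp and whose remaining columns hold one queue-length series per volume. The column names and sample values are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import List, Tuple


def rank_by_mean(csv_text: str) -> List[Tuple[str, float]]:
    """Return (column name, mean value) pairs, highest mean first.

    The first column is treated as a timestamp and skipped.
    Cells that are blank or non-numeric are ignored.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            try:
                value = float(cell)
            except ValueError:
                continue  # missed sample, skip it
            totals[name] += value
            counts[name] += 1
    means = [(name, totals[name] / counts[name]) for name in totals]
    return sorted(means, key=lambda pair: pair[1], reverse=True)


# Made-up sample data; real logs will have different headers.
SAMPLE = (
    "time,OS volume,Data volume\n"
    "02:00:00,6.5,0.4\n"
    "02:00:30,9.5,0.6\n"
    "02:01:00, ,0.5\n"
)

if __name__ == "__main__":
    for name, mean in rank_by_mean(SAMPLE):
        print(f"{name}: {mean:.2f}")
```

A mean is a blunt measure. Looking only at the last few samples before a freeze may be more telling.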

I would add another mirror to the server and just put the .edb and .stm files on it.

By choosing to use mirrors, you've essentially chosen to focus all your disk IO on single slow SATA disks. The only way you can fix this is by either adding another mirror, or by backing up your second volume, adding a couple more disks, reformatting to create a RAID 5, and then restoring the data to the new volume.

Dave Shackelford
ThirdTier.net
TrainSignal.com
 
Yes, it is the OS volume that's getting hammered. That volume has the stm and log files. The non-OS drive has the edb files. The OS drive has about 18 GB free, and is fairly fragmented. I can run defrag again on the OS drive.

It is the entire server, not just Exchange, that freezes.

The system just froze again, and it was memory that had the spike. There was a much smaller spike in the disk queue at the same time, but the large spike was with memory.

I will move the stm files over to the other drive and see if that helps us at all.

Thanks again. We'll see what happens.

Dale
 
Wait, I would move the .log files over to the non-OS drive. The .stm file isn't generating that much activity, but having your logs on the OS drive is definitely a problem.

To get rid of the logs:

1. Stop the Information Store service. (this will dismount all databases)
2. Delete the log files and the .chk file.
3. Start the information store service up again.

Now relocate the log files and it will go very quickly since there aren't very many.
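Before deleting anything in step 2, it may help to see exactly which files would be affected. The Python sketch below is a dry run only: it lists the `.log` and `.chk` files in a folder and removes nothing. The function name and the example path are hypothetical, and the Information Store service should already be stopped, as in step 1, before acting on the list.

```python
from pathlib import Path
from typing import List


def files_to_clear(log_dir: str) -> List[Path]:
    """List transaction-log (.log) and checkpoint (.chk) files in log_dir.

    Nothing is deleted or modified; this only reports candidates.
    """
    folder = Path(log_dir)
    wanted = {".log", ".chk"}
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in wanted)


if __name__ == "__main__":
    # Hypothetical log folder; substitute the storage group's real log path.
    example_dir = Path(r"C:\ExchangeLogs")
    if example_dir.is_dir():
        for path in files_to_clear(str(example_dir)):
            print(path.name, path.stat().st_size)
    else:
        print("folder not found:", example_dir)
```

Moving the listed files to a holding folder instead of deleting them outright is a more cautious variation on step 2.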

Dave Shackelford
ThirdTier.net
TrainSignal.com
 
I think we've discovered the issue. Previously, we had turned off the antivirus monitoring software, but the issues continued. We actually uninstalled the entire antivirus client (Sun Systems, Vipre) last week and the freezing issue has not been a problem since that point.

Thought I would report back as to what the issue ended up being.

Thank you for your help. It is appreciated.
 
That'd do it. Glad you got this resolved. In my experience turning AV off rarely resolves an Exchange issue--only complete removal usually makes a difference, due to the way it inserts itself into the architecture.

Dave Shackelford
ThirdTier.net
TrainSignal.com
 