major performance issues, any ideas? 2

allanh1 · Jul 13, 2005

Guys, Sorry if this is a bit long but please bear with me.

We have 2 clusters that both have approx 2500 users on them. They are configured in an active/passive configuration. Each box has 4 processors and 4GB of RAM. Paging files are set up for 3GB on the C:\ and D:\ drives, and the transaction logs and 10 mailbox stores are all on different drives in an external SAN. They are running Windows 2000 SP3 with all the hot fixes, and Exchange 2000 SP3 with the latest rollup. We have KVS archiving installed, and run NetBackup for the backup solution.

We have been getting a lot of the dreaded Requesting Data issues, but only on these servers.

When running Performance Monitor at the time it happens, RPC Requests climbs rapidly and peaks, and RPC Operations drops immediately to 0. Obviously, if this carries on for 300 seconds the cluster service fails the server over.

I have tried everything that I can think of from my past experience with Exchange. We have increased msEXCHparamlogbuffers to the maximum recommended, I have increased the number of LDAP queries the server caches to 5000, and we have made several registry changes to try and improve the performance of these boxes, but nothing seems to help. We have monitored the network and the external storage, and these do not seem to be causing the bottlenecks that I would expect to cause these issues.

If anyone has any ideas at all they would be most welcome. I've come to the end of what I can think of.

Many many thanks in advance,

Allan

Zelandakh · Jul 13, 2005

Have you looked at GC and other FSMO roles?

allanh1 · Jul 13, 2005

Well, I have checked as much as I know about them. It uses 3 DCs that sit about 10 feet away and has a list of over 11 GCs to use. What else would you check?

xmsre · Jul 13, 2005

Inadequate disk. In perfmon, log physical disk - sec/transfer and database - log record stalls/sec. Take that once a minute for a day or so for your log drives. When you view it, look for a correlation between slow writes and log stalls. A slow write to the logs is one that takes longer than 10ms (.010). Log stalls should be less than 1. The closer to zero, the better.

SAN is meaningless here. Just because the storage is on a SAN doesn't mean it's properly sized. Many SAN vendors create one big disk group and carve all the LUNs out of it. This can create a situation called commingling, where IO against one LUN impacts the performance of other LUNs that share the same physical spindles. Who is the SAN vendor?
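For anyone who wants to check that correlation from a perfmon log exported to CSV, here is a minimal Python sketch. The column names `disk_sec_per_transfer` and `log_stalls_per_sec` and the sample rows are made up for illustration (a real perfmon export uses full counter paths as headers); only the 10ms threshold comes from the post above.

```python
import csv
import io

SLOW_WRITE_SECONDS = 0.010  # from the post: a log write slower than 10 ms counts as slow


def load_samples(csv_text):
    """Parse CSV text with the hypothetical columns 'disk_sec_per_transfer'
    and 'log_stalls_per_sec' into (seconds, stalls) float pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (float(row["disk_sec_per_transfer"]), float(row["log_stalls_per_sec"]))
        for row in reader
    ]


def stall_summary(samples):
    """Compare average log stalls in slow-write samples against the rest."""
    slow = [s for s in samples if s[0] > SLOW_WRITE_SECONDS]
    fast = [s for s in samples if s[0] <= SLOW_WRITE_SECONDS]

    def mean_stalls(group):
        return sum(stalls for _, stalls in group) / len(group) if group else 0.0

    return {
        "samples": len(samples),
        "slow_samples": len(slow),
        "mean_stalls_when_slow": mean_stalls(slow),
        "mean_stalls_when_fast": mean_stalls(fast),
    }


# Invented one-minute samples, only to show the shape of the input.
EXAMPLE = """disk_sec_per_transfer,log_stalls_per_sec
0.004,0
0.006,0
0.025,10
0.030,6
0.005,0
"""

if __name__ == "__main__":
    print(stall_summary(load_samples(EXAMPLE)))
```

If the stalls cluster in the slow-write samples, that points at the disks. If they show up while write times are still under 10ms, the disks are less likely to be the cause.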

allanh1 · Jul 14, 2005

Thanks for the help so far. The SAN is an EMC storage solution.

I'll set up the monitoring for the transaction log drives to make sure that they aren't running slow. I have been monitoring the log stalls, but not the disk transfers. The log stalls very occasionally peak at 10 or so, but otherwise never really move off 0.

Thanks for any more help!

allanh1 · Jul 14, 2005

The only other thing I have noticed is that in the Directory Access tab for the servers in question, the GCs have been configured manually, with about 12 - 15 servers listed in there. Would this not be better off being set to automatic? Cheers.

xmsre · Jul 14, 2005

Yes, unless there is a specific reason you do not want to use a GC that would otherwise be selected first by the default logic (in the same site).

xmsre · Jul 14, 2005

EMC - it figures. Clariion or Symmetrix?

If log stalls correlate to slowdowns on writes, then the issue is insufficient disk and the only real solution is to add spindle count. If they don't, then likely large attachments flowing through the system repeatedly dump the log buffers. By default, there are 84 512-byte log buffers in RAM. That's 42K. When the buffers fill to the high water mark (95%) they start flushing to disk until the low water mark is reached (5%). If they cannot flush fast enough and fill to 100%, Exchange halts all client IO until they can be flushed - a log stall. 10 is a bad number. If the issue looks like large attachments, meaning there is no correlation to slow disk times, then you can increase the number of log buffers. Since you already stated that you increased log buffers to 512 and did not get any improvement, I'd have to stick with inadequate disk IO/spindle count.
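The buffer arithmetic and the watermark behaviour above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the buffer size, default count and 95%/5% watermarks are the figures quoted in the post, while the per-tick write and flush rates are invented, and the loop is not how the Exchange store is actually implemented.

```python
BUFFER_BYTES = 512     # size of one log buffer, as quoted in the post
DEFAULT_BUFFERS = 84   # default buffer count, as quoted in the post
HIGH_WATER = 0.95      # flushing starts at this fill level
LOW_WATER = 0.05       # flushing stops at this fill level


def pool_size(buffers):
    """Total size of the log buffer pool in bytes and in KB."""
    total = buffers * BUFFER_BYTES
    return {"bytes": total, "kb": total / 1024}


def count_stalls(incoming_per_tick, flush_per_tick, buffers=DEFAULT_BUFFERS):
    """Toy model: count ticks in which the pool hits 100% (a log stall).

    incoming_per_tick: bytes of log data arriving in each tick (hypothetical).
    flush_per_tick: bytes the disk can absorb per tick while flushing (hypothetical).
    """
    total = buffers * BUFFER_BYTES
    high, low = total * HIGH_WATER, total * LOW_WATER
    used, flushing, stalls = 0, False, 0
    for incoming in incoming_per_tick:
        used += incoming
        if used >= total:
            stalls += 1
            used = total  # the model just caps at full; real clients would wait
        if used >= high:
            flushing = True
        if flushing:
            used = max(0, used - flush_per_tick)
            if used <= low:
                flushing = False
    return stalls


if __name__ == "__main__":
    print(pool_size(84))
    print(pool_size(512))
    burst = [40000] * 5  # invented burst of log traffic
    print(count_stalls(burst, flush_per_tick=10000, buffers=84))
    print(count_stalls(burst, flush_per_tick=10000, buffers=512))
```

With 84 buffers the pool is 42K and the invented burst stalls it repeatedly. With 512 buffers the pool is 256K and the same burst is absorbed. More buffers only help with bursts, though: if the disk is slow all the time, the larger pool just takes longer to fill.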

allanh1 · Jul 15, 2005

Xmsre,

Thanks for the information. It's an EMC Clariion. Are there known issues with these? EMC have looked and said there isn't a problem with the storage. But from what you have put above, it looks like that could be the issue.

Would you have any other suggestions?

Many many thanks.

xmsre · Jul 15, 2005

What would you expect EMC to say? The issue isn't the hardware, it's the way the RAID groups/LUNs were laid out. There is not enough IOPS capacity to support the load your Exchange configuration is generating.

Disabling last accessed updates will give you a quick 20% or so performance increase for a 4K IO size. Whenever a file is accessed (read), the metadata associated with the file is written (the last accessed timestamp is updated). Disabling last accessed updates will decrease IO by not updating the timestamp on every read (the last changed timestamp - writes - still gets updated). If you're close, it might be enough to get you through this.
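To put a rough number on "not enough IOPS capacity", a back-of-envelope spindle estimate might look like the Python sketch below. Every input here is an assumption for illustration and none of it was measured on the servers in this thread: the per-user IOPS, read/write mix and per-disk IOPS are placeholders, and the write penalties (2 for mirrored RAID, 4 for RAID 5) are the usual rule-of-thumb values.

```python
import math


def spindles_needed(users, iops_per_user, read_fraction, write_penalty, iops_per_disk):
    """Estimate the disks needed to carry a mailbox load.

    Each front-end write costs `write_penalty` back-end IOs
    (rule of thumb: 2 for RAID 1/10, 4 for RAID 5). Reads cost one IO.
    """
    front_end = users * iops_per_user
    reads = front_end * read_fraction
    writes = front_end * (1 - read_fraction)
    back_end = reads + writes * write_penalty
    return math.ceil(back_end / iops_per_disk)


if __name__ == "__main__":
    # Hypothetical inputs: 2500 users (from the thread), everything else assumed.
    for label, penalty in (("RAID 10", 2), ("RAID 5", 4)):
        disks = spindles_needed(
            users=2500,
            iops_per_user=0.5,   # assumed, measure your own
            read_fraction=0.75,  # assumed read/write mix
            write_penalty=penalty,
            iops_per_disk=125,   # assumed per-spindle capability
        )
        print(label, disks)
```

The point is only that the answer depends on spindle count and RAID layout, not on the total capacity of the LUN, so replace the placeholders with numbers measured in perfmon before drawing any conclusion.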

Copy fsutil.exe from an XP workstation to your server (it ships with XP).

fsutil behavior set disablelastaccess 1
