
Exchange 2K3 cluster questions

Status
Not open for further replies.
Sep 6, 2000
183
Hi,

We are currently having huge problems with our Exchange system being maxed out despite a really fast server, and it looks like a disk / database access problem.

The server is SAN-attached, so we can't do any more than we already have, other than regular offline defrags.

What I was thinking was that we could move to a cluster, perhaps 3 nodes in the cluster and one front-end server for SMTP and OWA.

My questions are as follows.

1. Bearing in mind that the disks are currently getting hammered on a single server, will clustering give me a general speed increase, or will the disk access problem just appear on the other nodes?

2. Can I do an offline defrag on the database on one node without bringing down the information store, i.e. can I remove a node from the cluster, defrag it, and bring it back in to catch up?

3. How do others manage offline defrags? Do you find you have to do them regularly? Ours take about 5 hours per store, which is becoming unacceptable to users.

Any help, advice or general comments are extremely welcome.



Chris Styles

NT4/2000 MCSE
 
I'm not sure a cluster will solve your problems if you are storage constrained - clusters generally solve processing bottlenecks.

Give us some numbers for your current server - CPUs, mailboxes, concurrent user connections, RAM, number and size of DBs, etc.
 
Okay, thanks.

Exchange 2k3 Enterprise on Windows 2000 Advanced Server

HP DL580 G2, 4x Xeon 2.5GHz, 8GB RAM (RAM configured as per the MS white paper)

One domain, 750 mailboxes, 5 DBs (9GB, 32GB, 34GB, 74GB, 32GB)

That's 4 private stores and 1 public store in 4 storage groups.

Disk config is RAID 0+1 for the OS and log files and RAID 5 for the databases and streaming files. Each store has its own drive on the SAN for log files and database / STM files.

All users should be using Outlook 2k3 in cached mode, and we see a serious delay opening other users' calendars, sometimes as long as 15 seconds.

We journal all mail to a journaling mailbox and have about 250 BlackBerry users (BES v4).

Any ideas where to go next?





Chris Styles

NT4/2000 MCSE
 
Your server is seriously over-specified. You have twice as much RAM as Exchange 200x can address (the 32-bit address space limit is 4GB). Even with 'only' 4GB, with quad processors you could handle 4-5 times as many mailboxes.

The journaling is probably putting a serious load on the server, though. And are your BlackBerry users included in the 750, or extra?

I'd say you need to check your SAN; it sounds like that might be your bottleneck. RAID 0+1 is recommended over RAID 5 for higher performance for DBs on Exchange 2003.

There are quite a few articles around on monitoring and tuning an Exchange 2003 server. I suggest you spend some time analysing its behaviour in detail.
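
For anyone wanting to sanity-check the RAID point above, here is a rough sketch of the usual write-penalty arithmetic. The penalty values (4 disk I/Os per host write on RAID 5, 2 on mirrored layouts) are the commonly quoted rule-of-thumb figures, and the workload numbers (400 host I/Os per second, 75% reads) are made up for illustration, not measured on this server.

```python
# Write penalty = backend disk I/Os generated by one host write.
# Rule-of-thumb values, not measurements from this SAN.
WRITE_PENALTY = {"raid5": 4, "raid0+1": 2}


def backend_iops(frontend_iops: float, read_fraction: float, raid_level: str) -> float:
    """Estimate the disk I/O load behind the controller for a given host load."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    reads = frontend_iops * read_fraction
    writes = frontend_iops - reads
    # Reads cost one disk I/O each; writes are multiplied by the RAID penalty.
    return reads + writes * WRITE_PENALTY[raid_level]


if __name__ == "__main__":
    # Hypothetical workload: 400 host I/Os per second, 75% reads.
    for level in ("raid5", "raid0+1"):
        print(level, backend_iops(400, 0.75, level))
```

The more write-heavy the load (journaling, for instance), the wider the gap between the two layouts becomes, which is why the same spindles go further when mirrored.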
 
Yes, I realise the memory is too great, but it came in a deal.

I think the SAN is seriously overloaded; I have been through all the tuning and monitoring articles I can find.

The reason I think we have problems with fragmentation is the journal mailbox. We constantly write data to that mailbox and constantly delete the mail as it is archived into Enterprise Vault.

What I thought I might do is create a new storage group and information store (I'll have to consolidate one first) and have only the journal mailbox in that store. This should allow me to dismount and remount it as and when I need to offline defrag it.

I'll look at changing the SAN over to RAID 0+1 then. The question is, can I use the diskpart tool to align the disks?

Also, it looks like although the 250 BlackBerry users are part of the 750, BES has a massive impact by constantly scanning the mailboxes as well.

Chris Styles

NT4/2000 MCSE
 
Just had an Exchange consultant take a look, and his advice was as follows.

Change all drives to RAID 1 and use as many spindles as possible (I am going to use 8 for the databases and 6 for the logs).

Move the journaling mailbox to a new store on its own. He reckons journaling increases the load by the number of journaled stores minus 1, so journaling is tripling our server load.

Create new stores on the new drives and move users over to alleviate the current fragmentation issue.

Remove 4GB of RAM from the server and apply Exchange SP2.

Interestingly enough, he reckons that upgrading the OS to Windows 2K3 won't help matters.

Boot.ini and registry are set correctly.

Now I just need an answer about the diskpart tool.

Chris Styles

NT4/2000 MCSE
 
Interesting - so the consultant basically agrees with me. How much did he cost you?? ;-)
 
Oh enough, don't worry about that.

Funny thing is, you say to your boss, look, I have it all in hand, just give me the funds to do the upgrades etc. They um and ah about giving you the cash and then tell you to get an external consultant in to tell you what you already know.

I'll never get the director thing!

Chris Styles

NT4/2000 MCSE
 
Chris,
Windows 2000 memory management for Exchange is pants. You should have one storage group. Upgrading to Windows 2003 improves memory management massively and MS recommend moving to one store per SG at that point. This will speed you up no end.

Additionally, opening calendars is probably taking time due to GCs. For 750 mailboxes I'd say you need 2 GCs and neither of them should be on that box. Have you got 2?

Certainly spindles will help. That 580 I think supports RAM mirroring. If it does, mirror the RAM so that 8GB gets seen as 4GB then you haven't wasted anything.

Did I understand you've got TLs and Stores on the same volume? That will kill performance too.
 
Thanks for the info.

We currently have about one store per storage group, up to the maximum number of storage groups. Interesting point about 2k3, though I don't fancy the OS upgrade going wrong and having to re-install Exchange.

I have three GCs; one is on a remote site connected via an IP site link, so it should not count. The Exchange box isn't a DC or GC. I am going to be replacing our DCs with dedicated DCs. Currently they are file servers as well.

Interesting point about RAM mirroring; I suppose that would give failover.

No, TLs and stores are on different volumes: RAID 1 for TLs and currently RAID 5 for stores (soon to be RAID 1 when I have beaten SUN over the head about the disk prices).

Any idea whether I should experiment and move the STM files to different locations as well?



Chris Styles

NT4/2000 MCSE
 
If you have the spindles for it, by all means move the STM. Though remember that STM files are only really used for OWA and IMAP4 so you may not use them a lot.

Check whether Exchange is looking to the remote GC - might explain lag. Remove it from the list on Exchange and see if that fixes some user issues.
 
I know this is pretty basic, but is the infrastructure master (FSMO role) located on a GC? Seems like you know your stuff, but this might be worth checking.

If you do have a go at upgrading the OS, eject one of the hot-swappable disks in the OS mirror. You will have an easy backup if you have issues. Once you finish successfully, just re-insert it and wait the 15-30 minutes for the mirror to rebuild. If not, power down, eject the one remaining disk and re-insert the first ejected disk.

One thing that you might want to think about: in a Windows 2003 domain (you haven't mentioned the domain specifics / level you have), you can enable universal group membership caching for a site. As Exchange requires the Global Catalog servers to check universal group membership, this might be something worth considering also.


Just a couple of fundamentals that may be of use. Good luck.




"Assumption is the mother of all f#%kups!
 
This is excellent stuff.

Yes, the infrastructure master is on one of the DCs that is also a GC. It has to be, though, as we only have two DCs at the primary site, both of which are GCs.

I'll take a look at the GC referenced by Exchange just to be sure, as per Andreh's tag line about assumptions!

Like the idea of ejecting a drive, sounds too simple to be true!

Domain is still mixed mode Windows 2000. I haven't converted to native as it seemed a pointless exercise, though I would avoid group caching for Exchange as we frequently have to change distribution group memberships on the fly for users.

I thought I would include some data from our perf logs for consideration. All values are average.

RPC Requests are low, about 5 per sec, however RPC Operations are at about 250 per sec.

Free System Page Table Entries are at 11900, which MS reckon is too low.

Log stalls are low, at a max of 80 but an average of 1 (possibly caused by a SAN delay).

Message Opens are around 30 per sec.

Total disk transfers are averaging 690 per sec.

Store % Processor Time averages 60, but interestingly enough % Processor Time for the total processor averages around 17% (not too sure how these two tie up).

Private Bytes for the store = 1.2GB and steady.

Virtual Bytes for the store = 2.19GB and steady.

Anyone know what to monitor wrt AD/GC lookups?
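
On how the store's 60% squares with 17% overall: as far as I know, the Process object's % Processor Time is summed across processors (so it can reach 100 times the number of CPUs), while the Processor _Total figure is an average over them. A quick sketch of the conversion, assuming the 4 processors from the spec given earlier. If Hyper-Threading is enabled the OS would see 8 logical CPUs and the divisor changes accordingly.

```python
def process_share_of_machine(process_pct: float, logical_cpus: int) -> float:
    """Convert a per-process % Processor Time value into a share of the whole machine.

    Assumes the per-process counter is a sum over all CPUs, so its ceiling
    is 100 * logical_cpus rather than 100.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return process_pct / logical_cpus


if __name__ == "__main__":
    # 60% for the store process on an assumed 4-CPU box.
    print(process_share_of_machine(60, 4))
```

On that reading the store accounts for most of the total CPU figure, with the remainder coming from everything else on the box.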







Chris Styles

NT4/2000 MCSE
 
As ZBnet mentioned a bit earlier, your server is very quick from a hardware perspective.

Do a bit of auditing / a health check on the domain. Dcdiag and netdiag would be a good start. Dcdiag in verbose mode (/v) to extract everything would be best (e.g. c:\>dcdiag /v >DcDiag.txt, if I remember correctly). I use this tool for audits at customer sites and it is excellent.
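
Verbose dcdiag output gets long, so a small script to pull out just the failures can save some scrolling. The sketch below assumes each test ends with a summary line of the form "......................... SERVER passed test NAME" or "... failed test NAME"; that layout is from memory, so adjust the pattern if your output differs. The file name is whatever you redirected dcdiag to.

```python
import re
import sys

# Assumed summary-line layout: dots, server name, passed/failed, test name.
RESULT_LINE = re.compile(
    r"\.+\s+(?P<target>\S+)\s+(?P<result>passed|failed) test (?P<test>\S+)"
)


def failed_tests(report_text: str) -> list:
    """Return (target, test name) pairs for every 'failed test' summary line."""
    return [
        (m.group("target"), m.group("test"))
        for m in RESULT_LINE.finditer(report_text)
        if m.group("result") == "failed"
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python dcdiag_failures.py DcDiag.txt
    with open(sys.argv[1], errors="replace") as handle:
        for target, test in failed_tests(handle.read()):
            print(f"{target}: {test}")
```

An empty result only means no summary line matched as failed, so it is still worth skimming the warnings in the full report.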

I would try removing GC functionality from the infrastructure master DC for a while. With only 1 domain / forest, having these on the same box shouldn't be that big an issue. If you only have 1 GC at that site, it should be enough while you test the results. If the DC with GC functionality goes down you will have a bit of an issue, but one you could react to pretty easily. Not ideal, but in order to resolve your issue it would be worth a go.

On the DFL (Domain Functional Level), I would suggest you do consider upgrading it. 2000 mixed mode is needed with NT4 domain PDCs and BDCs. You don't have any, so moving to 2000 native mode will not be an issue.

If you upgrade to Windows 2003 DFL from 2000 native mode, incremental GC replication will improve replication efficiency for the remote site and reduce replication traffic. When GC replication occurs in 2000 mixed and native mode, the entire GC is replicated.





"Assumption is the mother of all f#%kups!
 
Have run dcdiag with the /v and /e switches. All seems fine.

Understand about mixed and native, just didn't see the point. I'll do this tonight.

As for the Windows 2003 DFL, I can't go cap in hand for 650 Windows 2k3 CALs at the moment, so we have to use what we have.

I'm going to progress with the disk reorder, storage group reorder etc. and we will see. Then I think we will rebuild all the DCs (doubt this is going to help) and install a front-end server for OWA and SMTP. OWA is not very widely used, but I want to wean the users from BlackBerrys onto Windows Mobile devices using Exchange SP2, so I suppose it makes sense having another server take this load.

Chris Styles

NT4/2000 MCSE
 