
Avg Disk Queue Length Very High on Exchange Server 4

Status
Not open for further replies.

stevenriz

IS-IT--Management
May 21, 2001
Hi all. I have a Dell 1U 1950 server with mirrored drives and have noticed some performance decrease on the server... I looked at the Avg. Disk Queue Length and it is pegged at 100. So I increased the display scale to show 5000. Disk Queue Length is averaging 400-500 but spikes to 5000, even more! Is this normal, or is my performance bottleneck elsewhere? CPU is low, as is pages/sec.

I read some posts that say throw more spindles at it... My second question is this. I have an EqualLogic iSCSI array on the network now. I configured an Exchange volume and was planning on putting the Exchange information store onto the array. Would this solve my disk IO problem, do you think? It is essentially throwing disks at it, since it is a 16 disk array.

thoughts?

thanks!
Steve
 
Do you only have a single RAID 1, or do you have the OS, database, and logs on different arrays? We use three EqualLogic arrays for our SAN environment and run Exchange off of it (2 arrays in RAID 10 and one in RAID 50) and we see great performance. XMSRE is the performance guru, so if you want the best answer possible, wait for his response.

I hate all Uppercase... I don't want my groups to seem angry at me all the time! =)
- ColdFlame (vbscript forum)
 
It's far more accurate to use the sec/read, sec/write, reads/sec, and writes/sec counters. From reads/sec and writes/sec you can determine your read/write ratio and overall IO rates. From sec/read and sec/write, you can determine your response times, or how well the storage is handling the load you're placing on it. If you see average response times greater than 20ms, or peaks over 50ms lasting more than a few seconds, then your IO subsystem is performing poorly.

In most cases, Exchange 2003 has a read/write ratio of about 2:1. RAID 10 has a write penalty of 2. RAID 5 has a write penalty of 4. The rule of thumb is: if your write penalty is greater than the read/write ratio, then the RAID type is a poor choice for your application.
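As a rough illustration of how those four counters combine, here is a minimal Python sketch. The function name `summarize_io` and the sample values are hypothetical; only the 20ms average threshold comes from the post above, and latencies are assumed to be raw counter values in seconds.

```python
def summarize_io(reads_per_sec, writes_per_sec, sec_per_read, sec_per_write):
    """Combine the four disk counters into ratio, total IO rate, and latency.

    Latencies are expected in seconds, as the raw counters report them.
    """
    read_ms = sec_per_read * 1000.0
    write_ms = sec_per_write * 1000.0
    if writes_per_sec > 0:
        ratio = reads_per_sec / writes_per_sec
    else:
        ratio = float("inf")  # no writes observed in the sample
    return {
        "total_iops": reads_per_sec + writes_per_sec,
        "read_write_ratio": ratio,
        "read_ms": read_ms,
        "write_ms": write_ms,
        # 20 ms average-latency threshold, taken from the post above
        "slow": max(read_ms, write_ms) > 20.0,
    }


# Hypothetical sample: 200 reads/sec, 100 writes/sec, 12 ms reads, 8 ms writes
summary = summarize_io(200, 100, 0.012, 0.008)
print(summary)
```

The 50ms peak rule is not modeled here, since it needs a time series of samples rather than a single average.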

I'd go with RAID 10 for the databases.
 
Thanks guys! The EqualLogic box has two volumes as part of its storage pool, which is RAID 50. One volume is 7TB for general file storage on one server, and the other volume is 1.5TB, which we planned to use for the Exchange store on another dedicated server. We weren't going to get too fancy with having logs in one location and the DB in another. We have about 100 users. Should I rethink anything? Please note we don't have any money to make any hardware changes, so we have to make do with what we've got.

 
Are your clients cached? Are you running BlackBerry? Any desktop search engines? Are you doing anything that would put you above 1 IOPS/user?

If it's only 100 IOPS, then the load is small enough that it really won't matter what you use. Since you are experiencing problems, I would assume that this is not the case. Figure out what your IO load is, then you'll have the information you need to configure your storage in a way that will support that load.
 
Yes all clients are cached but maybe 20 have blackberries. No desktop search engines that I know of. Here are our results on the performance monitor scale...

avg disk sec/read = 25 scale=1000
avg disk sec/write = under 5 scale=1000
avg disk reads/sec = 150 but spikes to 500 scale=1
avg disk writes/sec = 50 but spikes to 500 scale=1

do these numbers look bad?
 
IMHO, I would at least create and present separate LUNs for each of OS, Database, and Log (and RSG if needed down the road) to ease any kind of LUN contention that may creep up if you have all of them on one.
 
1. Your average latencies are 25ms for the sampling period. Your current disk subsystem is performing poorly.

2. Your measured read/write ratio is 3:1
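For readers following along, the two conclusions above can be reproduced from the posted Performance Monitor readings. This is a sketch that assumes the graph displays each counter multiplied by its scale factor, so a reading of 25 at scale 1000 corresponds to 0.025 seconds; the helper name `displayed_to_raw` is hypothetical.

```python
def displayed_to_raw(displayed_value, scale):
    """Undo the graph scale factor to recover the raw counter value."""
    return displayed_value / scale


# Readings as posted: sec/read = 25 and sec/write = 5 (upper bound) at scale 1000
read_latency_ms = displayed_to_raw(25, 1000) * 1000.0
write_latency_ms = displayed_to_raw(5, 1000) * 1000.0

# reads/sec = 150 and writes/sec = 50 at scale 1
reads_per_sec = displayed_to_raw(150, 1)
writes_per_sec = displayed_to_raw(50, 1)
read_write_ratio = reads_per_sec / writes_per_sec

print(f"read latency: {read_latency_ms:.0f} ms, write latency: under {write_latency_ms:.0f} ms")
print(f"read/write ratio: {read_write_ratio:.0f}:1")
```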


You don't say what type of disks populate your EqualLogic array, so I'll give some typical IOPS/Spindle numbers for various types of disk. These numbers assume 8K random IO at a response time of 20ms or less. In addition, they take into account the overhead of the NTFS file system.

15K SCSI 155
15K SAS 145
10K SCSI 90
10K SAS 82
10K SATA 42
7.2K SATA 28

Put the OS on a Mirror.

Your Database requirement is 150 reads + 20% and 50 writes + 20%. How that translates into spindle count depends on the type of disk and the type of RAID array.

If P is the performance in IOPS of a single spindle, and N is the number of spindles in a RAID array,

For RAID 5
Write Performance = P*(N-1)/4
Read Performance = P*(N-1)

For RAID 10
Write Performance = P*N/2
Read Performance = P*N
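The four formulas above translate directly into code. This is a minimal Python sketch; the function names are hypothetical, and the example inputs (82 IOPS per 10K SAS spindle) come from the table earlier in this post.

```python
def raid5_performance(p, n):
    """Array IOPS for RAID 5 with n spindles at p IOPS each."""
    return {"read": p * (n - 1), "write": p * (n - 1) / 4}


def raid10_performance(p, n):
    """Array IOPS for RAID 10 with n spindles at p IOPS each."""
    return {"read": p * n, "write": p * n / 2}


# Example: 10K SAS at 82 IOPS per spindle
print(raid5_performance(82, 7))   # 7-spindle RAID 5
print(raid10_performance(82, 4))  # 4-spindle RAID 10
```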


If I were to use RAID 5 with 10K SAS spindles, then:

Through the magic of algebra I manipulate the formula to solve for spindle count (N):

N = (Reads + Writes * RAID penalty) / P + 1

and add my 20% by multiplying the IO load by 1.2:

(150 + 50*4) = 350; 350 * 1.2 = 420; 420 / 82 = 5.12. Round up 'cause I can't have an eighth of a spindle, and add the one parity spindle: 6 + 1 = 7 spindles required for the DB IO.

If I were to use RAID 10, then:

again restructured to solve for spindle count

N = (Reads + Writes * 2) / P

and add my 20% by multiplying the IO load by 1.2:

(150 + 50*2) = 250; 250 * 1.2 = 300; 300 / 82 = 3.66, rounded up to the next whole spindle = 4 spindles required for DB IO.

RAID 5 requires 75% more spindles than RAID 10 (7 versus 4) when you have a 3:1 read/write ratio. The lower the read/write ratio, the worse it gets. This is why you follow the rule of thumb: if the RAID penalty of your proposed RAID type is higher than the read/write ratio of your application, then the proposed RAID type is inappropriate.
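The spindle arithmetic for both RAID types can be sketched as one small Python function. The name `spindles_needed` and its parameters are hypothetical; it uses integer ceiling division to avoid floating-point rounding surprises, and it follows the post in adding a single parity spindle for RAID 5 and none for RAID 10.

```python
def spindles_needed(reads, writes, write_penalty, iops_per_spindle,
                    parity_spindles=0, headroom_pct=20):
    """Spindles required for a given front-end read/write load.

    Back-end load = reads + writes * write_penalty, plus headroom,
    divided by per-spindle IOPS and rounded up to a whole spindle.
    """
    backend = reads + writes * write_penalty
    scaled = backend * (100 + headroom_pct)
    # Ceiling division on integers: -(-a // b)
    data_spindles = -(-scaled // (100 * iops_per_spindle))
    return data_spindles + parity_spindles


# 150 reads/sec, 50 writes/sec, 10K SAS at 82 IOPS per spindle
raid5 = spindles_needed(150, 50, write_penalty=4, iops_per_spindle=82,
                        parity_spindles=1)
raid10 = spindles_needed(150, 50, write_penalty=2, iops_per_spindle=82)
print(raid5, raid10)
```

Swapping in a different IOPS/spindle figure from the table reruns the math for another disk type.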

Your log IO in Exchange 2003 is about 1/8th the DB IO. A mirror should easily handle this. Given all the above, and sticking with the 10K SAS drives in the example, I'd go with:

A mirror for the OS
A mirror for the logs
A 4 drive RAID 10 array for the database.
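To check the log mirror against the 1/8th figure, here is a small Python sketch. The function names are hypothetical, and two assumptions are made for illustration: the same 20% headroom is applied to the log IO, and the mirror is treated as a two-spindle RAID 10 with all log IO counted as writes.

```python
def log_iops(db_reads, db_writes, headroom_pct=20, log_divisor=8):
    """Estimated log IO: 1/8th of total DB IO, plus headroom."""
    db_total = db_reads + db_writes
    return db_total * (100 + headroom_pct) / (100 * log_divisor)


def mirror_write_iops(iops_per_spindle):
    """Write capacity of a 2-spindle mirror, using P*N/2 with N = 2."""
    return iops_per_spindle * 2 / 2


needed = log_iops(150, 50)         # (150 + 50) / 8, plus 20%
available = mirror_write_iops(82)  # 10K SAS mirror
print(f"log IO needed: {needed:.0f} IOPS, mirror provides: {available:.0f} IOPS")
```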


You'll need to figure out what kind of drives you have and rerun the math with the appropriate IOPS/spindle figure.



XMSRE
 
XM, it never ceases to amaze me how you can know that much detail about this stuff. Star for you just for that.

 
thanks XMSRE, excellent information!

We have 16 HUA721010KLA330 1TB 7200rpm drives in the array. The entire array has already been set up as RAID 50 with 2 volumes, one of which is dedicated to Exchange.

Knowing we don't have any money to spend, I think our only and best option here is to leave the OS on the C: drive mirrored, the logs on the D: drive (mirrored) and move the db to the EqualLogic...

thoughts?

 
Ok. Knowing the configuration is not optimal, let's see if it could work.

(150 + 50*4) = 350; 350 * 1.2 = 420; 420 / 28 = 15; 15 + 1 = 16. I believe you initially stated this is a 16 disk RAID 50 array. If you put nothing else on it, Exchange will require the entire IOPS capacity of the array just for DB IO.

OS goes on a mirror, check.

I need somewhere for about 30 IOPS worth of log IO. Either place (with the OS or with the DBs) is not a good solution. I believe you will continue to see performance issues.
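Another way to look at the same numbers is as utilization of the array's back-end IOPS. This is a hedged Python sketch: the function name is hypothetical, and it follows the arithmetic above in using the RAID 5 write penalty of 4 and a single parity spindle, which is a simplification of how a real RAID 50 pool is laid out.

```python
def backend_utilization(reads, writes, write_penalty, iops_per_spindle,
                        spindles, parity_spindles=1, headroom_pct=20):
    """Fraction of the array's back-end IOPS consumed by the given load."""
    required = (reads + writes * write_penalty) * (100 + headroom_pct) / 100
    available = iops_per_spindle * (spindles - parity_spindles)
    return required / available


# 16 x 7.2K SATA at 28 IOPS per spindle, 150 reads/sec and 50 writes/sec
utilization = backend_utilization(150, 50, write_penalty=4,
                                  iops_per_spindle=28, spindles=16)
print(f"{utilization:.0%} of array IOPS used by Exchange DB IO")
```

Anything else sharing the pool, such as the file storage volume, pushes the result past 100%.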

 
darn, I am hopeful we will get a little better performance out of it though.

is the db move a trivial task? I will have to start poking around on procedures. I've seen that it might be able to be done within system manager...
 
I'd just do the move with System Manager. It's straightforward; just click through the GUI.

How many drive slots do you have in the server? If you could add a mirror, or just a single drive (knowing you lose some fault tolerance for the logs), you could put the logs there and get by.

SATA disks provide a lot of space and fairly good sequential IO performance, but the random IO performance is poor. When you size storage, space is only one consideration. As you have experienced, performance is often just as important. Your disk and RAID selections need to meet BOTH requirements in order for the storage system to successfully solve your issue.

Way too often I see a situation where a customer has only taken space into account when buying storage. Oftentimes, storage vendors don't take the time to explain the ramifications of the decision and simply fill out the sales order. In my mind, this is doing the customer a grave disservice. All I can do at this point is say, "Next time around ...." You're basically stuck. You'll have to spend to fix the problem at this point. Next time around, make sure you take both the space and performance requirements into consideration before purchasing storage. I'm really sorry that you had to learn that this way.

XMSRE


 
I hear ya, thanks XMSRE. We do not have any extra disk slots in that server. I will be decommissioning a PowerEdge 6850 soon, though, that has 5 drive bays in it, I think. Maybe I will build it a little better than this one was.

Let me ask you this: can you run different Exchange functions on different servers? I have many 1U boxes that are new and not in use right now.
 
You can do an FE/BE design in Exchange 2003. You can have more than one mailbox server as well.

 
xmsre,

How did you come up with such low IOPS? Everywhere I have read so far it was 70-90 for SATA. I have never seen it in 20's. How close to real world/average exchange is the 8k random test?

How does the NTFS overhead come into play with regard to Exchange server?


 
Do you have any Entourage clients on your network? I had the same problem, and it was a Mac using Entourage 2008. It is a known problem that a single email in Entourage can cause this. Microsoft currently has no solution to the problem. One way to fix it is to archive all the email on that account to the local hard drive. You also may need to delete that account in Exchange and start a new mailbox for that user.
 
Getting ready to move the exchange db and logs to the EqualLogic... right now the server's IO is huge.

In Performance, I raised the graph to 10000 and here are the results...
- Avg Disk Queue Length hits that and then some,
- Avg sec/read hits 3000
- Avg sec/write hits 4000
- reads/sec is low
- writes/sec is low

This is with nobody in the office but me and all I am doing is exporting a former employee's mailbox. CPU is basically idle. I am thinking the daily maintenance on exchange is happening but I can't prove it...
 
3-4 seconds per read or write is a huge latency. What exactly do the reads/sec and writes/sec look like? What type of disk, and what is the layout?

If this is SATA, depending on the drive you can expect between 20 and 60 IOPS @ 20ms response time for random IO. If you're seeing something outside that range, you'd need to investigate further. What does the backend look like? Is there any chance of commingling? What CPU type are you using? Could this simply be a timer drift issue? I think more information is needed.

XMSRE

 
The reads/sec were averaging about 200 on a 1000 scale and the writes/sec were about 100 on the same 1000 scale. I had to reduce it to 1000 from 10000 to actually see what they were doing. It is a two disk RAID 1 on a 1U PowerEdge server: dual core, dual CPU, 8GB RAM, although I think Exchange 2003 can only use 4GB. Anyway, I have offloaded all the print queues that were on it and also archived about 20 mailboxes from former employees, where the sizes ranged from 100MB to 3GB, and the server seems to be coming around. I still plan to move the DB to the EqualLogic though...
 