How to interpret disk performance data?

Status
Not open for further replies.

chuckmaginessfabals

Technical User
Mar 29, 2011
GB
Hi. I'm logging performance counters on a 2008 R2 server, and I'm after a bit of advice on interpreting the results from the disk counters, if anyone can help. Sorry if what follows seems convoluted, but I'm confused by a lack of clear rules by which to interpret disk counters reliably.

Most of the MS stuff I've found just reiterates, and elaborates on, the 'Explain'/'Description' text, but one non-MS posting I found casts doubt on the reliability of seemingly key counters like % Disk Time, Current Disk Queue Length and Avg. Disk Queue Length. For example:

% Disk Time
===========
I've read that this counter is 'capped' and therefore does 'not actually measure disk utilization'.

I also find it returns figures of several hundred per cent where RAID is involved, and I'm not sure it's as simple as dividing that by the number of disks in an array to get a meaningful figure.

Current Disk Queue Length
=========================
This counter is, apparently, unreliable, because, 'If requests are queued in the hardware, which is usual for SCSI disks and RAID controllers, the Current Disk Queue Length Counter will show a value of 0, even though requests are queued.'

Avg. Disk Queue Length
======================
This counter, I read, is derived from Avg. Disk sec/Transfer and Disk Transfers/sec, and requires an 'equilibrium assumption' to be factored in, namely, 'that the arrival rate equals the completion rate over the measurement interval. Otherwise, the calculation is meaningless.'
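If the derivation is as that posting describes, it amounts to Little's Law: the average number of requests in flight equals the transfer rate multiplied by the average time per transfer. A minimal Python sketch of that arithmetic follows. The function name and the sample figures are made up for illustration and are not taken from any log or from Microsoft documentation.

```python
def avg_disk_queue_length(avg_sec_per_transfer: float, transfers_per_sec: float) -> float:
    """Estimate the average queue length from the two underlying counters.

    Little's Law: mean requests in the system = arrival rate x mean time in system.
    This only holds if arrivals and completions balance over the interval
    (the 'equilibrium assumption' quoted above).
    """
    return avg_sec_per_transfer * transfers_per_sec


# Hypothetical interval: 8 ms per transfer at 250 transfers/sec,
# which works out to roughly 2 requests outstanding on average.
estimate = avg_disk_queue_length(0.008, 250.0)
print(f"Estimated Avg. Disk Queue Length: {estimate:.2f}")
```

Comparing this estimate with the logged Avg. Disk Queue Length for the same interval is one way to see whether the two agree on a given system.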

The corollary of this, apparently, is that the Avg. Disk Queue Length counter value should not be accepted as reliable except where the current value of Current Disk Queue Length is the same as the previous value of Current Disk Queue Length.

In a recent log, the only instances of this were where the current and previous values for Current Disk Queue Length were 0 (though other values were recorded at other times). Given that 0 is supposedly an unreliable value for Current Disk Queue Length, does this render the Avg. Disk Queue Length values for these intervals meaningless?
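The filtering rule described above (trust Avg. Disk Queue Length only where Current Disk Queue Length is unchanged from the previous sample) can be sketched in Python as below. This is only an illustration of the rule as I've read it: the function name, the tuple layout and the sample values are all hypothetical, and it says nothing about whether a Current Disk Queue Length of 0 can itself be trusted.

```python
from typing import List, Tuple


def steady_state_samples(samples: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Return (index, avg_queue_length) for samples that pass the equilibrium check.

    Each input sample is (current_disk_queue_length, avg_disk_queue_length).
    A sample passes when its Current Disk Queue Length equals the value
    recorded in the previous sample. The first sample has no predecessor,
    so it is never kept.
    """
    kept = []
    for i in range(1, len(samples)):
        previous_current, _ = samples[i - 1]
        current, avg = samples[i]
        if current == previous_current:
            kept.append((i, avg))
    return kept


# Hypothetical log rows: (Current Disk Queue Length, Avg. Disk Queue Length)
log = [(0, 0.4), (0, 0.2), (3, 1.5), (3, 2.1), (1, 0.9)]
for index, avg in steady_state_samples(log):
    print(f"sample {index}: Avg. Disk Queue Length {avg} passes the check")
```

With the made-up rows above, only the samples at index 1 and index 3 pass, because those are the only ones whose Current Disk Queue Length matches the preceding sample.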

Any advice on how to interpret these (and any other) disk counters to get meaningful figures on disk performance (specifically, whether the disk is a likely bottleneck) would be greatly appreciated.
 
"the reliability of seemingly key counters like % Disk Time, Current Queue Length and Average Disk Queue Length"
Many of these counters can be misleading because of delays caused by the programs or network connections that access the disks. Program or network problems make requests wait or queue, so the counters appear to show disk performance issues when, much of the time, the cause has nothing to do with the disk subsystem. You need error-free programs and networks to get reliable results. I have seen countless posts from users blaming the RAID adapter or the disks for poor results. In my experience, 95% of the time it has little to do with the disks, unless there is a firmware issue, the recommended RAID policy is not used, antiquated slow hardware is used, or the server is pushed beyond its hardware limits.

