Perfmon points to disk I/O

Status
Not open for further replies.

mebenz

IS-IT--Management
Jun 7, 2007
88
CA
I have posted this in the SQL group, but in case this is a hardware or OS issue, I thought I might try this one as well.

We were having performance issues in all our applications and in order to try and determine if it was an application, the server, the cluster setup, sql setup, etc, we purchased a new, high powered server to test the application that was showing the slowest performance.

The new server has 2x Quad Core Intel Xeon processors, 4 GB RAM (3 GB allocated to SQL), a dual NC373i Multifunction Gigabit adapter, and eight 72 GB 15K hard drives (2 in RAID 1 for the OS, 2 in RAID 1 for the SQL logs and 3 in RAID 5 for the SQL data) on Windows 2003 Enterprise. We are running SQL 2000 Enterprise with SP4.

We ran a test for an hour with all users actively using the application. I ran CPU Object: % Total Processor Time, Memory Object: Pages/Sec counter, PhysicalDisk Object: Avg. Disk Queue Length, PhysicalDisk Object: Avg. Disk Sec/Transfer and Buffer Cache Hit Ratio.

The memory (avg 0.729, max 66.137, only 3 spikes), processor (avg 3.19, max 13) and buffer cache (avg 99.848, max 99.884) are all performing well.

However, the Avg disk queue for the data drive was averaging 5.838, with max of 166.110 (over 20 spikes hitting over 100 on the graph) and the avg disk sec/transfer averaging 0.103, max of 0.757 and graphs are identical to disk queue.
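For anyone who wants to reproduce figures like these (average, maximum, number of spikes over 100) from a counter log saved as CSV, here is a minimal Python sketch. The column name, the threshold and the sample rows are made up for illustration; a real perfmon export uses longer column headers that include the server and object path, so treat the names below as assumptions.

```python
import csv
import io
from statistics import mean


def summarize_counter(csv_text, column, spike_threshold):
    """Return (average, maximum, spike_count) for one counter column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        cell = row[column].strip()
        if cell:  # skip blank cells instead of treating them as zero
            values.append(float(cell))
    spikes = sum(1 for v in values if v > spike_threshold)
    return mean(values), max(values), spikes


# Hypothetical sample data, NOT the log from this thread.
SAMPLE = """Time,Avg. Disk Queue Length
10:00:00,2.0
10:00:15,4.0
10:00:30,120.0
10:00:45,
10:01:00,6.0
"""

avg, peak, spikes = summarize_counter(SAMPLE, "Avg. Disk Queue Length", 100)
print(f"avg={avg:.3f} max={peak:.3f} spikes>100={spikes}")
```

Counting samples above a threshold is only a rough stand-in for "spikes on the graph"; several consecutive high samples would be counted individually here.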

From what I've read, there is something wrong. Even though the application was performing well during this time, the disks are either not performing well, or the application is somehow not performing well?
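One way to put the two disk counters in context is Little's law (queue length = throughput x time per transfer), which is why the two graphs track each other. The sketch below uses only the averages quoted above plus the 3-disk RAID 5 layout; the function names are mine. The often-quoted guidelines of roughly 2 queued requests per spindle and latency in the tens of milliseconds are general rules of thumb, not something established in this thread.

```python
def implied_transfers_per_sec(avg_queue_length, avg_sec_per_transfer):
    """Little's law rearranged: throughput = queue length / time per transfer."""
    return avg_queue_length / avg_sec_per_transfer


def queue_per_spindle(avg_queue_length, spindles):
    """Spread the array-level queue evenly over the physical disks."""
    return avg_queue_length / spindles


avg_queue = 5.838    # Avg. Disk Queue Length on the data drive (from the post)
avg_latency = 0.103  # Avg. Disk sec/Transfer in seconds (from the post)
spindles = 3         # disks in the RAID 5 data array (from the post)

print("implied transfers/sec:", round(implied_transfers_per_sec(avg_queue, avg_latency), 1))
print("queue per spindle:", round(queue_per_spindle(avg_queue, spindles), 2))
print("avg latency in ms:", round(avg_latency * 1000))
```

With these inputs the averages work out to roughly 57 transfers per second, just under 2 queued requests per spindle, and about 103 ms per transfer. So the average queue is borderline by the usual rule of thumb, while the latency and the spikes are what stand out.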

We are going to be moving our other applications to this server, so if we already have red flags on the disks, it's some concern to me.

Since I am not a hardware gal, nor did I develop the application that is causing these spikes, is there anything anyone can suggest?
 
Just a reminder, RAID1 is faster writing, RAID5 is faster reading. Not sure which your test consisted of...

Your results could also reflect how the various RAIDs are split among the channels available on the controller. Channel saturation can play a big part in performance.


"We must fall back upon the old axiom that when all other contingencies fail, whatever remains, however improbable, must be the truth." - Sherlock Holmes

 
Which drive was the queueing showing on?

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

 
On our data drive where our sql data resides.
 
First do you have a SCSI or SAS drive subsystem ?

Clustering....

The drive's internal cache

Another possibility (most likely)...
I was under the impression that, with clustering, the "write back" cache is turned off on RAID adapters by default. Some controllers allow "write back" to be enabled in a cluster setup and others do not (I am not referring to the disk's internal caching as referred to in the above KB). If "write back" is not enabled or not allowed on your cluster RAID adapter, there is no way you will have decent disk throughput under ANY circumstances.

If this is not so...
"... the disks are either not performing well, or the application is somehow not performing well?"

Personally, with an eye on performance, I would not set up a RAID 5 with only 3 disks; a starting point of 4 disks is needed. Secondly, a RAID 1 for the OS and a separate one for the log files: I don't see any performance gain from this. You would be better off with a 4-disk RAID 10 for both the OS and log files, as RAID 10 is so much faster than simple RAID 1, even with 2 separate spindle sets.
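To compare these layouts on paper, a back-of-envelope model is the usual RAID write-penalty estimate: each logical write costs 2 disk operations on RAID 1/10 and 4 on RAID 5. The Python sketch below is only that estimate. The 175 IOPS per 15K disk and the 30% write mix are assumptions picked for illustration, and the model ignores controller cache (including the write-back setting discussed above), so the numbers are relative, not predictions.

```python
# Disk operations needed per logical write (standard textbook values).
WRITE_PENALTY = {"RAID 1": 2, "RAID 10": 2, "RAID 5": 4}


def effective_iops(level, disks, iops_per_disk, write_fraction):
    """Estimate host-visible IOPS for an array under a read/write mix."""
    raw = disks * iops_per_disk
    read_fraction = 1.0 - write_fraction
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[level])


ASSUMED_IOPS_PER_DISK = 175   # assumption for a 15K drive, not measured
ASSUMED_WRITE_FRACTION = 0.3  # assumption, depends entirely on the workload

layouts = [("RAID 5", 3), ("RAID 5", 4), ("RAID 1", 2), ("RAID 10", 4)]
for level, disks in layouts:
    estimate = effective_iops(level, disks, ASSUMED_IOPS_PER_DISK, ASSUMED_WRITE_FRACTION)
    print(f"{level} with {disks} disks: ~{estimate:.0f} IOPS")
```

Changing the write fraction shows how quickly RAID 5 falls behind as the workload becomes more write-heavy, which is the usual argument for keeping logs off RAID 5.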

To rule out the disk subsystem, I would benchmark the disks, keeping the clients and network out of the equation.
Network device issues, wire errors, poorly set up workstation software, SMB signing issues and a dozen other problems could be causing delays in data getting to and from the disk controller.


........................................
Chernobyl disaster..a must see pictorial
 
It's recommended to keep the transaction logs and OS on separate arrays. This keeps the page file I/O and the transaction log I/O separate from each other.

Denny
 
"It's recommended to keep the transaction logs and OS on separate arrays."
Aware of this. Specifically, the pagefile/tmp files (the issue of having the OS on a separate spindle set) and the transaction logs were considered in my answer.



 
Are there any tools recommended to benchmark these disks?
 
For a simple benchmark, like you, I time a copy from a source disk to a destination disk, and then in the reverse direction. For the source I fill a directory with average-sized files, not one large file.
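A timed copy like this can be scripted. The Python sketch below is one way to do it under my own assumptions: the file count, file size and helper names are arbitrary, and the demo copies inside a temporary directory just to show that it runs. For a real test, point the source and destination at directories on the two arrays being compared, and keep in mind that the OS file cache and the controller cache can inflate the result.

```python
import os
import shutil
import tempfile
import time


def make_test_files(directory, count, size_bytes):
    """Fill a directory with `count` files of random data."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        with open(os.path.join(directory, f"file_{i:04d}.bin"), "wb") as f:
            f.write(os.urandom(size_bytes))


def timed_copy(source_dir, dest_dir):
    """Copy source_dir to dest_dir (which must not exist yet).

    Returns (elapsed_seconds, megabytes_per_second).
    """
    total_bytes = sum(
        os.path.getsize(os.path.join(root, name))
        for root, _dirs, files in os.walk(source_dir)
        for name in files
    )
    start = time.perf_counter()
    shutil.copytree(source_dir, dest_dir)
    elapsed = time.perf_counter() - start
    rate = (total_bytes / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


# Small demo in a temporary directory; sizes are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    make_test_files(src, count=20, size_bytes=64 * 1024)
    seconds, mb_per_sec = timed_copy(src, dst)
    print(f"copied 20 files in {seconds:.3f} s ({mb_per_sec:.1f} MB/s)")
```

Running it once in each direction gives the forward and reverse timings described above. Note that this approach does write files to the server, unlike the DOS-based tool mentioned next.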

Cosbi ver 0.52: a simple, old DOS-based benchmark that does not add files to your server. I generally do not install benchmark programs that add files to the system on a client's live production server. I find this one very useful.


HDtach: only good for file reads. I only use it to compare relative performance differences between servers. It is not accurate, but it will give you a good idea of whether one server's disk system is faster than another's.


IOmeter: it takes getting used to, and you need to understand the parameters before the benchmarks have any true meaning.


Note concerning raid 5 benchmarks....
I have yet to find a benchmark which shows true RAID 5 performance, with small networks in mind (>75 users). RAID 5 performs much better than benchmarks show on small networks, because all benchmarks flood the cache, even IOmeter (originally developed by Intel), which you can adjust; most small network servers do not constantly overload the cache in normal use.

 