
Performance degradation when upgrading from BF to CO policy in PowerPath

Status
Not open for further replies.

ozihcs

IS-IT--Management
Jan 10, 2005
Recently we upgraded our PowerPath licenses on several systems, including a Legato NetWorker backup server with the AFT backup-to-disk device option. The backup server has 3x1TB LUNs from a 14-disk ATA RAID5 group.

This weekend our backups ran at a severely reduced rate compared to normal; typically this happens when there are problems with the storage system.

In Navisphere Analyzer I could see that utilization was constantly hitting 100% for the LUNs used by the backup-to-disk devices, whereas IO/s were quite low compared to normal operations.

Furthermore, powermt and iostat revealed a large queue of outstanding requests on both active paths to the LUNs, which led me to suspect that the load balancing was somehow the culprit.
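For reference, the checks look roughly like this (a sketch, assuming a Solaris host with PowerPath; exact column names vary by version):

```shell
# Per-path view of PowerPath devices; the Q-IOs column shows
# outstanding requests queued on each path.
powermt display dev=all

# Solaris extended device statistics every 5 seconds: "wait" is the
# host-side queue, "actv" is requests active on the device.
iostat -xn 5
```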

I reverted the policy to Basic Failover and immediately saw a doubling in IO/s and a massive increase in the throughput of the running backups.
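For the record, the revert itself was a one-liner (a sketch; dev=all targets all PowerPath pseudo-devices, and powermt save makes the setting persist across reboots):

```shell
# Set the Basic Failover policy on all devices and save the config.
powermt set policy=bf dev=all
powermt save

# Confirm the active policy per device.
powermt display dev=all
```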

I cannot exactly explain why this became such a big problem, but I suspect that the parallelism may have led to spindle contention. On the other hand, IO ops/s actually doubled when I reverted to BF mode, and disk utilization never exceeded 40% in CO mode, so this may be an incorrect assumption.

Alternatively, I may have run into a problem related to throttling or queue depth at the OS level.

I should mention that this host has only one HBA and can see both controllers on both SPs in the switch zone. In load-balancing mode, this means the same local HBA is used to queue transactions to both controllers on the currently active SP, and I guess this may have consequences for how I should configure /etc/system.

Anyway, I am a bit worried as to whether this problem may also occur on some of our more important SAN-attached servers. If there are parameters I need to tune in /etc/system or similar, I'd very much like to hear about it.
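If it does turn out to be an OS-level queue-depth issue, the usual Solaris knob I know of is sd_max_throttle in /etc/system, along these lines (the value 20 is purely illustrative, not a recommendation; I'd check EMC's guidance for the specific array and HBA first, and a reboot is needed for it to take effect):

```
* /etc/system fragment: cap the number of commands the sd driver
* queues per LUN. Illustrative value only.
set sd:sd_max_throttle=20
```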
 
I should mention that this host has only one HBA and can see both controllers on both SPs in the switch zone.

Did you have ATF before? When you have only one path, the configuration must be the basic one, "Basic Failover", and you don't need a license for that. It was called CDE back when PowerPath was ATF. You would use the CO policy only when you have 2 or more HBAs.

cheers.
 
As an FYI, it is an EMC best practice to use RAID 3 RAID groups for a backup-to-disk solution.
 
Thanks for the feedback.

I'm redesigning my B2D solution into RAID3 groups now. One minor problem I'm unsure how best to resolve: with 15 disks in a shelf (5 disks per RAID3 group), I either have to run without a hot spare, leave 4 of my ATA disks out of the RAID3 metaLUN, or accept the risk of losing my backup data in the case of a multiple-disk failure.
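The capacity trade-off is easy to sketch with some quick shell arithmetic (disk counts from above; 1TB ATA disks assumed, and RAID3 counted as 4 data + 1 parity per 5-disk group):

```shell
disks=15; per_group=5; data_per_group=4   # shelf and RAID3 layout assumptions

# Option A: three RAID3 groups, all 15 slots used, no hot spare.
groups_a=$(( disks / per_group ))
usable_a=$(( groups_a * data_per_group ))

# Option B: two RAID3 groups plus one hot spare; the rest sit idle.
groups_b=2
usable_b=$(( groups_b * data_per_group ))
idle_b=$(( disks - groups_b * per_group - 1 ))

echo "A: ${usable_a} TB usable, no spare"   # -> A: 12 TB usable, no spare
echo "B: ${usable_b} TB usable, 1 spare, ${idle_b} disks idle"
```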

Off the top of my head I might say that backup data is not critical and can be lost, but Murphy's law would probably ensure that a failure of my B2D shelf coincides with a critical server failing and needing a fresh restore.

Alternatively, I could configure two RAID3 groups and one 4-drive RAID5 group, plus one hot spare. I suspect, however, that I should not stripe metaLUNs across RAID groups of non-uniform RAID level, so the last group would not be available for B2D.

 