Disk Failover Fails

Status
Not open for further replies.

Dinkytoy

IS-IT--Management
Jun 14, 2007
147
GB
Hi,

I'm wondering if anyone can shed further light on a problem with my Windows Server 2003 SP2 cluster. Our active node failed unexpectedly last weekend. The second node saw this and tried to take control, but the following errors occurred (from cluster.log):


WARN Physical Disk <Disk Q:>: [DiskArb] Retry arbitration, 2 attempts left
INFO Physical Disk <Disk Q:>: [DiskArb] Read the partition info to insure the disk is accessible.
INFO Physical Disk <Disk Q:>: [DiskArb] Issuing GetPartInfo on signature 2b6e9530.
ERR Physical Disk <Disk Q:>: [DiskArb] GetPartInfo completed, status 1168.
INFO Physical Disk <Disk Q:>: [DiskArb] Arbitrate for ownership of the disk by reading/writing various disk sectors.
ERR Physical Disk <Disk Q:>: [DiskArb] Failed to read (sector 12), error 1168.
INFO Physical Disk <Disk Q:>: [DiskArb] We are about to break reserve.
INFO Physical Disk <Disk Q:>: [DiskArb] Issuing BusReset on signature 2b6e9530.
ERR Physical Disk <Disk Q:>: [DiskArb] BusReset completed, status 1168.
ERR Physical Disk <Disk Q:>: [DiskArb] Failed to break reservation, error 1168.
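For anyone sifting through a longer cluster.log, here is a rough Python sketch that tallies the ERR entries per disk and status code. The line layout is assumed from the excerpt above only; a real cluster.log normally carries a process/thread ID and timestamp prefix on each line, so the pattern searches rather than anchoring at the start of the line. The names `summarize`, `LINE_RE`, `CODE_RE` and `WIN32_NAMES` are made up for this illustration. For reference, 1168 is the Win32 code ERROR_NOT_FOUND ("Element not found").

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpt, not from any official spec):
#   LEVEL Resource Type <Resource Name>: [Component] message
LINE_RE = re.compile(
    r"\b(?P<level>WARN|INFO|ERR)\s+(?P<resource>.+?)\s+<(?P<name>[^>]+)>:\s+"
    r"\[(?P<component>[^\]]+)\]\s+(?P<message>.*)$"
)
# Messages in the excerpt end with "status NNNN." or "error NNNN."
CODE_RE = re.compile(r"(?:status|error)\s+(\d+)\.?$")

# Only the one code seen in this thread; extend as needed.
WIN32_NAMES = {1168: "ERROR_NOT_FOUND (Element not found)"}


def summarize(lines):
    """Count ERR entries per (resource name, numeric code)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line.strip())
        if not m or m.group("level") != "ERR":
            continue
        code = CODE_RE.search(m.group("message"))
        if code:
            counts[(m.group("name"), int(code.group(1)))] += 1
    return counts


SAMPLE = """\
WARN Physical Disk <Disk Q:>: [DiskArb] Retry arbitration, 2 attempts left
INFO Physical Disk <Disk Q:>: [DiskArb] Issuing GetPartInfo on signature 2b6e9530.
ERR Physical Disk <Disk Q:>: [DiskArb] GetPartInfo completed, status 1168.
ERR Physical Disk <Disk Q:>: [DiskArb] Failed to read (sector 12), error 1168.
ERR Physical Disk <Disk Q:>: [DiskArb] BusReset completed, status 1168.
ERR Physical Disk <Disk Q:>: [DiskArb] Failed to break reservation, error 1168.
"""

if __name__ == "__main__":
    for (disk, code), n in summarize(SAMPLE.splitlines()).items():
        label = WIN32_NAMES.get(code, "unknown code")
        print(f"{disk}: {n} ERR entries with code {code} [{label}]")
```

Run against the excerpt, every ERR entry carries the same code, which suggests the node could not reach the disk at all at that moment rather than losing a genuine arbitration contest.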

Fortunately the active node came back up by itself, removing the need to fail over, but this indicates that the cluster is not healthy and won't fail over should a larger problem occur.

Having read around a lot, some KB articles point at Windows Firewall, but this is disabled and so irrelevant. I also found an iSCSI initiator guide from MS, which points to SCSI Persistent Reserve and Persistent Release. However, on investigation, the service it refers to doesn't seem to exist within the cluster. I'm fairly sure the issue is to do with the release or non-release of the disks, but I'm not sure where to go next.

I would appreciate 2nd, 3rd, 4th opinions on this if anyone has any insight. Hopefully I can do some tests this weekend.

Thanks.
 
Most often this is caused by a compatibility problem between the storage and the drivers. What storage are you using, and which drivers?

Tony ... aka chgwhat

When in doubt,,, Power out...
 
It's an MD3000i, so it's iSCSI: the iSCSI initiator is running over one Intel and one Broadcom NIC. The Broadcom driver is 4.4.15.0 and the Intel driver is 9.12.36.0.

There are likely to be newer drivers, to be honest, but given that it's been running for some time I'm not convinced, at the moment, that the older drivers are the cause.
 
Ignore this now. Further investigation and a failover test completed without a problem. We suspect SQL was doing something heavy and kept writing to the disks, or something similar, at the time the failover happened.
 