Hi,
I'm hoping someone can shed some light on a problem with my Windows Server 2003 SP2 cluster. Our active node failed unexpectedly last weekend. The second node detected this and tried to take control, but the following errors occurred (from cluster.log):
WARN Physical Disk <Disk Q:>: [DiskArb] Retry arbitration, 2 attempts left
INFO Physical Disk <Disk Q:>: [DiskArb] Read the partition info to insure the disk is accessible.
INFO Physical Disk <Disk Q:>: [DiskArb] Issuing GetPartInfo on signature 2b6e9530.
ERR Physical Disk <Disk Q:>: [DiskArb] GetPartInfo completed, status 1168.
INFO Physical Disk <Disk Q:>: [DiskArb] Arbitrate for ownership of the disk by reading/writing various disk sectors.
ERR Physical Disk <Disk Q:>: [DiskArb] Failed to read (sector 12), error 1168.
INFO Physical Disk <Disk Q:>: [DiskArb] We are about to break reserve.
INFO Physical Disk <Disk Q:>: [DiskArb] Issuing BusReset on signature 2b6e9530.
ERR Physical Disk <Disk Q:>: [DiskArb] BusReset completed, status 1168.
ERR Physical Disk <Disk Q:>: [DiskArb] Failed to break reservation, error 1168.
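For what it's worth, Win32 error 1168 is ERROR_NOT_FOUND ("Element not found"), and every failing step above returns it. For anyone wanting to pull the same pattern out of a longer cluster.log, here is a minimal sketch in Python. It assumes the abbreviated line layout shown in the excerpt above (a real cluster.log line also carries process/thread IDs and a timestamp in front, which the pattern simply skips); the function name `summarize` and the `SAMPLE` text are only illustrative.

```python
import re
from collections import Counter

# Assumed layout: LEVEL ResourceType <ResourceName>: [Component] message
# Anything before the level (IDs, timestamp) is ignored by using search().
LINE_RE = re.compile(r"\b(WARN|INFO|ERR)\s+(.+?) <(.+?)>: \[(\w+)\] (.*)$")
# Status codes appear as "status NNNN" or "error NNNN" in the message text.
CODE_RE = re.compile(r"(?:status|error) (\d+)")


def summarize(lines):
    """Return (list of ERR messages, Counter of status codes seen in them)."""
    errors = []
    codes = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match or match.group(1) != "ERR":
            continue
        message = match.group(5)
        errors.append(message)
        code = CODE_RE.search(message)
        if code:
            codes[int(code.group(1))] += 1
    return errors, codes


SAMPLE = """\
WARN Physical Disk <Disk Q:>: [DiskArb] Retry arbitration, 2 attempts left
INFO Physical Disk <Disk Q:>: [DiskArb] Issuing GetPartInfo on signature 2b6e9530.
ERR Physical Disk <Disk Q:>: [DiskArb] GetPartInfo completed, status 1168.
ERR Physical Disk <Disk Q:>: [DiskArb] Failed to read (sector 12), error 1168.
ERR Physical Disk <Disk Q:>: [DiskArb] BusReset completed, status 1168.
ERR Physical Disk <Disk Q:>: [DiskArb] Failed to break reservation, error 1168.
"""

if __name__ == "__main__":
    errors, codes = summarize(SAMPLE.splitlines())
    for message in errors:
        print(message)
    print(dict(codes))
```

Run against the full log, this would show whether 1168 is the only status returned during arbitration or whether other codes appear around the same time.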
Fortunately, the active node came back up by itself, removing the need to fail over, but this indicates that the cluster is not healthy and won't fail over should a larger problem occur.
Having read around a lot: some KB articles point at Windows Firewall, but it is disabled here and so irrelevant. I also found an iSCSI initiator guide from Microsoft, which points to SCSI Persistent Reserve and Persistent Release. However, on investigation, the service it refers to doesn't seem to exist within the cluster. I'm fairly sure the issue is related to the release (or non-release) of the disks, but I'm not sure where to go from here.
I would appreciate second, third, or fourth opinions on this if anyone has any insight. Hopefully I can run some tests this weekend.
Thanks.