I have two CL1850R servers configured as a cluster. About a month ago we decided to test failover by pulling the plug on Node1 while monitoring Node2, and the failover failed. From cluster.log and the Event Viewer logs, this seems to be what happens. When Node1 goes down, the failover process starts and appears to be working, but then Node2 reports that it has lost communication with Node1 over both the heartbeat and the public connection. At that point I would expect the Cluster service to stop, restart, and take over the cluster group. Instead, the service stops and will not restart, and I get this in Event Viewer: “Microsoft Clustering service was halted due to a cluster membership or communications error.” The Cluster service on Node2 then tries to restart and fails. The Cluster service on Node2 will not start until Node1 is back online. Here is what I have tried:
1. Updated all hardware firmware, BIOSes, and drivers.
2. Tried the failover again; no luck.
3. Evicted Node2, uninstalled the Cluster service from Node2, shut it down, and rebooted (downed Node1 as well).
4. Reinstalled the Cluster service and rejoined Node2 to the cluster.
5. Tried the failover again, and again it was not successful.
I have checked everything I can think of and cannot figure this out. I have read many Microsoft documents on MSCS and have not found anything that helps.
The hardware involved is one cluster box containing two CL1850R servers, each with two drives in a mirror. The box also contains the RAID array that holds the cluster resources: a CR3500 controller with six drives in a hot-swap RAID 5.