Cluster failover does not fail over to my node2

Status
Not open for further replies.

clbw

IS-IT--Management
Sep 26, 2005
11
US
I have two CL1850R servers configured as a cluster. About a month ago we decided to test failover by pulling the plug on Node1 while monitoring Node2, and the failover failed. From the log files (cluster.log and the Event Viewer logs), the failover appears to start when Node1 is taken down, but Node2 then reports that it has lost communication with Node1’s heartbeat and the public connection. At that point I would expect the cluster service to stop, restart, and then take over the cluster group. Instead, once the service stops it will not restart, and I get this in the Event Viewer: “Microsoft Clustering service was halted due to a cluster membership or communications error.” The cluster service on Node2 then tries to restart and fails. It is not until Node1 is back online that the cluster service on Node2 will start. The following is what I have tried.

1. Updated all hardware firmware, BIOSes, and drivers.
2. Tried the failover again; no luck.
3. Evicted Node2, uninstalled the cluster service from Node2, shut it down, and rebooted (downed Node1 also).
4. Reinstalled the cluster service and rejoined Node2 to the cluster group.
5. Tried to fail over again, and again it was not successful.

I have checked everything that I can possibly think of and cannot figure this out. I have looked at many MS documents regarding MSCS and have not found anything that would be of help.


The hardware in play is one cluster box with two CL1850R servers, each with two drives in a mirror. Also in the box is the RAID array that holds the cluster resources: a cr3500 controller with six drives in a hot-swap RAID 5.
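The log review described in this post can be sketched as a small Python script. This is only a rough illustration, assuming the Event Viewer log has been exported to plain text; the function names, the sample lines, and the file handling are hypothetical and not part of MSCS. The only string taken from the post is the halt message itself.

```python
# Message quoted in the post; everything else in this sketch is an assumption.
HALT_MESSAGE = (
    "Microsoft Clustering service was halted due to a cluster "
    "membership or communications error."
)


def find_halt_events(lines, needle=HALT_MESSAGE):
    """Return (line_number, text) pairs for lines containing the needle.

    Matching is case-insensitive; line numbers start at 1.
    """
    needle = needle.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


def scan_file(path):
    """Scan a text log at a caller-supplied (hypothetical) path."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_halt_events(handle)


if __name__ == "__main__":
    # Hypothetical sample lines, not real cluster.log output.
    sample = [
        "Node2 lost heartbeat from Node1",
        HALT_MESSAGE,
        "cluster service restart attempt failed",
    ]
    for number, text in find_halt_events(sample):
        print(number, text)
```

Running the same search over logs exported from both nodes makes it easier to line up when the service halted on Node2 relative to Node1 going down.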
 
Are you SCSI or fiber attached to the shared disks? If fiber attached, double-check and confirm the settings on your HBAs. Another question: are you able to fail over (move the group) while Node1 is online? If so, and you are using SCSI, there may be contention on the SCSI bus. I had this problem with older IBM ServeRAID cards.
 
The cluster is SCSI attached, with both servers connected to the cr3500 controller. The servers are connected via an internal "heartbeat" Ethernet connection and an external Ethernet connection as well. I cannot fail over with Node1 online. I am checking the cr3500 controller tonight, and I am guessing that my issue is with the controller.
 
Good luck, and let us know what you find. Also check Node2 via Disk Administrator and see if it shows the drive as unknown when it does not own the resource.
 
Indeed, it is with the controller in the Node2 server: it does not seem to see the cluster because of a configuration issue. Once this is fixed, all will be well....
 
Totally off topic:

WhoKilledKenny: there used to be a WAV file of that South Park quote. I used to randomly embed it in email by encoding it as a body part and using a bgsound tag to fire it off.

 