Diagnostic Group Shutdown Partition?

Status
Not open for further replies.

slgordon

IS-IT--Management
Sep 27, 2000
57
US
Can anyone give me any insight on a DGSP message?
I had a real problem last night where I was trying to bring up one of my nodes on our HACMP cluster and my other two nodes came down hard while clstrmgr was starting. The last message in my /var/adm/cluster.log is "hacmp clstrmgr[13458]: Failover: sending DGSP to NODE1" when the nodes came down!
Any info would be appreciated.
Says the Manager to the person that invented the modem: "Tell me again why you need two prototypes?!?"
 
Slgordon,

What you had was a partitioned cluster. This happens when the nodes lose all heartbeat links between them, i.e. IP and non-IP (if configured), so that the nodes, or in this case the two nodes in one partition and the one node in the other, can no longer see each other. In this situation failovers can occur while the boxes are in fact still up, and the results are unpredictable.

When the problem is resolved, i.e. the link is re-established, a DGSP (Diagnostic Group Shutdown Partition) message, which means that the cluster is ready to start heartbeating again, is sent to the cluster members and, depending on certain conditions, certain nodes are taken down immediately, as happened to you last night.

Why does this happen? It is in effect what you get when all heartbeat links between the nodes fail, i.e. TCP/IP and the serial network if you have one. It really only occurs if you do not have a serial link (RS232 or target mode SSA), because with a serial link in place, if the TCP/IP network fails the node status is still confirmed via the heartbeat over the non-TCP/IP network.
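
To make the partitioning idea concrete, here is a toy model in Python. It is only a sketch and not how clstrmgr works internally: it treats each working heartbeat link as an edge and groups the nodes that can still reach each other. NODE1 comes from the log line above; NODE2, NODE3, the link layout and the choice of which node gets cut off are assumptions made up for illustration.

```python
def partitions(nodes, links):
    """Group nodes into partitions: sets of nodes that can still reach
    each other over any working heartbeat link (IP or serial)."""
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    groups = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk over the working links from this node.
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical three-node cluster; only NODE1 is named in the thread.
nodes = ["NODE1", "NODE2", "NODE3"]

# Assumed failure: NODE1 has lost its IP heartbeat to both other nodes,
# so the only IP link still working is NODE2 <-> NODE3.
surviving_ip_links = [("NODE2", "NODE3")]

# Assumed serial (RS232) ring between the nodes.
serial_links = [("NODE1", "NODE2"), ("NODE2", "NODE3"), ("NODE3", "NODE1")]

without_serial = partitions(nodes, surviving_ip_links)
with_serial = partitions(nodes, surviving_ip_links + serial_links)

print(without_serial)
print(with_serial)
```

Without the serial links the model splits into a one-node and a two-node partition, which is the 2-and-1 situation described above. With the serial links added, every node is still reachable and the cluster stays in one piece.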

If you do not have a serial link, make sure you configure one; if you do have one, look in the /var/adm/cluster.log file for any serial link failures.
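
If it helps, a rough way to pull the interesting lines out of the log is a small Python filter like the one below. Treat it as a sketch: the keyword list is my guess at what might show up for DGSP and serial link problems, not a list of real HACMP message texts, so adjust it to whatever your log actually contains. The first sample line is the one quoted in the question; the second is made up.

```python
import re

# Assumed search terms, not official HACMP message strings.
KEYWORDS = ("DGSP", "tty", "rs232", "tmssa", "serial")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return the lines that mention any keyword, case-insensitively."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_log(path="/var/adm/cluster.log"):
    """Filter the cluster log on the node itself."""
    with open(path, errors="replace") as log:
        return find_suspect_lines(log)


sample = [
    "hacmp clstrmgr[13458]: Failover: sending DGSP to NODE1\n",
    "hacmp clstrmgr[13458]: unrelated message\n",  # made-up line
]

matches = find_suspect_lines(sample)
for line in matches:
    print(line)
```

On a cluster node you would call scan_log() instead of passing the sample list, and print whatever it returns.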

Hope that helps

PSD
HACMP Specialist
 
Thanks PSD! That makes sense since I have been having problems with the tty that the serial link is connected to!
Says the Manager to the person that invented the modem: "Tell me again why you need two prototypes?!?"
 