network_down: HACMP

Status
Not open for further replies.

jprabaker

Technical User
May 31, 2001
185
GB
I'm running AIX 4.3.3 ML 08 with HACMP 4.4 Classic. I have 2 cascading resource groups running on 2 nodes. If one node has a "network_down" event, all the other machine does is log this in the hacmp.out file.

Has anyone any experience in customising this event so that it forces a takeover?

Any advice will be appreciated.

JP
 
JP,

Well, if a network_down event is occurring on only one node, that is a strange phenomenon; both nodes are connected to the same network, after all. Failover is not occurring because, as far as the cluster can tell via the serial heartbeat link, the server is still up. If you wish to force a failover, you need to write a script and configure it as a post-event of network_down, e.g.:

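#!/bin/ksh
# Stop cluster services on this node now, gracefully with takeover,
# so that the other node acquires the resource groups.
clstop -y -N -gr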


My argument is that the network should be down on both nodes at this point anyway. If you are getting a lot of network_down events, then consider changing the failure detection rate for the TCP/IP network to Slow.

Hope that helps


PSD
IBM Certified Specialist - AIX V4.3 Systems Support
IBM Certified Specialist - AIX V4 HACMP
 
PSD,

I was thinking about the unlikely event where both NICs in one of my machines fail. (Our testing chaps will test this by pulling out both the cables!)

I am in the process of writing 2 scripts to handle this, a network_down_complete_post script and a network_up_pre script.

Firstly, the network_down_complete_post script checks whether the network_down event is global (if [ $3 = "-1" ]; then do nothing...).

If it is not a global failure, then it checks the local node name against the node name passed from the network_down_complete event ($3).

If they are the same, it will touch a temporary file and test a number of times to see whether this file still exists. (The network_up_pre script will remove this file!)

If the file gets removed by the network_up_pre script, then the script will exit; otherwise, after the timeout period, it will run a graceful takeover.
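
A rough sketch of how that logic could look in ksh is below. It is only an illustration under stated assumptions, not a tested HACMP script: the flag file path, the retry count and interval, and the function names are all hypothetical, and the local node name is taken from a LOCAL_NODE variable that falls back to the hostname (HACMP node names need not match the hostname). The takeover command is the one suggested earlier in this thread.

```shell
#!/bin/ksh
# Sketch only. Post-event argument layout follows the description above:
# $3 is the node name from network_down_complete, or "-1" for a global failure.

FLAG_FILE=${FLAG_FILE:-/tmp/network_down.flag}    # hypothetical path
CHECKS=${CHECKS:-6}                               # hypothetical: number of tests
INTERVAL=${INTERVAL:-10}                          # hypothetical: seconds between tests
LOCAL_NODE=${LOCAL_NODE:-$(hostname)}             # assumption: node name = hostname
TAKEOVER_CMD=${TAKEOVER_CMD:-"clstop -y -N -gr"}  # command from earlier in the thread

# Body of the network_down_complete post-event script.
handle_network_down() {
    failed_node=$3

    # Global failure: no node can see the network, so do nothing.
    if [ "$failed_node" = "-1" ]; then
        return 0
    fi

    # The event refers to another node, so do nothing here.
    if [ "$failed_node" != "$LOCAL_NODE" ]; then
        return 0
    fi

    # Mark the outage, then wait to see whether network_up_pre clears the mark.
    touch "$FLAG_FILE"
    count=0
    while [ "$count" -lt "$CHECKS" ]; do
        sleep "$INTERVAL"
        if [ ! -f "$FLAG_FILE" ]; then
            return 0          # network came back in time
        fi
        count=$((count + 1))
    done

    # Timeout reached with the flag still present: hand over to the other node.
    rm -f "$FLAG_FILE"
    $TAKEOVER_CMD
}

# Body of the network_up pre-event script: cancel any pending takeover.
handle_network_up() {
    rm -f "$FLAG_FILE"
}
```

In this sketch the post-event script would end with handle_network_down "$@" and the pre-event script with handle_network_up. Setting TAKEOVER_CMD to an echo command gives a dry run for testing before letting it stop cluster services for real.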

Does this make sense or am I on completely the wrong lines here?

Regards,

JP
 
Isn't the point of a cluster that it should fail over if the active node cannot see the network but the standby can (i.e. network down to the active node only), but that in the event that neither node can see the network no failover takes place?

I'm from a SunCluster background rather than HACMP, but surely the basics of clustering hold true... don't they?!
 
Nope. I'm using HACMP Classic 4.4, and the manuals state that in the event of a network_down it will do nothing. It's an event that needs to be tailored to the needs of the particular site, which is why I've written a couple of custom event scripts.
 