Hi all,
I read somewhere that HACMP shuts down a node to prevent data corruption when it detects a change of IP address in a cluster. I've been having problems where my nodes shut down by themselves and errpt shows that clexit.rc has been called. The reason I ask how HACMP detects a change of IP is to see whether there is a workaround.
I have two clusters, "A" and "B". Each cluster has two nodes, "1" and "2".
I've noticed the following behaviour:
With cluster A stable, with nodes 1 and 2 up:
I start cluster B node 2, and cluster A node 1 drops dead and powers off.
With cluster B stable, with nodes 1 and 2 up:
I start cluster A node 1, and cluster B node 2 drops dead and powers off.
I've checked all the MAC addresses, and they are all unique. All boot and service IPs are unique. However, the boot/service IPs of both clusters are on the same network (subnet).
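To make the check above concrete, here is a minimal sketch of what I mean by "unique, but on the same subnet". The addresses, labels and netmask are hypothetical placeholders, not my real configuration, and the helper names are made up for illustration. It only uses Python's standard ipaddress module and does not talk to HACMP at all.

```python
import ipaddress
from collections import Counter

# Hypothetical placeholder addresses; substitute the real boot/service IPs.
ADDRESSES = {
    "A1_boot": "192.168.10.11",
    "A1_service": "192.168.10.21",
    "A2_boot": "192.168.10.12",
    "A2_service": "192.168.10.22",
    "B1_boot": "192.168.10.31",
    "B1_service": "192.168.10.41",
    "B2_boot": "192.168.10.32",
    "B2_service": "192.168.10.42",
}
NETMASK = "255.255.255.0"  # assumed netmask, for illustration only


def duplicate_ips(addresses):
    """Return the IPs that are assigned to more than one label."""
    counts = Counter(addresses.values())
    return sorted(ip for ip, n in counts.items() if n > 1)


def group_by_subnet(addresses, netmask):
    """Map each subnet (as a string) to the labels whose IP falls in it."""
    groups = {}
    for label, ip in addresses.items():
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
        groups.setdefault(str(net), []).append(label)
    return groups


if __name__ == "__main__":
    print("duplicate IPs:", duplicate_ips(ADDRESSES))
    for subnet, labels in group_by_subnet(ADDRESSES, NETMASK).items():
        print(subnet, "->", ", ".join(labels))
```

With the placeholder values, every address is distinct, yet all eight interfaces from both clusters land in one subnet, which is the situation I'm describing.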
Cluster A can run fine without cluster B, and vice versa.
Very weird. My theory is that the stable node somehow detects a change of IP on the other cluster and decides to drop dead. Does it detect this because the clusters are on the same network, even though they have different IPs?
Would the version I'm running cause this behaviour? I'm running:
cluster.base.server.rte 4.3.1.16
The reason I suspect this is that I have a third cluster with boot/service IPs on the same network, but running cluster.base.server.rte 4.3.1.15. That cluster seems to be fine.
Is there anything I can do to investigate further, besides pulling out a sniffer or turning HACMP debug on?
Thanks in advance!
Ukyo