Hello
After an upgrade from HACMP 5.4 to 5.5 I ran some cluster acceptance tests.
While doing so I ran into a problem.
One test is to "halt -q" one node (of a two-node cluster).
Normally the second node should take over the resources. Instead, it crashed with an 888 error code.
In the error log I find this:
F48137AC 0619073010 U O minidump COMPRESSED MINIMAL DUMP
225E3B63 0619073010 T S PANIC SOFTWARE PROGRAM ABNORMALLY TERMINATED
9DBCFDEE 0619073110 T O errdemon ERROR LOGGING TURNED ON
AB59ABFF 0619071010 U U LIBLVM Remote node Concurrent Volume Group fail
90EDB0A5 0619071010 P S topsvcs Dead Man Switch being allowed to expire.
173C787F 0619071010 I S topsvcs Possible malfunction on local adapter
So the node went down because the dead man switch timer expired.
Looking for a fix, I found some recommendations to set the syncd frequency to 10 instead of 60.
It seems that the upgrade reset the frequency back to 60.
I can't test this out, but can anyone explain to me how this works? Why can the syncd frequency cause problems when I halt the other node? What is the link between the two?
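To make the order of events easier to read: the second column of the errpt summary is a timestamp in MMDDhhmmYY form. Below is a minimal sketch that sorts the six entries above chronologically. It assumes the lines are pasted exactly as shown; the function names are only illustrative and not part of any AIX or HACMP tool.

```python
from datetime import datetime

# The six errpt summary lines from this post, pasted as-is.
ERRPT_LINES = """\
F48137AC 0619073010 U O minidump COMPRESSED MINIMAL DUMP
225E3B63 0619073010 T S PANIC SOFTWARE PROGRAM ABNORMALLY TERMINATED
9DBCFDEE 0619073110 T O errdemon ERROR LOGGING TURNED ON
AB59ABFF 0619071010 U U LIBLVM Remote node Concurrent Volume Group fail
90EDB0A5 0619071010 P S topsvcs Dead Man Switch being allowed to expire.
173C787F 0619071010 I S topsvcs Possible malfunction on local adapter
""".splitlines()


def parse_errpt_line(line):
    """Split one summary line into (time, identifier, resource, description)."""
    # Columns: IDENTIFIER TIMESTAMP TYPE CLASS RESOURCE_NAME DESCRIPTION
    ident, stamp, _etype, _eclass, resource, description = line.split(None, 5)
    # errpt summary timestamps are MMDDhhmmYY
    when = datetime.strptime(stamp, "%m%d%H%M%y")
    return when, ident, resource, description


def chronological(lines):
    """Return parsed entries, oldest first (ties keep their original order)."""
    entries = [parse_errpt_line(l) for l in lines if l.strip()]
    return sorted(entries, key=lambda e: e[0])


if __name__ == "__main__":
    for when, ident, resource, description in chronological(ERRPT_LINES):
        print(when.strftime("%Y-%m-%d %H:%M"), ident, resource, description)
```

Sorted this way, the three topsvcs/LIBLVM entries are stamped 07:10, and the panic and minidump entries are stamped 07:30.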
Best regards,
Steven