mastermind455
ISP
Hi all
I installed a Heartbeat-based Linux cluster (a Red Hat 7.2 box and a Gentoo box).
So far so good: the cluster works, but when one node fails, the healthy node does not take over, so the websites become unavailable.
The logs look OK, so where is the problem?
Here is the log from the secondary node:
-----------------------------------------------------------
heartbeat: 2005/01/15_21:08:17 info: remote resource transition completed.
heartbeat: 2005/01/15_21:08:17 info: Local Resource acquisition completed.
heartbeat: 2005/01/15_21:08:17 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
heartbeat: 2005/01/15_21:08:17 received ip-request-resp 10.0.0.3 OK yes
heartbeat: 2005/01/15_21:08:17 info: Acquiring resource group: anita.elvisaltherr.ch 10.0.0.3 apache
heartbeat: 2005/01/15_21:08:17 info: Running /etc/ha.d/resource.d/IPaddr 10.0.0.3 start
heartbeat: 2005/01/15_21:08:18 info: /sbin/ifconfig eth1:0 10.0.0.3 netmask 255.255.255.0 broadcast 10.0.0.255
heartbeat: 2005/01/15_21:08:18 info: Sending Gratuitous Arp for 10.0.0.3 on eth1:0 [eth1]
heartbeat: 2005/01/15_21:08:18 /usr/lib/heartbeat/send_arp -i 1010 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-10.0.0.3 eth1 10.0.0.3 auto 10.0.0.3 ffffffffffff
heartbeat: 2005/01/15_21:08:18 info: Running /etc/ha.d/resource.d/apache start
heartbeat: 2005/01/15_21:09:06 info: Received shutdown notice from 'websrv01.elvisaltherr.ch'.
heartbeat: 2005/01/15_21:09:06 info: Resources being acquired from websrv01.elvisaltherr.ch.
heartbeat: 2005/01/15_21:09:06 info: acquire local HA resources (standby).
heartbeat: 2005/01/15_21:09:06 info: Acquiring resource group: anita.elvisaltherr.ch 10.0.0.3 apache
heartbeat: 2005/01/15_21:09:07 info: Local Resource acquisition completed.
heartbeat: 2005/01/15_21:09:07 info: Running /etc/ha.d/resource.d/apache start
heartbeat: 2005/01/15_21:09:07 info: local HA resource acquisition completed (standby).
heartbeat: 2005/01/15_21:09:07 info: Standby resource acquisition done [all].
heartbeat: 2005/01/15_21:09:07 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/01/15_21:09:07 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
heartbeat: 2005/01/15_21:09:07 info: mach_down takeover complete.
heartbeat: 2005/01/15_21:09:07 info: mach_down takeover complete for node websrv01.elvisaltherr.ch.
heartbeat: 2005/01/15_21:09:37 WARN: node websrv01.elvisaltherr.ch: is dead
heartbeat: 2005/01/15_21:09:37 info: Dead node websrv01.elvisaltherr.ch gave up resources.
heartbeat: 2005/01/15_21:09:37 info: Link websrv01.elvisaltherr.ch:eth1 dead.
heartbeat: 2005/01/15_21:09:37 info: Link websrv01.elvisaltherr.ch:/dev/ttyS1 dead.
heartbeat: 2005/01/15_21:09:53 info: Heartbeat restart on node websrv01.elvisaltherr.ch
heartbeat: 2005/01/15_21:09:53 info: Link websrv01.elvisaltherr.ch:eth1 up.
heartbeat: 2005/01/15_21:09:53 info: Status update for node websrv01.elvisaltherr.ch: status up
heartbeat: 2005/01/15_21:09:53 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/01/15_21:09:53 info: Status update for node websrv01.elvisaltherr.ch: status active
heartbeat: 2005/01/15_21:09:53 info: remote resource transition completed.
heartbeat: 2005/01/15_21:09:53 info: anita.elvisaltherr.ch wants to go standby [foreign]
heartbeat: 2005/01/15_21:09:53 info: standby: websrv01.elvisaltherr.ch can take our foreign resources
heartbeat: 2005/01/15_21:09:53 info: give up foreign HA resources (standby).
heartbeat: 2005/01/15_21:09:53 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/01/15_21:09:54 info: Link websrv01.elvisaltherr.ch:/dev/ttyS1 up.
heartbeat: 2005/01/15_21:10:09 info: foreign HA resource release completed (standby).
heartbeat: 2005/01/15_21:10:09 info: Local standby process completed [foreign].
heartbeat: 2005/01/15_21:10:10 WARN: 1 lost packet(s) for [websrv01.elvisaltherr.ch] [15:17]
heartbeat: 2005/01/15_21:10:10 info: remote resource transition completed.
heartbeat: 2005/01/15_21:10:10 info: No pkts missing from websrv01.elvisaltherr.ch!
heartbeat: 2005/01/15_21:10:10 info: Other node completed standby takeover of foreign resources.
heartbeat: 2005/01/15_21:11:35 info: Received shutdown notice from 'websrv01.elvisaltherr.ch'.
heartbeat: 2005/01/15_21:11:35 info: Resources being acquired from websrv01.elvisaltherr.ch.
heartbeat: 2005/01/15_21:11:35 info: acquire local HA resources (standby).
heartbeat: 2005/01/15_21:11:35 info: Acquiring resource group: anita.elvisaltherr.ch 10.0.0.3 apache
heartbeat: 2005/01/15_21:11:35 info: Local Resource acquisition completed.
heartbeat: 2005/01/15_21:11:35 info: Running /etc/ha.d/resource.d/apache start
heartbeat: 2005/01/15_21:11:36 info: local HA resource acquisition completed (standby).
heartbeat: 2005/01/15_21:11:36 info: Standby resource acquisition done [foreign].
heartbeat: 2005/01/15_21:11:36 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2005/01/15_21:11:36 info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
heartbeat: 2005/01/15_21:11:36 info: mach_down takeover complete.
heartbeat: 2005/01/15_21:11:36 info: mach_down takeover complete for node websrv01.elvisaltherr.ch.
heartbeat: 2005/01/15_21:12:06 WARN: node websrv01.elvisaltherr.ch: is dead
heartbeat: 2005/01/15_21:12:06 info: Dead node websrv01.elvisaltherr.ch gave up resources.
heartbeat: 2005/01/15_21:12:06 info: Link websrv01.elvisaltherr.ch:eth1 dead.
heartbeat: 2005/01/15_21:12:06 info: Link websrv01.elvisaltherr.ch:/dev/ttyS1 dead.
heartbeat: 2005/01/15_21:12:31 WARN: TTY write timeout on [/dev/ttyS1] (no connection or bad cable? [see documentation])
------------------------------------------------------------
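To make a log like this easier to scan, here is a minimal Python sketch. It is a hypothetical helper, not a Heartbeat tool, and it assumes the line format shown above (`heartbeat: YYYY/MM/DD_HH:MM:SS level: message`, where some lines have no level). It keeps only WARN/ERROR/CRIT entries and flags any place where the timestamps jump backwards, which usually means log extracts were pasted together out of order. The names `parse`, `problems` and `Entry` are illustrative only.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Assumed line format, taken from the log above, e.g.:
#   heartbeat: 2005/01/15_21:09:37 WARN: node websrv01.elvisaltherr.ch: is dead
# Some lines (e.g. the send_arp command) carry no level, so the level is optional.
LINE_RE = re.compile(
    r"^heartbeat: (\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}) "
    r"(?:(info|debug|WARN|ERROR|CRIT): )?(.*)$"
)
TS_FORMAT = "%Y/%m/%d_%H:%M:%S"


class Entry(NamedTuple):
    when: datetime
    level: Optional[str]  # None when the line has no level prefix
    message: str


def parse(lines: Iterable[str]) -> List[Entry]:
    """Parse heartbeat log lines, skipping anything that does not match."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts, level, msg = m.groups()
            entries.append(Entry(datetime.strptime(ts, TS_FORMAT), level, msg))
    return entries


def problems(entries: List[Entry]) -> List[str]:
    """Return WARN/ERROR/CRIT entries plus every point where time runs backwards."""
    out = []
    for i, cur in enumerate(entries):
        # A timestamp earlier than the previous line suggests a duplicated paste.
        if i > 0 and cur.when < entries[i - 1].when:
            out.append(f"timestamp goes backwards at {cur.when:%H:%M:%S}")
        if cur.level in ("WARN", "ERROR", "CRIT"):
            out.append(f"{cur.when:%H:%M:%S} {cur.level}: {cur.message}")
    return out


if __name__ == "__main__":
    import sys

    # Usage: python scan_hb_log.py < /path/to/heartbeat.log
    for problem in problems(parse(sys.stdin)):
        print(problem)
```

Run against the log above, it would surface the `is dead`, `1 lost packet(s)` and `TTY write timeout on [/dev/ttyS1]` warnings; Heartbeat's own message for the last one suggests checking the serial connection or cable.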
Many thanks for your help!