RMGBELGIUM
MIS
Hi all,
we've got the following problem in our cluster:
The error report on node A is filling up, with about 5 entries every 4 minutes, giving the following errors:
3C81E43F 0124092205 P U topsvcs Late in sending heartbeat
4FDB3BA1 0124092205 I S topsvcs DeadMan Switch (DMS) close to trigger
864D2CE3 0124092205 P S topsvcs NIM thread blocked
When I run lssrc -ls topsvcs, this is what I get for tmssa:
NIM's PID: 520390
tmssa_0 [ 2] 2 2 S 255.255.2.0 255.255.2.1
tmssa_0 [ 2] ssa2 0x81f3500c 0x81f35021
HB Interval = 2 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 33 Current group: 18
Packets sent : 4185480 ICMP 0 Errors: 5060 No mbuf: 0
Packets received: 4120068 ICMP 0 Dropped: 0
Strangely, the number of errors stays the same; it hasn't changed in hours, even though the error report keeps filling up with entries.
On node B of the cluster, I'm getting the following:
NIM's PID: 340218
tmssa_0 [ 2] 2 2 S 255.255.2.1 255.255.2.1
tmssa_0 [ 2] ssa1 0x81f3501b 0x81f35021
HB Interval = 2 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 66959 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 69394 ICMP 0 Dropped: 0
Here there are no errors. We haven't got any open links, and the load on the system is no different from any other day.
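One way to check whether the counters really stay flat would be to capture the tmssa stanza twice, some minutes apart, and diff the numbers. Below is a minimal Python sketch, assuming the stanza layout shown above; the function names parse_nim_stats and delta are made up for illustration and are not part of HACMP or AIX.

```python
import re

# Patterns for the counters in one tmssa stanza of `lssrc -ls topsvcs`.
# They assume the layout pasted in this post; other releases may differ.
PATTERNS = {
    "missed_total": r"Missed HBs:\s*Total:\s*(\d+)",
    "missed_current": r"Current group:\s*(\d+)",
    "sent": r"Packets sent\s*:\s*(\d+)",
    "send_errors": r"Errors:\s*(\d+)",
    "received": r"Packets received:\s*(\d+)",
    "dropped": r"Dropped:\s*(\d+)",
}


def parse_nim_stats(text):
    """Return the counters found in one stanza as a dict of ints."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[name] = int(match.group(1))
    return stats


def delta(before, after):
    """Difference of every counter present in both samples."""
    return {key: after[key] - before[key] for key in before if key in after}


# The two stanzas from this post, used here as sample input.
NODE_A_SAMPLE = """\
HB Interval = 2 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 33 Current group: 18
Packets sent : 4185480 ICMP 0 Errors: 5060 No mbuf: 0
Packets received: 4120068 ICMP 0 Dropped: 0
"""

NODE_B_SAMPLE = """\
HB Interval = 2 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 66959 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 69394 ICMP 0 Dropped: 0
"""

if __name__ == "__main__":
    print("node A:", parse_nim_stats(NODE_A_SAMPLE))
    print("node B:", parse_nim_stats(NODE_B_SAMPLE))
```

Feeding delta() two samples taken from the same node a few minutes apart would show which counters are actually moving (for example Missed HBs) while Errors stays put.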
The machines are running HACMP 5.1 on AIX 5.2.
Perhaps I should mention the following:
Yesterday node B was moved over to node A for maintenance and afterwards reacquired; this went without any problems. The entries started appearing on node A from that moment on.
Any suggestions are welcome.
thx in advance,
greetz,
RMGBelgium