I am syncing our Regattas running HACMP with a time server at 3:00 am. At exactly that time we get a slew of errors, which can be viewed with errpt -a and also show up in the cluster.log file. What the errors all seem to boil down to is that HACMP thinks it is losing track of the heartbeat for about 13 to 14 seconds, which coincidentally is exactly the amount the server time gets adjusted by. All of these errors are logged within a span of about 2 seconds. Has anyone else run into this problem when syncing the time? Is this just another odd "feature" of AIX and HACMP, or is there actually something to worry about here? My guess is that HACMP is not smart enough to distinguish between an actual lapse in time between heartbeats and an artificial lapse caused by a time reset. Any ideas?
Here are several of the errors logged:
______________________________________________________________________________
Description
Late in sending heartbeat
Probable Causes
Heavy CPU load
Severe physical memory shortage
Heavy I/O activities
Failure Causes
Daemon can not get required system resource
Recommended Actions
Reduce the system load
Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.164,3872
ERROR ID
6zESUw.Sic5y.SQx/62UD.0...................
REFERENCE CODE
A heartbeat is late by the following number of seconds
13
____________________________________________________________________________
____________________________________________________________________________
Resource Name: topsvcs
Description
NIM thread blocked
Probable Causes
A thread in Topology Services NIM process was blocked
Topology Services NIM process cannot get timely access to CPU
User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
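
To make my guess concrete, here is a minimal Python sketch of the effect I suspect. It is not HACMP or RSCT code, and I do not know how Topology Services actually measures heartbeat lateness; the function names and the 1-second interval are made up for illustration, and it assumes the sync steps the clock forward. It only shows that a lateness check based on wall-clock timestamps would report a 13-second late heartbeat after a 13-second step, while the same check on a clock that ignores time resets would report nothing.

```python
def lateness(prev_stamp, now_stamp, interval):
    """Seconds by which a heartbeat is late, given two stamps from the same clock."""
    return max(0.0, (now_stamp - prev_stamp) - interval)


def simulate(step_seconds, interval=1.0):
    """Hypothetical model: one heartbeat interval passes, then the wall clock is stepped.

    Returns (lateness seen on the wall clock, lateness seen on a monotonic clock).
    """
    wall = 0.0   # wall-clock time, affected by the time sync
    mono = 0.0   # monotonic time, unaffected by the time sync
    prev_wall, prev_mono = wall, mono

    # One real heartbeat interval elapses on both clocks.
    wall += interval
    mono += interval

    # The 3:00 am sync steps the wall clock forward (assumed direction).
    wall += step_seconds

    return (lateness(prev_wall, wall, interval),
            lateness(prev_mono, mono, interval))


wall_late, mono_late = simulate(13.0)
print(wall_late, mono_late)  # 13.0 0.0
```

If that is what is happening, the "late by 13 seconds" in the first error would just be the size of the clock step and not a real scheduling delay.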