Is time syncing causing error messages from HACMP?

LeeHagen · Jan 10, 2003

I am syncing our Regattas running HACMP with a time server at 3:00 am. At exactly the same time we get a slew of errors, which can be viewed with errpt -a and also show up in the cluster.log file. What the errors all seem to boil down to is that HACMP thinks it is losing track of the heartbeat for about 13 to 14 seconds, which is exactly the amount the server time gets adjusted by. All these errors happen within a span of about 2 seconds. Has anyone else run into this problem when syncing the time? Is this just another odd "feature" of AIX and HACMP, or is there actually something to worry about here? My guess is that HACMP is not smart enough to distinguish between an actual lapse in time between heartbeats and an artificial lapse caused by a time reset. Any ideas?

Here are several of the errors logged:

______________________________________________________________________________
Description
Late in sending heartbeat

Probable Causes
Heavy CPU load
Severe physical memory shortage
Heavy I/O activities

Failure Causes
Daemon can not get required system resource

Recommended Actions
Reduce the system load

Detail Data
DETECTING MODULE
rsct,bootstrp.C,1.164,3872
ERROR ID
6zESUw.Sic5y.SQx/62UD.0...................
REFERENCE CODE

A heartbeat is late by the following number of seconds
13
____________________________________________________________________________
____________________________________________________________________________
Resource Name: topsvcs

Description
NIM thread blocked

Probable Causes
A thread in Topology Services NIM process was blocked
Topology Services NIM process cannot get timely access to CPU

User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention

Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists

Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O

Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
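
The guess above, that a clock step can look like a late heartbeat, can be illustrated with a toy model. This is only a sketch under assumptions: it is not how Topology Services is actually implemented, and the names FakeClocks, lateness, and simulate are made up for illustration. It compares a lateness check based on a wall clock that gets stepped by 13 seconds with one based on a monotonic clock that a time sync cannot move. In a real Python program the counterparts would be time.time() and time.monotonic().

```python
class FakeClocks:
    """Toy pair of clocks: 'wall' can be stepped by a time sync, 'mono' cannot."""

    def __init__(self):
        self.wall = 0.0
        self.mono = 0.0

    def advance(self, seconds):
        # Real time passing moves both clocks forward together.
        self.wall += seconds
        self.mono += seconds

    def step_wall(self, seconds):
        # A once-a-day time sync jumps only the wall clock.
        self.wall += seconds


def lateness(previous, now, interval):
    """Seconds by which a heartbeat appears late, given two clock readings."""
    return max(0.0, (now - previous) - interval)


def simulate(step_seconds=13.0, interval=1.0):
    """Return (lateness seen via wall clock, lateness seen via monotonic clock)."""
    clocks = FakeClocks()
    last_wall, last_mono = clocks.wall, clocks.mono

    clocks.advance(interval)         # one normal heartbeat interval elapses
    clocks.step_wall(step_seconds)   # the 3:00 am sync steps the wall clock

    return (
        lateness(last_wall, clocks.wall, interval),
        lateness(last_mono, clocks.mono, interval),
    )


if __name__ == "__main__":
    wall_late, mono_late = simulate()
    print("late by (wall clock):", wall_late)
    print("late by (monotonic clock):", mono_late)
```

In this model the heartbeat itself is on time. Only the check that reads the stepped clock reports it as late by the size of the adjustment, which would match the "late by 13 seconds" entry if the daemon measures lateness against the system clock.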

bi · Jan 10, 2003

Why don't you use xntp to constantly time sync your servers rather than just do it once a day?

sectorseveng · Jan 13, 2003

Agree with bi, but remember to pass the -x argument to xntpd so that it slews (drifts) the clock gradually instead of stepping (jumping) it if your systems are ahead of the NTP time.
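
To give a feel for slewing versus stepping, here is a rough back-of-the-envelope sketch. It assumes a maximum slew rate of 500 ppm (0.5 ms per second), the figure commonly quoted for NTP daemons; the actual rate on a given AIX level may differ, and the function name slew_duration_seconds is made up for illustration.

```python
def slew_duration_seconds(offset_seconds, max_slew_ppm=500.0):
    """Time needed to absorb an offset by slewing at a fixed rate.

    max_slew_ppm is an assumed limit (parts per million), not a value
    taken from the AIX documentation.
    """
    return abs(offset_seconds) * 1_000_000 / max_slew_ppm


if __name__ == "__main__":
    seconds = slew_duration_seconds(13.0)
    print(f"{seconds:.0f} s, roughly {seconds / 3600:.1f} hours")
```

Under that assumption, a 13 second offset takes a few hours to drift away rather than disappearing in one jump. With xntpd running all the time the offset should normally stay far smaller than that, so the clock never moves enough at once to look like a missed heartbeat.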

LeeHagen · Jan 13, 2003

Thanks for the responses. I had previously tested xntpd and we did not get any errors. We don't have an internal time server, so I didn't want to constantly poll an external one, but we might not have any choice until we get an internal time server built.

Regards,

Lee Hagen
