
disk heartbeat device in hacmp 5.1???

Status
Not open for further replies.

rs6000er

Technical User
Jul 21, 2004
74
CA
Hi all,

My HA cluster consists of two p5 550 servers. I have created a logical
disk on the FAStT storage, in a concurrent VG, as the heartbeat device.


The cluster is working very well, but I am getting error reports about the
disk heartbeat device.


LABEL: TS_NIM_ERROR_STUCK_
IDENTIFIER: 864D2CE3


Date/Time: Thu Feb 9 14:20:12 EST
Sequence Number: 1114
Machine Id:
Node Id:
Class: S
Type: PERM
Resource Name: topsvcs


Description
NIM thread blocked


Probable Causes
A thread in Topology Services NIM process was blocked
Topology Services NIM process cannot get timely access to CPU


User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention


Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists


Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O


Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists


Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39.1.2,5492
ERROR ID
6XnGH40gLtu1/wbs.6Sq1g0...................
REFERENCE CODE


Thread which was blocked
receive thread
Interval in seconds during which process was blocked
6
Interface name
rhdisk4


I have checked sys0 and the status of topsvcs. The high/low watermarks were
set up as you mentioned. I also noticed that the "Missed HBs" count in the
output of lssrc -ls topsvcs is high. Please check the contents pasted
below.


Subsystem Group PID Status
topsvcs topsvcs 186096 active
Network Name Indx Defd Mbrs St Adapter ID Group ID
net_ether_01_0 [ 0] 2 2 S 172.15.103.136 172.15.103.136
net_ether_01_0 [ 0] en6 0x53e78619 0x53ea5da7
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 197856 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 318393 ICMP 0 Dropped: 0
NIM's PID: 130374
net_ether_01_1 [ 1] 2 2 S 172.16.60.68 172.16.60.68
net_ether_01_1 [ 1] en7 0x53e7861a 0x53ea5d39
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 218143 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 308892 ICMP 0 Dropped: 0
NIM's PID: 176536
diskhb_0 [ 2] 2 2 S 255.255.10.1 255.255.10.1
diskhb_0 [ 2] rhdisk4 0x83e78618 0x83ea5d40
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 121 Current group: 65 <------------- missing transactions
Packets sent : 85235 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 85268 ICMP 0 Dropped: 0
NIM's PID: 171742
2 locally connected Clients with PIDs:
haemd(174312) hagsd(173030)
Dead Man Switch Enabled:
reset interval = 1 seconds
trip interval = 20 seconds
Configuration Instance = 135
Daemon employs no security
Segments pinned: Text Data.
Text segment size: 768 KB. Static data segment size: 981 KB.
Dynamic data segment size: 5965. Number of outstanding malloc: -296
User time 24 sec. System time 40 sec.
Number of page faults: 149. Process swapped out 0 times.
Number of nodes up: 2. Number of nodes down: 0.
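
For anyone who wants to watch these counters over time, here is a minimal sketch that pulls the heartbeat settings and "Missed HBs" counters for each network out of text shaped like the topsvcs output above. It assumes the line layout shown in this post. The function name `parse_topsvcs`, the dictionary keys, and the regular expressions are illustrative choices, not part of any IBM tool, and the sample text is copied from the output above.

```python
import re

# Excerpt of the topsvcs output pasted in this post.
SAMPLE = """\
net_ether_01_0 [ 0] 2 2 S 172.15.103.136 172.15.103.136
net_ether_01_0 [ 0] en6 0x53e78619 0x53ea5da7
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
net_ether_01_1 [ 1] 2 2 S 172.16.60.68 172.16.60.68
net_ether_01_1 [ 1] en7 0x53e7861a 0x53ea5d39
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
diskhb_0 [ 2] 2 2 S 255.255.10.1 255.255.10.1
diskhb_0 [ 2] rhdisk4 0x83e78618 0x83ea5d40
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 121 Current group: 65
"""

# Second line of each network block: name, index, adapter, two hex IDs.
NET_RE = re.compile(
    r"^(\S+)\s+\[\s*\d+\]\s+(\S+)\s+0x[0-9a-fA-F]+\s+0x[0-9a-fA-F]+"
)
HB_RE = re.compile(
    r"HB Interval = ([\d.]+) secs\. Sensitivity = (\d+) missed beats"
)
MISSED_RE = re.compile(r"Missed HBs: Total: (\d+) Current group: (\d+)")


def parse_topsvcs(text):
    """Return one dict per heartbeat network found in the text."""
    networks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = NET_RE.match(line)
        if m:
            current = {"network": m.group(1), "adapter": m.group(2)}
            networks.append(current)
            continue
        if current is None:
            continue  # nothing to attach counters to yet
        m = HB_RE.search(line)
        if m:
            current["hb_interval"] = float(m.group(1))
            current["sensitivity"] = int(m.group(2))
            continue
        m = MISSED_RE.search(line)
        if m:
            current["missed_total"] = int(m.group(1))
            current["missed_current"] = int(m.group(2))
    return networks


if __name__ == "__main__":
    for net in parse_topsvcs(SAMPLE):
        if net.get("missed_total", 0) > 0:
            print(net["network"], net["adapter"],
                  "missed:", net["missed_total"],
                  "current group:", net["missed_current"])
```

On the sample above, only the disk heartbeat network on rhdisk4 has a non-zero counter, which matches the interface named in the error report.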


SW_dist_intr     false               Enable SW distribution of interrupts               True
autorestart      true                Automatically REBOOT system after a crash          True
boottype         disk                N/A                                                False
conslogin        enable              System Console Login                               False
cpuguard         enable              CPU Guard                                          True
frequency        528000000           System Bus Frequency                               False
fullcore         false               Enable full CORE dump                              True
fwversion        IBM,SF235_180       Firmware version and revision levels               False
id_to_partition  0X8000086D49600001  Partition ID                                       False
id_to_system     0X8000086D49600000  System ID                                          False
iostat           false               Continuously maintain DISK I/O history             True
keylock          normal              State of system keylock at boot time               False
maxbuf           20                  Maximum number of pages in block I/O BUFFER CACHE  True
maxmbuf          0                   Maximum Kbytes of real memory allowed for MBUFS    True
maxpout          33                  HIGH water mark for pending write I/Os per file    True
maxuproc         2000                Maximum number of PROCESSES allowed per user       True
minpout          24                  LOW water mark for pending write I/Os per file     True
modelname        IBM,9117-570        Machine name                                       False
ncargs           64                  ARG/ENV list size in 4K byte blocks                True
pre430core       false               Use pre-430 style CORE dump                        True
pre520tune       disable             Pre-520 tuning compatibility mode                  True
realmem          16187392            Amount of usable physical memory in Kbytes         False
rtasversion      1                   Open Firmware RTAS version                         False
systemid         IBM,021059E3A       Hardware system identifier                         False


Does anybody know what this is and how to fix it?


Any suggestions would be really appreciated!


Thanks in advance,


Frank


 