Tek-Tips is the largest IT community on the Internet today!


Heavy network load provoking HACMP failover


avalon111

IS-IT--Management
Feb 11, 2010
NL
I have an AIX 5.2 LPAR on a POWER5 670 in an HACMP cluster. During periods of heavy network I/O, paging goes up and lrud shows a CPU utilization of 300%, and the system stops responding. We get no return codes or detailed logs of the root cause on the machine; it simply drops out of the cluster, which then reforms on the redundant node.
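One way to line up the paging spike with the failover is to watch the page-out (po) column of vmstat while the load builds. The following is a minimal sketch, assuming the AIX vmstat layout "r b avm fre re pi po fr sr cy ..." (so po is field 7; check the header on your own system) and that the first data line is the since-boot average. The function name `max_pageouts` is hypothetical.

```shell
# max_pageouts: read vmstat output on stdin and print the largest
# page-out (po) value among the interval samples.
# ASSUMPTION: AIX column layout, po is field 7; the first data line
# (the since-boot summary) is skipped.
max_pageouts() {
    awk '
        # Data lines start with a number (the run-queue column "r");
        # header and separator lines do not.
        $1 ~ /^[0-9]+$/ {
            n++
            if (n == 1) next          # skip the since-boot summary line
            if ($7 + 0 > max) max = $7 + 0
        }
        END { printf "%d\n", max + 0 }
    '
}

# Example: vmstat 5 12 | max_pageouts
```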

As far as I can see, the adapter and TCP kernel values are standard: tcp_sendspace 131072, tcp_recvspace 65536, MTU 1500, 1000 Mbps full duplex, auto-negotiate, jumbo frames off.
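To record the TCP values actually in effect when the problem occurs (rather than relying on what they should be), the output of `no -a`, which prints one `name = value` pair per line, can be filtered. A minimal sketch; the helper name `get_tunable` is hypothetical, and only the `name = value` line format is assumed.

```shell
# get_tunable NAME: read "name = value" lines (as printed by AIX `no -a`)
# on stdin and print the value for NAME, or nothing if it is absent.
get_tunable() {
    awk -v want="$1" '
        # awk ignores the leading padding, so the fields are:
        # name, "=", value.
        $1 == want && $2 == "=" { print $3; exit }
    '
}

# Example on AIX:
#   no -a | get_tunable tcp_sendspace
#   no -a | get_tunable tcp_recvspace
```

The MTU is an attribute of the network interface rather than a `no` tunable; `lsattr -El en0` lists it (adjust the interface name to suit).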

I'm more used to dealing with disk I/O issues, so this one is unusual for me.

Does anyone have suggestions on what path to follow to identify the bottleneck or diagnose an insufficiently configured parameter? A complication is that the problems occur out of hours.

Thanks in anticipation!

ave
 
Well, one thing you might want to do is set up a cron job that acts as a trigger when CPU usage starts to rise. When it does, have the job run other commands and send the results by email, or page the on-call person if there is one. That will help you pinpoint what is causing the issue and when it is actually happening.

For example, have a script like the following run from cron every 5 minutes (or whatever interval you need). It checks the CPU idle time and, if it drops below a certain point, runs some other investigative commands:

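Code:
# Average the idle (id) column of vmstat; id is field 16 on AIX.
# vmstat's first line is the since-boot average, so take 4 samples and skip it.
# printf "%d" keeps the result an integer for the -le test below.
PCTIDLE=`vmstat 1 4 | awk '$1 ~ /^[0-9]+$/ {n++; if (n > 1) IDLE += $16} END {if (n > 1) printf "%d\n", IDLE / (n - 1); else print 100}'`
if [ "${PCTIDLE}" -le 5 ] ; then
   # Pipe only the diagnostics into mailx, so no mail goes out when idle time is fine.
   {
   ps cvg...
   svmon -P...
   sar...
   etc, etc...
   } | mailx -s "CPU idle time problem on `hostname`" someone@somewhere.com
fi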
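The idle-time calculation can be pulled out into a function and checked against canned vmstat output before the script goes into cron. A minimal sketch, assuming the AIX vmstat layout where `id` is field 16 and the first data line is the since-boot average; `avg_idle` is a hypothetical name.

```shell
# avg_idle: read vmstat output on stdin and print the integer average of
# the "id" (idle) column over the interval samples.
# ASSUMPTION: AIX layout, id is field 16; the first data line is the
# since-boot summary and is excluded from the average.
avg_idle() {
    awk '
        # Only data lines start with a number.
        $1 ~ /^[0-9]+$/ {
            n++
            if (n > 1) idle += $16
        }
        END {
            # With no interval samples, report 100 so no alert fires.
            if (n > 1) printf "%d\n", idle / (n - 1)
            else print 100
        }
    '
}

# Example: PCTIDLE=`vmstat 1 4 | avg_idle`
```

To run the check every five minutes, a crontab entry that lists the minutes explicitly (0,5,10,15,20,25,30,35,40,45,50,55) is the safer form, since older AIX cron implementations may not accept the */5 step syntax.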

Regards,
Chuck
 