
power fail ?

Status
Not open for further replies.

xiayd

IS-IT--Management
Mar 27, 2006
16
CN
The LED on the power supply is green, but the console displays this:

Broadcast message from root@sadcora2 (tty) at 15:00:26 ...

rc.powerfail: init has received a SIGPWR signal.
The system is now operating with a power problem.



Broadcast message from root@sadcora2 (tty) at 15:01:26 ...

rc.powerfail: init has received a SIGPWR signal.
The system is now operating with cooling problem.



Broadcast message from root@sadcora2 (tty) at 15:02:26 ...

rc.powerfail: init has received a SIGPWR signal.
The system is now operating with a power problem.

Why?
 
Check the errpt output. You probably have a power supply problem.
 
Or maybe you've HAD a power supply problem (temporary loss of power on one of your power outlets connected to the two power supplies).

If that was the case, it should also be noted in errpt ("Electrical power resumed" I believe). Then you have to manually edit root's crontab to get rid of the wall message that repeats every 12 hours.


HTH,

p5wizard
 
I agree with p5wizard and Ken. We lost our entire 690 on Monday because of a power failure, and I had 7 LPARs where I had to take the 12-hour wall entry out of root's crontab once all of the LPARs came back up and online. It should be the last entry in the crontab if the crontab isn't edited on a regular basis.
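
For anyone who wants to check a crontab dump before editing it by hand, here is a minimal sketch in Python. It assumes the entry added after a power event mentions the string rc.powerfail; the sample entry below is a made-up stand-in, not the exact text AIX writes, and the function name is hypothetical. It only reports which lines would be removed and changes nothing on the system.

```python
# Sample crontab text. The errclear lines match root's crontab as posted in
# this thread; the last line is an ASSUMED stand-in for the wall entry that
# gets appended after a power event (the real wording may differ).
CRONTAB = """\
0 11 * * * /usr/bin/errclear -d S,O 30
0 12 * * * /usr/bin/errclear -d H 90
0 00,12 * * * wall%rc.powerfail: hypothetical warning text
"""


def split_powerfail_entries(crontab_text, marker="rc.powerfail"):
    """Return (lines to keep, lines flagged for removal).

    A line is flagged when it mentions the marker and is not a comment.
    """
    keep, flagged = [], []
    for line in crontab_text.splitlines():
        is_comment = line.lstrip().startswith("#")
        if marker in line and not is_comment:
            flagged.append(line)
        else:
            keep.append(line)
    return keep, flagged


keep, flagged = split_powerfail_entries(CRONTAB)
for line in flagged:
    print("would remove:", line)
```

Feed it the output of crontab -l for root and review the flagged lines before removing anything yourself.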
 
Run advanced diags against sysplanar0 in system verification mode. If they pass, it was a power failure or a brownout (a momentary loss of power that the system registered but did not go down for). Diags will probably say an error was logged but no trouble was found, in which case log a repair action against sysplanar0, either when you are offered the option or through diag, task selection, log repair action. Then remove the rc.powerfail wall entry from root's crontab.
If you don't log the repair, diag ELA will just pick up the errpt entry again at 4am the next morning and you'll get the messages back again.
If diags fail, you have a PSU out, and diags should tell you which one, along with its location and FRU number.
 
console :

Broadcast message from root@sadcora2 (tty) at 15:00:26 ...

rc.powerfail: init has received a SIGPWR signal.
The system is now operating with a power problem.



Broadcast message from root@sadcora2 (tty) at 15:01:26 ...

rc.powerfail: init has received a SIGPWR signal.
The system is now operating with cooling problem.



Broadcast message from root@sadcora2 (tty) at 15:02:26 ...

rc.powerfail: init has received a SIGPWR signal.
The system is now operating with a power problem.


root's crontab:

# (C) COPYRIGHT International Business Machines Corp. 1989,1994
# All Rights Reserved
# Licensed Materials - Property of IBM
#
# US Government Users Restricted Rights - Use, duplication or
# disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
#
#0 3 * * * /usr/sbin/skulker
#45 2 * * 0 /usr/lib/spell/compress
#45 23 * * * ulimit 5000; /usr/lib/smdemon.cleanu > /dev/null
0 11 * * * /usr/bin/errclear -d S,O 30
0 12 * * * /usr/bin/errclear -d H 90
# SSA warning : Deleting the next two lines may cause errors in redundant
# SSA warning : hardware to go undetected.
01 5 * * * /usr/lpp/diagnostics/bin/run_ssa_ela 1>/dev/null 2>/dev/null
0 * * * * /usr/lpp/diagnostics/bin/run_ssa_healthcheck 1>/dev/null 2>/dev/null
# SSA warning : Deleting the next line may allow enclosure hardware errors to go undetected
30 * * * * /usr/lpp/diagnostics/bin/run_ssa_encl_healthcheck 1>/dev/null 2>/dev/null
# SSA warning : Deleting the next line may allow link speed exceptions to go undetected
30 4 * * * /usr/lpp/diagnostics/bin/run_ssa_link_speed 1>/dev/null 2>/dev/null
0 0 * * * /usr/sbin/cluster/utilities/clcycle 1>/dev/null 2>/dev/null # HACMP for AIX Logfile rotation


errpt:

[sadcora2/#]errpt
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
2F3E09A4 0215150506 I H sysplanar0 REPAIR ACTION
071F4755 0215150206 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0215150106 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0215150006 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0215145906 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0215145806 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0215145706 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0215145606 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0203025506 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0203025406 P H sysplanar0 ENVIRONMENTAL PROBLEM
2F3E09A4 0125095706 I H ssa0 REPAIR ACTION
2F3E09A4 0125095706 I H sysplanar0 REPAIR ACTION
625E6B9A 0117160206 P H ssa0 ADAPTER DETECTED OPEN SERIAL LINK


And I ran diag, but no trouble was found!


So strange!

 
If you have a box that can have two power supplies and only one is working, I have seen this type of message, especially when the faulty one was removed for repairs.



BocaBurger
<===========================||////////////////|0
The pen is mightier than the sword, but the sword hurts more!
 
You say you ran diag, but if you ran it in problem determination mode after the repair action was logged:
2F3E09A4 0215150506 I H sysplanar0 REPAIR ACTION
diag will not find the problem.
Run diag, advanced diags, system verification, sysplanar0, and you will see whether you still have a problem.
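
To illustrate the point about the repair action, here is a rough Python sketch that reads errpt summary lines and lists the sysplanar0 entries that are newer than the last REPAIR ACTION. It relies on the summary timestamp being MMDDhhmmYY; the function names are hypothetical, and the sample lines are a subset of the errpt listing posted above. This is only a way to read the log, not a replacement for running diag.

```python
from datetime import datetime

# A subset of the errpt summary lines posted in this thread.
ERRPT_SUMMARY = """\
2F3E09A4 0215150506 I H sysplanar0 REPAIR ACTION
071F4755 0215150206 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0215150106 P H sysplanar0 ENVIRONMENTAL PROBLEM
071F4755 0203025406 P H sysplanar0 ENVIRONMENTAL PROBLEM
2F3E09A4 0125095706 I H sysplanar0 REPAIR ACTION
"""


def parse_line(line):
    """Split one summary line into (timestamp, resource, description)."""
    _ident, stamp, _type, _class, resource, description = line.split(None, 5)
    # Summary timestamps are MMDDhhmmYY, e.g. 0215150506 = Feb 15 2006 15:05.
    return datetime.strptime(stamp, "%m%d%H%M%y"), resource, description


def entries_after_last_repair(summary, resource="sysplanar0"):
    """Timestamps of non-repair entries logged after the latest repair action."""
    entries = [parse_line(l) for l in summary.splitlines() if l.strip()]
    entries = [e for e in entries if e[1] == resource]
    repairs = [t for t, _, d in entries if d == "REPAIR ACTION"]
    last_repair = max(repairs) if repairs else None
    return [
        t for t, _, d in entries
        if d != "REPAIR ACTION" and (last_repair is None or t > last_repair)
    ]


pending = entries_after_last_repair(ERRPT_SUMMARY)
print(len(pending), "entries newer than the last repair action")
```

With the lines from this thread, every ENVIRONMENTAL PROBLEM entry predates the 15:05 repair action, which fits with problem determination reporting no trouble found.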
 
Please post the output of "errpt -a". Was anything "fixed" before you (or IBM?) logged the repair action?

What type of server is this btw? I seem to remember some types of server can get into all sorts of trouble re power and cooling if the wrong microcode was applied. E.g. 7026-6M1 microcode on a 7026-6H1.

Perhaps you need to get in touch with IBM support.



HTH,

p5wizard
 
The server is a 7026-6H1, and a power supply was replaced two or three months ago (an IBMer did it).






 
Well, I'd suggest you open up a new call with IBM support, perhaps there's something wrong with the replacement part you got.


HTH,

p5wizard
 
Once again, for the hard of hearing:
RUN DIAG, ADVANCED DIAGS, SYSTEM VERIFICATION, SYSPLANAR0 AND YOU WILL SEE IF YOU STILL HAVE A PROBLEM.
;-)
 