Tek-Tips is the largest IT community on the Internet today!

Solaris 8 / mult platforms / gen fault reporting

Status
Not open for further replies.

mntlfngrs

Technical User
Apr 13, 2002
US
So I am not part of our company's Unix team, but I am trying to help them make their servers "raise their hands," so to speak, when a wrench light comes on. Right now we have to physically look at the servers for wrench lights, and I think that is ridiculous. I was playing with prtdiag and talked to the Unix guys, but for some reason they say it will not work on all the platforms. We have several different rack-mount and Enterprise servers: 280R, 480R, Ent. 4500, Ent. 6500, V100, V210, V240, V1280, s1. They use Big Brother for monitoring other things on the servers.
Hell, I don't know what the differences are, and I personally have only very minimal *nix experience, but I am hoping that someone might have a solution, script (Perl), patch, upgrade, or anything.

There has to be a variable set somewhere when the wrench light is on that we can query, isn't there?
 
On the front of the server there is a power LED and an LED with a wrench symbol next to it. Usually this indicates a power supply failure, but apparently it can also indicate a board, memory, or CPU fault. On a server I can access, a 280R running Solaris 8, I get:
$ /usr/platform/sun4u/sbin/prtdiag -v
System Configuration: Sun Microsystems sun4u Sun Fire 280R (2 X UltraSPARC-III+)
System clock frequency: 150 MHz
Memory size: 2048 Megabytes

========================= CPUs ===============================================

          Run   E$    CPU      CPU
Brd  CPU  MHz   MB    Impl.    Mask
---  ---  ----  ----  -------  ----
A    0    1200  8.0   US-III+  11.0
B    1    1200  8.0   US-III+  11.0

========================= Memory Configuration ===============================

           Logical  Logical  Logical
      MC   Bank     Bank     Bank         DIMM    Interleave  Interleaved
Brd   ID   num      size     Status       Size    Factor      with
----  ---  ----     ------   -----------  ------  ----------  -----------
CA    0    0        1024MB   no_status    512MB   2-way       0
CA    0    2        1024MB   no_status    512MB   2-way       0

========================= IO Cards =========================




========================= Environmental Status =========================

System Temperatures (Celsius):
------------------------------
cpu0 1
---------
47 46

=================================

Front Status Panel:
-------------------
Keyswitch position: NORMAL

System LED Status: POWER GEN FAULT
[ ON] [OFF]

=================================

Disk Status:
Presence Fault Value
-------- -----------
DISK 0: [PRESENT] [NO_FAULT]
DISK 1: [PRESENT] [NO_FAULT]

=================================

Fan Bank :
----------

Bank Status
---- -------
FAN [NO_FAULT]

=================================

Power Supplies:
---------------
Supply Status PS Type
------ ------ ---------------
PS0 [NO_FAULT] [Sun-Fire-280R]
PS1 [NO_FAULT] [Sun-Fire-280R]

=================================


========================= HW Revisions =======================================

System PROM revisions:
----------------------
OBP 4.5.21 2003/02/24 17:23

IO ASIC revisions:
------------------
Port
Model ID Status Version
-------- ---- ------ -------
Schizo 8 ok 7




The info I need does appear here, but I have yet to verify that this section:
Front Status Panel:
-------------------
Keyswitch position: NORMAL

System LED Status: POWER GEN FAULT
[ ON] [OFF]

is actually the wrench light and that it accurately reflects the wrench light's state.

"Be all and you'll be to end all.
Life can be a real ball.
State of mind!"
 
Code:
/usr/platform/`uname -i`/sbin/prtdiag -v | grep -i fault | grep -iv no_fault

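should give you lines with 'fault' in them but not 'no_fault'.

You could count these lines with 'wc -l', and if the number is greater than 0 there would be a fault on the machine ... possibly. :) On the 280R output above, for example, the header lines "Presence Fault Value" and "System LED Status: POWER GEN FAULT" also match, so the count is at least 2 even when nothing is wrong.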
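A minimal sketch of a tighter count that only looks at bracketed status fields, so header lines such as "Presence Fault Value" and "System LED Status: POWER GEN FAULT" are not counted. It assumes a failed component is shown as an upper-case bracketed status containing FAULT (for example [FAULT]); that spelling is an assumption, not something shown in this thread, so check it on each platform. `count_faults` is a hypothetical helper name.

```shell
#!/bin/sh
# count_faults: read prtdiag -v output on stdin and print how many lines
# carry a bracketed status containing FAULT other than [NO_FAULT].
# Header lines ("Presence Fault Value", "POWER  GEN FAULT") have no
# brackets around FAULT, so they are not counted.
# ASSUMPTION: a failed part is shown as an upper-case bracketed status
# such as [FAULT]; verify the exact spelling on each platform.
count_faults() {
    # grep -c prints the count (0 when nothing matches); "|| true" keeps
    # grep's exit status 1 for a zero count from stopping "set -e" scripts.
    grep '\[[A-Z_]*FAULT[A-Z_]*]' | grep -vc '\[NO_FAULT]' || true
}

# Example, on the server itself:
#   PRTDIAG=/usr/platform/`uname -i`/sbin/prtdiag
#   n=`$PRTDIAG -v | count_faults`
#   if [ "$n" -gt 0 ]; then echo "fault(s) reported: $n"; fi
```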
 
How do I use grep to get the line that matches the pattern plus a number of lines directly after it? As you can see, the status of the front-panel LEDs is on the line after the System LED Status line. I suppose if it is always on the same line number I could just check that line for two instances of [ON]?

Thanks for the help, guys. Would I be better off asking in another forum?

The thought among the more Unix-savvy people here (at work) is that there need to be some firmware/software upgrades before we can do this consistently across all our platforms.
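One way to see which of the platforms listed at the top of the thread actually report front-panel LED state is to run a check like this on each server. It is only a sketch: `led_support` is a hypothetical helper, and it tests for the "System LED Status" line seen in the 280R output, not for any other LED format a different platform might print.

```shell
#!/bin/sh
# led_support LABEL: read prtdiag -v output on stdin and print one line
# saying whether it contains the "System LED Status" line that the LED
# checks in this thread parse. LABEL is any text identifying the server.
led_support() {
    if grep '^System LED Status' >/dev/null; then
        echo "$1: System LED Status reported"
    else
        echo "$1: no System LED Status line, LED check will not work here"
    fi
}

# Example, run on each server:
#   PRTDIAG=/usr/platform/`uname -i`/sbin/prtdiag
#   $PRTDIAG -v | led_support "`uname -n` `uname -i`"
```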
 
How do I use grep to get the line that matches the pattern plus a number of lines directly after?
If you have GNU grep, take a look at the -A option.
Otherwise, the sed way:
... | sed -n '/^System LED Status/{;p;n;p;q;}'
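Without GNU grep, the general "match plus N following lines" behavior can also be sketched in awk. `after_match` is a hypothetical helper; on Solaris 8 run it with AWK=nawk, because the old /usr/bin/awk there may not accept -v.

```shell
#!/bin/sh
# after_match PATTERN N: print every line matching the extended regular
# expression PATTERN plus the N lines after it, reading stdin. Roughly the
# same as GNU "grep -E -A N PATTERN", without the "--" group separators.
# On Solaris 8 set AWK=nawk first; the old /usr/bin/awk may not accept -v.
after_match() {
    ${AWK:-awk} -v pat="$1" -v n="$2" '
        $0 ~ pat  { print; left = n + 0; next }  # match: print it, reset counter
        left > 0  { print; left-- }             # one of the N lines after a match
    '
}

# Example: the LED header plus the [ ON]/[OFF] line under it
#   AWK=nawk
#   prtdiag -v | after_match '^System LED Status' 1
```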

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884
 
Hey PHV, that's cool:
$ prtdiag -v | sed -n '/^System LED Status/{;p;n;p;q;}'
System LED Status: POWER GEN FAULT
[ ON] [OFF]

Thanks.
 
So I have been playing a little with this, and first off I want to thank you guys for helping a non-programmer with this. How do I script it to send an email or SNMP trap?

xapw@x0319p11[/export/home/xapw] $ prtdiag | grep PS | grep -v 'PS Type'
PS0 [NO_FAULT] [Sun-Fire-280R]
PS1 [NO_FAULT] [Sun-Fire-280R]
xapw@x0319p11[/export/home/xapw] $ prtdiag | grep PS | grep -v 'PS Type' | grep -v NO
xapw@x0319p11[/export/home/xapw] $

This would return an exit status of 0 if a fault were present. I don't know the scripting language, but something like:
prtdiag | grep PS | grep -v 'PS Type'| grep -v NO | if exit status = 0 then email or send trap or set MIB
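The pseudocode maps directly onto the shell's `if`: a pipeline's exit status is the exit status of its last command, so `if pipeline; then ...; fi` runs the body only when the final `grep -v NO` selected a line. A sketch, with `ps_fault` as a hypothetical helper name; the mail recipient is only an example and the SNMP part is left out.

```shell
#!/bin/sh
# ps_fault: read prtdiag output on stdin; exit 0 if some power-supply line
# does not contain NO (i.e. is not [NO_FAULT]), exit 1 otherwise.
# This is the grep pipeline from the post; a pipeline's exit status is that
# of its last command, and grep exits 0 only when it selected a line.
ps_fault() {
    grep PS | grep -v 'PS Type' | grep -v NO >/dev/null
}

# Example use with the "if" the pseudocode asks for (recipient is an example):
#   PRTDIAG=/usr/platform/`uname -i`/sbin/prtdiag
#   if $PRTDIAG | ps_fault; then
#       echo "power supply fault" | mailx -s "PS fault on `uname -n`" postmaster
#   fi
```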

If the system LED status is accurate then:
xapw@x0319p11[/export/home/xapw] $ prtdiag -v | sed -n '/^System LED Status/{;p;n;p;q;}'
System LED Status: POWER GEN FAULT
[ ON] [OFF]
xapw@x0319p11[/export/home/xapw] $ prtdiag -v | sed -n '/^System LED Status/{;p;n;p;q;}' | grep OFF
[ ON] [OFF]
xapw@x0319p11[/export/home/xapw] $

This would return an exit status of 1 if the gen fault light was on. A similar script to the one above would work.
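The `grep OFF` test assumes the POWER LED is always ON. A sketch that looks only at the last bracket on the line after "System LED Status" avoids that, assuming (as the 280R header "POWER GEN FAULT" suggests) that the last bracket is the GEN FAULT LED. `gen_fault_on` is a hypothetical helper; it exits 0 when that LED reads ON.

```shell
#!/bin/sh
# gen_fault_on: read prtdiag -v output on stdin; exit 0 if the last LED
# on the line after "System LED Status" reads ON, exit 1 otherwise
# (including when that section is missing).
# ASSUMPTION: the last bracket is the GEN FAULT LED, as the 280R header
# "POWER  GEN FAULT" suggests; check other platforms before relying on it.
gen_fault_on() {
    # sed prints only the line after the header; grep then looks for a
    # final "[ ON]" (or "[ON]") at the end of that line.
    sed -n '/^System LED Status/{n;p;q;}' | grep '\[ *ON] *$' >/dev/null
}

# Example:
#   PRTDIAG=/usr/platform/`uname -i`/sbin/prtdiag
#   if $PRTDIAG -v | gen_fault_on; then echo "GEN FAULT LED is on"; fi
```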



???
 
untested ... but might work :)
Code:
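PRTDIAG=/usr/platform/`uname -i`/sbin/prtdiag

# Count power-supply lines that do not say NO_FAULT (0 means all OK).
FAULT=`$PRTDIAG | grep PS | grep -v 'PS Type' | grep -vc NO`

if [ "$FAULT" -gt 0 ] ; then
	mailx -s "PS type errors on `uname -n`" postmaster <<EOM
I think I have $FAULT PS type error(s).
EOM
	echo "send trap or set MIB"
fi

# The line after "System LED Status" holds the LEDs, e.g. "[ ON]  [OFF]".
# "[ ON]" splits into the fields "[" and "ON]", so remove the spaces and
# test whether the last (GEN FAULT) bracket is [ON].
FAULT=`$PRTDIAG -v | nawk '/^System LED Status/{getline; gsub(/ /, ""); if ($0 ~ /\[ON]$/) print "GEN FAULT"; exit}'`

if [ "x$FAULT" = "xGEN FAULT" ] ; then
	mailx -s "GEN FAULT Error on `uname -n`" postmaster <<EOM
I think I have a General Fault light on.
EOM
	exit 1
fi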
 