How to automatically monitor for hardware errors?

Status
Not open for further replies.

gcux

MIS
Apr 23, 2001
6
US
We need to be notified via e-mail or pager when hardware is having problems. How have other admins set up the `errpt` command or other commands to monitor hardware issues?
Thanks.
 
Monitor the error log with errpt -c -s. Use awk to print $1 (the identifier) and compare it against a list of the identifiers or hardware errors you want to monitor. If one matches, you can e-mail, page, or use logger to send a console message to your operators, etc.
Learn something every day or you wasted a day of your life.
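The awk comparison described above could be sketched roughly like this. It assumes the errpt summary layout with the identifier in the first column. The function name filter_watched and the identifiers in WATCH_IDS are made up for illustration, so substitute identifiers taken from your own errpt output.

```shell
#!/bin/sh
# Sketch only: WATCH_IDS holds placeholder identifiers, not real AIX error IDs.
WATCH_IDS="AAAA1111 BBBB2222"

# Reads errpt-style summary lines on stdin and prints only those
# whose first field (the identifier) appears in WATCH_IDS.
filter_watched() {
    awk -v ids="$WATCH_IDS" '
        BEGIN {
            # Turn the space-separated list into a lookup table.
            n = split(ids, a, " ")
            for (i = 1; i <= n; i++) watch[a[i]] = 1
        }
        ($1 in watch) { print }
    '
}
```

You would then pipe errpt output (with whatever flags your AIX level supports) into filter_watched and hand any non-empty result to mail, a pager gateway, or logger. If this runs from cron, some way of skipping entries already reported would also be needed.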
 
Hi
You can add your own entry to the ODM class "errnotify" and use the method parameter to trigger your own script (write a log, send an e-mail, etc.). A single error can trigger more than one entry.

1) odmget errnotify > sample
2) edit sample
Keep the best-suited entry and remove the other entries.
Change en_name to a name of your own choosing and en_method to a script of your own, like this:

errnotify:
en_pid = 0
en_name = "My_Disk_Error"
en_persistenceflg = 1
en_label = ""
en_crcid = 0
en_class = ""
en_type = ""
en_alertflg = ""
en_resource = ""
en_rtype = ""
en_rclass = "disk"
en_symptom = ""
en_method = "/usr/local/bin/disk_ntfymgr.sh $6"

With $6 you can pass the resource name as a parameter to your script.

3) odmadd sample

4) write the script, which can look like this:

#!/bin/ksh

echo 'see details in AIX errorlog' > /usr/local/log/disk_error.log.$$
echo 'Resource ' $1 >> /usr/local/log/disk_error.log.$$

/usr/bin/mail -s "Disk Error" -c "ccuser1@domain1" "sysadmin@domain2" < /usr/local/log/disk_error.log.$$

rm /usr/local/log/disk_error.log.$$
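A variant of the script above that avoids the temporary file might look like the following sketch. The function names build_body and notify are invented for illustration, and the address is the placeholder from the example. It assumes the resource name arrives as the first argument, as with the $6 in en_method above.

```shell
#!/bin/sh
# Sketch only: MAIL_TO is the placeholder address from the example above.
MAIL_TO="sysadmin@domain2"

# Print the message body for the resource name given as $1.
build_body() {
    resource=${1:-unknown}
    printf 'Disk error reported through errnotify\n'
    printf 'Resource: %s\n' "$resource"
    printf 'See details in the AIX error log.\n'
}

# Pipe the body straight into mail, so there is no log file to remove afterwards.
notify() {
    build_body "$1" | mail -s "Disk Error" "$MAIL_TO"
}

# When installed as the en_method script, finish with:
#   notify "$1"
```

Keeping the body in its own function means you can check the text by hand (for example, build_body hdisk0) without sending any mail.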



 
By far the best thing to do (though I'm sure that others will disagree with me) is to buy a product like HP's OpenView or download a product like Big Brother or Big Sister. Having a framework in which to manage your systems will make it easier in the long run.
Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.
 
Thank you all for your assistance.

LHCB: Thanks for pointing out the "-c" option. I tried errpt -c -s, but the -s switch is not available in 4.3.2; maybe it is available in 4.3.3.

I'm going to spend some more time with raylin's ideas for working with the odm classes. That looks like the kind of solution we need here.

MikeLacey: I agree with the need for a framework. At my current client site they use BMC Patrol. I will see about integrating this custom solution into Patrol so they can get notified through that system also.

 
BMC Patrol is a fine product. I'm sure that you will be able to integrate the solution you choose so that alerts are handled in a structured way.
Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.
 