
Predictive Failure Software

beardyboy (Technical User, GB)
Jul 26, 2002
Does anyone know of an application that can run on an AIX machine and predict/report failing hardware before it happens? (I'm not really looking for a genie in a bottle, just a bit of a head start/early warning.) It needs to be able to send the information from the machine by e-mail, or even text the engineer's mobile phone.
 
Hi,

I think somebody is looking for a bit of prophecy here :)

Normally you only find out about developing hardware issues by checking errpt periodically.
I wrote a script that looks for fan, mirrored-disk and power-supply (PS) problems and warns the customer with a yellow screen once an hour.
You can also have it e-mail the warning to one of your engineers' mobile numbers.
Here is the script.
===================================
#!/bin/ksh
#
# --------------------------------------------------------------------------------
# Written by: Lev Weissman
# Version 2.0
# Date: July 16, 2002
#
# Function: Checks for FAN, SYSTEM DISK and PS failures in 6F1.
# Displays a warning in a new aixterm every hour, and sends mail to the "backup" user.
# On its first run, the script adds itself to cron so that it runs every hour.
# ----------------------------------------------------------------------------------

TITLE="FAILURE"
BG=Yellow
CR=Yellow # Make cursor same color as BG so it's invisible
FG=Red # Text Color
ROWS=100
FAILURE=0
STRING="FAILURE !!!"
STRINGDISK=""
STRINGPS=""
STRINGFAN=""
STRINGCALL="Please call Creo support !"

###### main ######
# Add the hourly crontab entry if it is not there yet
crontab -l | grep "check-redundancy.sh" > /dev/null
if [[ $? != 0 ]] ;then
  crontab -l > /tmp/crontab1
  echo "0 * * * * /scitex/version/scripts/check-redundancy.sh" >> /tmp/crontab1
  crontab /tmp/crontab1
fi

# Check what errors exist in errpt.
# (The fan and PS checks must search errpt, not crontab; -i in case
# errpt prints its descriptions in upper case.)
if errpt | grep STALE > /dev/null ;then
  STRINGDISK="One of the system disks has failed !"
  FAILURE=1
fi
if errpt | grep -i "cooling problem" > /dev/null ;then
  STRINGFAN="One of the FANs has failed !"
  FAILURE=1
fi
if errpt | grep -i "power problem" > /dev/null ;then
  STRINGPS="The power supply has failed !"
  FAILURE=1
fi

if [[ $FAILURE = 1 ]] ;then
  # Send mail to "backup":
  echo "$STRING\n\n$STRINGDISK\n\n$STRINGPS\n\n$STRINGFAN\n\n$STRINGCALL" > /tmp/error
  /usr/bin/mail -s "System failures" backup < /tmp/error

  # Notify Mac users
  /scitex/version/usr/local/es/etc/afpmsg "$STRING $STRINGDISK $STRINGPS $STRINGFAN $STRINGCALL"

  # Send a long beep to the bell
  sh -c "echo \\a\\a\\a\\a\\a\\a\\a\\a\\a\\a\\a\\a"

  # This is the display part.
  # Create the temp script (variables are filled in now; date runs when it is shown):
  cat > /scitex/version/scripts/msg_tmp <<EOF
#!/bin/sh
date +"%T"
banner "$STRING"
echo "$STRINGDISK"
echo "$STRINGPS"
echo "$STRINGFAN"
echo "$STRINGCALL"
read JUNK
EOF
  chmod 755 /scitex/version/scripts/msg_tmp

  # Kill any existing dialog
  kill -9 `ps -ef | grep "sh -c /scitex/version/scripts/msg_tmp" | grep -v grep | awk '{print $2}'` > /dev/null 2>&1

  # Execute in a window
  TERM_TYPE=`echo $TERM`
  if ps -ef | grep xinit | grep -v grep > /dev/null ;then
    # X is running: show the message in a yellow aixterm
    # (append > /dev/null 2>&1 to hide errors in case a font is missing)
    aixterm -T "$TITLE" -bg $BG -fg $FG -cr $CR -geometry 80x$ROWS+0+0 -e /bin/sh -c /scitex/version/scripts/msg_tmp &
  else # vt terminal
    echo "echo \"\n\n\"" > /scitex/version/scripts/msg_tmp.vt
    # Copy the message script, dropping the line that pauses for input
    grep -v "^read JUNK" /scitex/version/scripts/msg_tmp >> /scitex/version/scripts/msg_tmp.vt 2>/dev/null
    echo "echo \"\n\n\"" >> /scitex/version/scripts/msg_tmp.vt
    chmod 755 /scitex/version/scripts/msg_tmp.vt
    /scitex/version/scripts/msg_tmp.vt
  fi
fi
===================================
"Long live king Moshiach !"
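The script above mails the warning to the local `backup` user. For the texting part of the original question, a minimal sketch, assuming your mobile provider offers an e-mail-to-SMS gateway, is to squeeze the same message strings into one short line and mail it to that gateway. `PAGER_ADDR`, `make_sms` and `send_sms` are hypothetical names, the address is a placeholder, and the 160-character cut assumes a single plain-text SMS.

```shell
#!/bin/ksh
# Sketch only: PAGER_ADDR is a hypothetical placeholder address; ask your
# mobile provider which e-mail-to-SMS gateway address (if any) it offers.
PAGER_ADDR=${PAGER_ADDR:-"0123456789@sms.example.com"}
MAX_SMS=160   # one plain-text (GSM 7-bit) SMS holds at most 160 characters

# make_sms: join all arguments into one line, squeeze runs of blanks
# (empty STRINGxxx values leave double spaces), trim both ends, and cut
# the result to MAX_SMS characters.
make_sms() {
  printf '%s\n' "$*" | awk -v max="$MAX_SMS" '
    { line = line " " $0 }
    END {
      gsub(/[ \t]+/, " ", line)
      sub(/^ /, "", line)
      sub(/ $/, "", line)
      print substr(line, 1, max)
    }'
}

# send_sms: mail the shortened message to the (hypothetical) gateway address.
send_sms() {
  make_sms "$@" | /usr/bin/mail -s "Failure on $(hostname)" "$PAGER_ADDR"
}
```

In the script, a line such as `send_sms "$STRING" "$STRINGDISK" "$STRINGPS" "$STRINGFAN"` could sit next to the existing `mail` to `backup`. Keeping the text short matters because a gateway may truncate or split longer messages.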
 
Cheers for that :).


I've looked at lots of software from many developers, but it's mainly geared to spotting when filesystems are almost full, memory usage is very high, etc.

I didn't think anything existed on the hardware side. I've seen similar software used outside this arena to flag parts operating outside expected limits (e.g. when testing cars). At least this will stop my boss going on about it daily :)
 
Hi,

IBM does a product called Service Agent which runs on pSeries and RS/6000 servers. Depending on how it is set up, it reports hardware errors detected on the system to IBM without any user intervention.

Basically it is a bit like prediction. As far as I know, you need a modem on your machine and the Service Agent software installed.
If your server is new (i.e. pSeries), it is free and you can get IBM to install and configure it.
I "think" it can also send logs to IBM via the Web.

If you need the PDF file describing Service Agent, give me your e-mail address; I don't know if you can attach files to this thread.

HTH
 
Enjoy it.

(Change the directories to suit your own system.)
Also, you can disregard the parts related to "backup" and Macs.
You can add other errpt errors for grep to search for and report.
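One way to make adding errors easier, sketched here as an assumption rather than the author's code, is to keep the errpt patterns in a small table so that each new check is one more `pattern|message` line instead of another `if` block. `ERROR_TABLE` and `check_errors` are hypothetical names, and the patterns are examples: compare them with the DESCRIPTION column of your own errpt output before relying on them.

```shell
#!/bin/ksh
# Sketch only: the table entries are examples, not a verified list of
# AIX error descriptions. One "grep-pattern|message" pair per line.
ERROR_TABLE='STALE|One of the system disks has failed !
COOLING PROBLEM|One of the FANs has failed !
POWER PROBLEM|The power supply has failed !'

# check_errors: read errpt output on stdin and print the message for
# every table pattern found in it (case-insensitive). Returns 0 if at
# least one pattern matched, 1 if none did.
check_errors() {
  report=$(cat)
  found=1
  while IFS='|' read -r pattern message; do
    [ -z "$pattern" ] && continue
    if printf '%s\n' "$report" | grep -i "$pattern" > /dev/null; then
      printf '%s\n' "$message"
      found=0
    fi
  done <<EOF
$ERROR_TABLE
EOF
  return $found
}

# Run against the live error log only where errpt exists (AIX).
if [ -x /usr/bin/errpt ]; then
  if MESSAGES=$(/usr/bin/errpt | check_errors); then
    printf 'FAILURE !!!\n%s\n' "$MESSAGES"
  fi
fi
```

The loop reads the table from a here-document rather than a pipe so that `found` is set in the current shell and the return status is reliable.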
 
Yeah, that would be great, thanks.


e-mail: b.chandler@total-cover.net



Regards
 