
Monitor & Restart Process

Status
Not open for further replies.

mrn

MIS
Apr 27, 2001
3,993
GB
Hello all,

I already have a solution but wondered if anyone had any slicker methods to do the following.

1. Monitor the process.
2. If the process has stopped, restart it.
3. If the restart fails 3 times, mail the admin.

My Solution - **Not tested**

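#--------
count=0

# "grep -v grep" stops the grep command from matching its own ps entry
ps -ef|grep -v grep|grep procname > /tmp/procname

if [ -s /tmp/procname ] ; then    # -s: file exists and is not empty
  echo "It's running"
else
  nohup procname &
fi

while [ $count -le "3" ]
do
  rm /tmp/procname
  sleep 300                        # check again every 5 minutes
  ps -ef|grep -v grep|grep procname > /tmp/procname
  if [ -s /tmp/procname ] ; then
    echo "It's running"
    count=3                        # note: 3 still passes "-le 3", so the loop continues
  else
    nohup procname &
    count=`expr $count + 1`
  fi
done
#-------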
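A side note on the `ps -ef | grep procname` test: it counts *any* line containing the string as "running", so an editor with the script open, a `tail` of a log named after the process, or another process whose name includes it can all give a false positive. A minimal sketch of a PID-file alternative, assuming the monitor is the one that starts the process; `PIDFILE`, `start_proc` and `is_running` are hypothetical names:

```shell
#!/bin/sh
# Sketch: track the monitored process by PID rather than by grepping ps output.
# PIDFILE is a hypothetical location; the monitor writes it when it starts the process.
PIDFILE=${PIDFILE:-/var/tmp/procname.pid}

# start_proc CMD [ARGS...]: start CMD in the background and record its PID
start_proc() {
  nohup "$@" > /dev/null 2>&1 &
  echo $! > "$PIDFILE"
}

# is_running: succeed only if the PID stored in $PIDFILE is a live process
is_running() {
  [ -s "$PIDFILE" ] || return 1
  pid=$(cat "$PIDFILE")
  # signal 0 delivers nothing; kill only reports whether the PID can be signalled
  # (it also fails without permission to signal it, so run as the same user)
  kill -0 "$pid" 2>/dev/null
}

# cron-style use: is_running || start_proc procname
```

PIDs are eventually reused, so a stale PID file could in principle point at an unrelated process; for a check every few minutes that risk is small, but it is the trade-off against the string match.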

Ideas?



--
| Mike Nixon
| Unix Admin
|
----------------------------
 
P.S. I know I missed off the mail bit, and it would run from crontab.
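The missing mail step could look something like the sketch below. It assumes a `mailx` that takes `-s SUBJECT ADDRESS` and reads the body from standard input (as POSIX `mailx` does); `ADMIN`, `MAILER` and `notify_admin` are hypothetical names, and `MAILER` is a variable only so the command can be swapped out:

```shell
#!/bin/sh
# Sketch of the "mail admin" step. ADMIN and MAILER are hypothetical settings;
# mailx -s SUBJECT ADDRESS reads the message body from standard input.
ADMIN=${ADMIN:-root}
MAILER=${MAILER:-mailx}

# notify_admin HOST PROC ATTEMPTS: mail a one-line alert to $ADMIN
notify_admin() {
  echo "$2 on $1 failed to restart after $3 attempts" |
    $MAILER -s "$2 down on $1" "$ADMIN"
}

# Example call after the restart loop (count as in the script above):
# [ "$count" -ge 3 ] && notify_admin "$(hostname)" procname "$count"
```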

--
| Mike Nixon
| Unix Admin
|
----------------------------
 
Mike, written the way you have defined it, the script itself will run forever. You may wish to consider running a single check through cron every "X" minutes instead.

#!/bin/ksh
#
PID=" " # initialization
count=0
while [ $count -le "3" ]
do
  # "grep -v grep" drops the grep itself; field 4 of "ps -elf" is the PID
  PID=`ps -elf | grep procname | grep -v grep | awk '{ print $4 }'`
  if [ -z "$PID" ] ; then
    # Process is down
    nohup procname &
    count=$(($count+1))
  else
    # Process is up
    count=0
  fi
  sleep 300
done
print "Failed restart three times sending email..."

Note: because count goes back to 0 whenever the process is found running, a process that keeps going up, down, up, down, up, down... will never trigger an email.
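If you do want a flapping process caught, one option is a second counter that is never reset. A minimal sketch; the thresholds and the `watch_states` name are hypothetical, and it reads simulated check results from standard input so the counting logic can be tried without a real process:

```shell
#!/bin/sh
# Sketch: count restarts two ways so a flapping process still raises an alert.
# MAX_IN_A_ROW and MAX_TOTAL are hypothetical thresholds.
MAX_IN_A_ROW=3
MAX_TOTAL=5

# watch_states: read one observation per line ("up" or "down", where "down"
# means the check found the process dead and restarted it); print an alert and
# return 1 as soon as either limit is reached, otherwise return 0.
watch_states() {
  in_a_row=0
  total=0
  while read -r state
  do
    if [ "$state" = down ]; then
      in_a_row=$((in_a_row + 1))
      total=$((total + 1))
    else
      in_a_row=0          # same reset as the script above
    fi
    if [ "$in_a_row" -ge "$MAX_IN_A_ROW" ]; then
      echo "ALERT: $in_a_row failed restarts in a row"
      return 1
    fi
    if [ "$total" -ge "$MAX_TOTAL" ]; then
      echo "ALERT: process keeps dying ($total restarts)"
      return 1
    fi
  done
  return 0
}
```

In a real monitor the body of the `while` loop would run once per check instead of reading stdin, and `total` would need resetting now and then (say daily); otherwise any long-lived monitor eventually trips it.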

cron version:
#!/bin/ksh
#
PID=" " # initialization
count=0
while [ $count -le "3" ]
do
  # "grep -v grep" drops the grep itself; field 4 of "ps -elf" is the PID
  PID=`ps -elf | grep procname | grep -v grep | awk '{ print $4 }'`
  if [ -z "$PID" ] ; then
    # Process is down: restart it and give it time to come up
    nohup procname &
    count=$(($count+1))
    sleep 60 #set high enough for process to start
  else
    # Process is up: nothing more to do on this run
    exit 0
  fi
done
if [ $count -gt "3" ]; then
  print "Failed restart three times sending email..."
  exit 99
fi
exit 0


Note: I did not get a chance to test this after playing with it. It should be very close, if not 100%; let me know.
 
The process has to run forever; it monitors a SAP archive directory, which in turn archives to tape. If the archive fills up, SAP stops and we have unhappy people jumping up and down.

Thanks for the reply :)

--
| Mike Nixon
| Unix Admin
|
----------------------------
 
Mike,
If the monitor sleeps at all, there is a window in which the application can be down without the monitor noticing. If we check at minute 1 and it is up, and it fails 1 second later, we will not check again for another 9 minutes. In addition, if the monitor itself were to die for some reason, it would not be there to detect the next failure.
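One way to close the polling window, assuming the monitor is allowed to start procname itself and procname stays in the foreground rather than detaching, is to make the monitor its parent and block in `wait`, which returns the moment the child exits. A minimal sketch; `supervise` and its arguments are hypothetical:

```shell
#!/bin/sh
# Sketch: restart a child as soon as it exits, with no sleep/poll window.
# supervise CMD MAX: run CMD; each time it exits, restart it, giving up after
# MAX restarts (counted over the supervisor's whole lifetime).
supervise() {
  cmd=$1
  max=$2
  restarts=0
  while :
  do
    $cmd &                        # CMD must not fork itself into the background
    pid=$!
    wait "$pid" && rc=0 || rc=$?  # blocks until this child exits
    if [ "$restarts" -ge "$max" ]; then
      echo "exited with status $rc; giving up after $restarts restarts"
      return 1
    fi
    restarts=$((restarts + 1))
    echo "exited with status $rc; restart $restarts"
  done
}
```

Something like `supervise procname 3 || <mail the admin>` would cover the three steps. It does not remove the second concern: if the supervisor itself dies, nothing restarts it, so a cron job that checks for the supervisor is still worthwhile. A short `sleep` before each restart would also stop a process that dies instantly from being restarted in a tight loop.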

If the check is put into cron and the server is rebooted or the monitor dies for some reason, it simply runs again within "X" minutes.
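One thing to watch with a cron-driven check is overlap: the cron version above can spend several minutes in its `sleep 60` retries, so with a short enough interval a new run could start while the previous one is still restarting the process. A minimal sketch of a lock using `mkdir`, which is atomic; `LOCKDIR` and `with_lock` are hypothetical names:

```shell
#!/bin/sh
# Sketch: keep cron-started runs of the monitor from overlapping.
# LOCKDIR is a hypothetical path; mkdir is atomic, so only one run can create it.
LOCKDIR=${LOCKDIR:-/var/tmp/procname_monitor.lock}

# with_lock CMD [ARGS...]: run CMD only if no other run currently holds the lock
with_lock() {
  if ! mkdir "$LOCKDIR" 2>/dev/null; then
    echo "previous run still active; skipping"
    return 0
  fi
  "$@" && rc=0 || rc=$?   # run the real check, keeping its exit status
  rmdir "$LOCKDIR"
  return $rc
}
```

A crontab entry such as `0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/procname_monitor` (hypothetical path) would then run the wrapped check every five minutes; the minutes are listed explicitly because not every cron accepts the `*/5` step syntax. If a run is killed between `mkdir` and `rmdir`, the lock directory stays behind and has to be removed by hand.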

Cheers,
--Mike

 
Hmm! What am I missing here? If this process is so unstable that it needs monitoring to the extent described above, surely there is a fundamental problem with the process that needs resolving, and just restarting it is probably not the real answer?

I too use process monitor scripts that restart a process if it dies, but (maybe I'm lucky) they very rarely need to restart anything; they are more of a comfort factor. I usually add a log entry so that any restart is captured; otherwise you only find out on the fatal day when a restart fails (the 3rd time).
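Logging each restart can be as small as one appended line. A minimal sketch; `LOGFILE` and `restart_logged` are hypothetical names:

```shell
#!/bin/sh
# Sketch of "log every restart" so a rare restart still leaves a trace.
# LOGFILE is a hypothetical path.
LOGFILE=${LOGFILE:-/var/tmp/procname_monitor.log}

# restart_logged CMD [ARGS...]: append a timestamped line, then restart CMD
restart_logged() {
  echo "$(date '+%Y-%m-%d %H:%M:%S') restarting $*" >> "$LOGFILE"
  nohup "$@" > /dev/null 2>&1 &
}
```

On systems with syslog, `logger -t procmon "restarting procname"` would send the same line to the system log instead of a private file.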

Is the process likely to fail because a disk fills to 100%? If so, it may be better to monitor disk space as a preventive measure.
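A preventive disk check along those lines might look like the sketch below. It relies on `df -P`, whose POSIX output puts each filesystem on one line with the capacity percentage in field 5; `disk_usage_pct`, `check_disk` and the limit you pass are hypothetical:

```shell
#!/bin/sh
# Sketch of a preventive disk-space check; names and limits are hypothetical.

# disk_usage_pct DIR: print the capacity of the filesystem holding DIR, without the %
disk_usage_pct() {
  # -P selects POSIX output: one line per filesystem, capacity in field 5
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_disk DIR LIMIT: warn and return 1 when usage is at or above LIMIT percent
check_disk() {
  pct=$(disk_usage_pct "$1")
  if [ "$pct" -ge "$2" ]; then
    echo "WARNING: $1 is ${pct}% full (limit $2%)"
    return 1
  fi
  return 0
}
```

Run from cron as, say, `check_disk /path/to/archive 90 || <mail the admin>`, this gives a warning while there is still room, rather than after the archiver has already stopped.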

Now, if you really want to make things watertight, take a look at Nagios (formerly Netsaint): open-source monitoring with restarts, remote calls, etc.

Good Luck

Laurie.
 
No, the process is pretty stable, apart from when we are on a training course or at lunch... You get the idea.

But we're talking about a multi-million £ system, and it seems prudent to make sure this process is running at all times. If the process fails, the SAP/Oracle archiver will fill the disk and the system will fall over, which is what happened last week.

Thanks for your input Laurie

--
| Mike Nixon
| Unix Admin
|
----------------------------
 