
crontab triggering email from script in error 2

Status
Not open for further replies.

zootweller

Technical User
Oct 7, 2001
46
GB
The following script sends an email if certain processes have failed, but sends no email if no error is reported.
This works great when run manually, but for some reason when I add it to crontab it triggers the email event anyway, sending it to the recipient address (not local root).

crontab entry: 5,20,35,50 * * * * /<location>/healthcheck.ksh > /dev/null 2>&1

Script:
#!/bin/ksh
#
MACHINE=$(uname -n)
# Count iPlanet & SiteMinder processes
IPLANETPROCS=$(ps -ef|grep iplanetws|grep -v grep|wc -l)
SMPROCS=$(ps -ef|egrep "smservauth|smservaz|smservacct|smservadm"|grep -v 'grep'|wc -l)
# SETUP THE FIRST 2 LINES OF THE MAIN REPORT
#
echo "---------------------- Error Alert ---------------------: " > errorreport
echo " " >> errorreport
#
# APPEND TO REPORT IF IPLANET NOT RUNNING
if [ $IPLANETPROCS -lt 1 ]
then
echo "iPlanet web service has failed on $MACHINE" >> errorreport
echo " " >> errorreport
echo " " >> errorreport
fi
#
# ADD TO REPORT IF ANY OF THE 4 SM PROCESSES HAVE FAILED
if [ $SMPROCS -ne 4 ]
then
echo "SiteMinder Process failure on $MACHINE" >> errorreport
echo " " >> errorreport.$DTSTAMP
echo " " >> errorreport.$DTSTAMP
fi
#
# EMAIL THE MAIN REPORT ONLY IF CONTENT HAS BEEN ADDED BEYOND THE FIRST 2 LINES
CHECKMSG=$(more errorreport.$DTSTAMP|wc -l)
if [ $CHECKMSG -gt 2 ]
then
cat errorreport|mailx -r <sender@test.com> -s "Error report - $MACHINE" <recipient@test.com>
fi

Any help will be most welcome!
 
I'm not sure whether this will apply here, but cron runs with only a (very) limited environment and often things like the PATH variable and suchlike are incomplete or missing altogether.

In the event that this is the problem, you could identify the PATH set when running interactively (echo $PATH) and then include this near the top of your script with an export PATH=<the paths identified> then try running it through cron again. There may be other variables that need to be set in the same way - env might tell you what they could be. Hope this helps. If not, post back.
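To make that concrete, here is a minimal sketch. The PATH value is only a placeholder (substitute whatever echo $PATH shows in your interactive session), and dump_env is a hypothetical helper name, not something from the original script.

```shell
#!/bin/sh
# Placeholder PATH: replace with the value that `echo $PATH` prints
# in an interactive session on your machine.
PATH=/usr/bin:/bin:/usr/sbin:/sbin
export PATH

# dump_env (hypothetical helper): write the sorted environment to a file
# so that an interactive run and a cron run can be compared with diff.
dump_env() {
    env | sort > "$1"
}

# Possible use: call dump_env once interactively and once from cron,
# with different file names, then compare the two:
#   dump_env /tmp/env.interactive     (interactive run)
#   dump_env /tmp/env.cron            (cron run)
#   diff /tmp/env.interactive /tmp/env.cron
```

Any variable that shows up only in the interactive file is a candidate for being set explicitly in the script.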
 
Hi

First I would look at the errorreport file, to see whether it looks different when run from [tt]cron[/tt] than when run from the command line. One problem could be the permissions on the directory where the file is created. The script uses no absolute path, so it is unclear where the file will be created when run from [tt]cron[/tt].

Some ideas to make the script clearer, and so easier to debug:
[ul]
[li][tt]grep[/tt] has a -c option to output the count of matching lines, so no need for [tt]wc[/tt][/li]
[li]look at the [tt]pgrep[/tt] command, which may be easier than [tt]ps[/tt] and [tt]grep[/tt][/li]
[li]no need for [tt]cat[/tt], redirect the input for [tt]mailx[/tt] from that file with the less-than ( < ) sign[/li]
[/ul]
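A rough sketch of those three ideas follows. count_matches is a hypothetical helper name, and the pgrep line assumes the process name itself contains "iplanetws", which I have not verified on your system.

```shell
#!/bin/sh
# count_matches (hypothetical helper): count the lines on stdin that match
# an extended regular expression, using grep -c in place of grep | wc -l.
count_matches() {
    grep -E -c "$1"
}

# Against the live process table this could be used as:
#   IPLANETPROCS=$(ps -ef | count_matches '[i]planetws')
# The [i] bracket keeps grep from matching its own command line, so the
# separate "grep -v grep" step is no longer needed.
#
# Where pgrep is available, a similar count might be (assumption: the
# process name contains "iplanetws"):
#   IPLANETPROCS=$(pgrep iplanetws | wc -l)
#
# mailx can read the report directly instead of through cat:
#   mailx -s "Error report - $MACHINE" recipient@test.com < errorreport
```

Note that grep -c still prints 0 when nothing matches, although its exit status is then non-zero.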

Feherke.
 
Please ignore the '.$DTSTAMP' part. I stripped this down to its simple components for clarity on this post..
 
Adding the PATH var unfortunately doesn't seem to change things. Like I say, it runs fine manually. Crontab is under root, as is the script.
 
Can you output the value of $CHECKMSG in the script, so that you can see what it is when run interactively and when run via cron, and compare the two?
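One possible way to do that, as a sketch only: log_value and the /tmp/healthcheck.debug path are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# log_value (hypothetical helper): append a timestamped name=value line to
# a log file, together with the current TERM setting, so that interactive
# and cron runs can be compared afterwards.
log_value() {
    # $1 = log file, $2 = variable name, $3 = value
    echo "$(date) $2=$3 TERM=${TERM:-unset}" >> "$1"
}

# In the script, right after CHECKMSG is set, one might add:
#   log_value /tmp/healthcheck.debug CHECKMSG "$CHECKMSG"
```

Because the value goes to its own file, it is still captured while the crontab entry redirects stdout and stderr to /dev/null.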
 
Modify your crontab entry (or add an entry for a single run) so that you can see what is going wrong, like so:

5,20,35,50 * * * * /<location>/healthcheck.ksh >/tmp/hc.log 2>&1

Then after the script has run, examine the stdout and stderr in the log file it should have created.

Feherke's hint about "no absolute path" is a good place to start looking for where things can go wrong.

Put a

cd /whatever/location

at the beginning of the script or provide a full path name for your errorreport file.

Easiest way to do this is:

ERRREP=/full/path/to/errorreport.${DTSTAMP}

Then use:

echo "whatever" >${ERRREP}            # first echo
echo "some more stuff" >>${ERRREP}    # all subsequent echoes to the errorreport file


HTH,

p5wizard
 
I'd change this line:
CHECKMSG=$(more errorreport.$DTSTAMP|wc -l)

To this:
CHECKMSG=$(cat errorreport.$DTSTAMP|wc -l)

Or:
CHECKMSG=$(wc -l < errorreport.$DTSTAMP)

The "more" command tries to use your TERM environment to control paging. When run from cron, it is probably adding some kind of warning, thereby increasing the number of lines found by the "wc -l" command.

Just a hunch.
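Here is a small sketch of the counting step without a pager. count_lines and the sample file name are hypothetical, and the two echo lines only mimic the header that the original script writes.

```shell
#!/bin/sh
# count_lines (hypothetical helper): print only the number of lines in a
# file. Reading via redirection keeps wc from printing the file name, and
# no pager is involved, so the result does not depend on TERM or a tty.
count_lines() {
    wc -l < "$1"
}

# Build a two-line sample report like the header the script writes.
REPORT=${TMPDIR:-/tmp}/errorreport.sample.$$
echo "---------------------- Error Alert ---------------------: " > "$REPORT"
echo " " >> "$REPORT"

CHECKMSG=$(count_lines "$REPORT")
if [ $CHECKMSG -gt 2 ]; then
    echo "would send mail"
else
    echo "nothing to report"
fi
rm -f "$REPORT"
```

With only the two header lines in the file, the test fails and no mail would be sent, which is the behaviour wanted from cron as well.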

 
I don't think I have the complete answer for you but I have some ideas...

When you run something from cron, its output is sent in an email to the account that owns the cron job. This isn't the unexpected email you're getting, is it? I usually use the grave accent (`) for capturing a program's output, but I suspect that $() works the same. The standard error output of your commands ([tt]ps[/tt], [tt]grep[/tt], etc.) should go in an email to the account with the cron job. To suppress that, add [tt] > /dev/null 2>&1[/tt] at the end of the command in your list of cron jobs.

I recommend using [tt]cat[/tt] instead of [tt]more[/tt] for the [tt]CHECKMSG[/tt] variable. It's possible that [tt]more[/tt] acts differently if it's not called from an interactive session.

You know that the only way you get the email is if there are more than two newlines in the errorreport. So, use this info for troubleshooting. Blank lines count, too, so you might have some blank lines that you can't see in the email message. Try putting something in the file that isn't blank after you count the lines, but before you send the message so that you can see how many blank lines there are. I like to use the pipe symbol for this purpose.
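For the pipe-symbol trick, something along these lines might do. mark_line_ends is a hypothetical helper name; it only changes what is printed, not the report file itself.

```shell
#!/bin/sh
# mark_line_ends (hypothetical helper): print a file with a pipe symbol
# appended to every line, so that blank or whitespace-only lines become
# visible in the mailed report.
mark_line_ends() {
    sed 's/$/|/' "$1"
}

# Possible use, after counting the lines but before sending the message:
#   mark_line_ends errorreport | mailx -s "Error report - $MACHINE" recipient@test.com
```

Every line of the message then ends in a pipe, so a line that shows nothing but "|" or " |" was blank in the report.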

Counting the lines might be problematic in itself. Try [tt]grep[/tt]ping for the word fail instead. Then you only have to check if it's greater than 0.

Please consider my alternative below. Most of my changes are just my style of coding. I just wanted to offer an alternative ...
Code:
#!/bin/ksh
MACHINE=`uname -n`
DTSTAMP=`date +"%d%b%Y_%H%M%S"`

# Count iPlanet & SiteMinder processes
IPLANETPROCS=0
SMPROCS=0
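# AT&T ksh runs the last stage of a pipeline in the current shell,
# so the counters set inside this loop are still set after "done".
ps -ef | while read PROCLINE; do
    # Test one line of ps output at a time; [i] and [s] keep the patterns
    # from matching the grep commands' own entries in the process list.
    [ $(echo "$PROCLINE" | grep -c '[i]planetws') -gt 0 ] &&
        IPLANETPROCS=$((IPLANETPROCS + 1))
    [ $(echo "$PROCLINE" | egrep -c '[s]mserv(auth|az|acct|adm)') -gt 0 ] &&
        SMPROCS=$((SMPROCS + 1))
done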

# SETUP THE FIRST 2 LINES OF THE MAIN REPORT
echo "---------------------- Error Alert ---------------------: \n" > errorreport

# APPEND TO REPORT IF IPLANET NOT RUNNING
if [ $IPLANETPROCS -lt 1 ]; then
    echo "iPlanet web service has failed on $MACHINE\n\n" >> errorreport
fi

# ADD TO REPORT IF ANY OF THE 4 SM PROCESSES HAVE FAILED
if [ $SMPROCS -ne 4 ]; then
    echo "SiteMinder Process failure on $MACHINE\n\n" >> errorreport
fi

# EMAIL THE MAIN REPORT ONLY IF the error report has the word "fail" in it
CHECKMSG=`grep fail errorreport | wc -l`
if [ $CHECKMSG -gt 0 ]; then
    cat errorreport | mailx \
        -r <sender@test.com> \
        -s "Error report - $MACHINE" \
        <recipient@test.com>
    mv errorreport errorreport_$DTSTAMP.txt
fi

The [tt][ test ] && action[/tt] form is a shortcut for [tt]
if [ test ]
then
action
fi[/tt]
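A tiny sketch of the two forms side by side, using made-up numbers purely for illustration:

```shell
#!/bin/sh
COUNT=0

# Short form: the increment runs only when the test succeeds.
[ 3 -gt 1 ] && COUNT=$((COUNT + 1))

# Long form of the same thing.
if [ 3 -gt 1 ]
then
    COUNT=$((COUNT + 1))
fi

# When the test fails, neither form runs the action.
[ 0 -gt 1 ] && COUNT=$((COUNT + 100))

echo "COUNT=$COUNT"
```

One difference worth knowing: when the test fails, the short form leaves a non-zero exit status behind, while the if form does not. That can matter if it is the last command in a script or function.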

I try to keep my code within 80 characters wide. For me, it's easier to read that way.

--
-- Ghodmode
 
Yes - you're absolutely right! CHECKMSG was returning 5 instead of 2 - changing it to cat seems to have solved it.

Many thanks!
 
One of the first things I checked before answering. But strangely enough, "more" does not show that behaviour while being run from cron/at/batch on AIX, it just silently runs like cat...


HTH,

p5wizard
 
I completely agree - motoslide should get credit for this. Thanks all who replied.
 
Thanks, folks. There's a reason my name isn't often on the list of "Top helpers" in this forum. Many others have more experience and broader knowledge. That's why I said it was "a hunch". One of the strengths of forums such as these is that the questions get reviewed by many different people with different past experiences. I've been recently playing with the options to "more" and suspected it wouldn't be happy being run without a TERM variable.
 