Monitor Server 1

Status
Not open for further replies.

Nostradamus

Technical User
May 3, 2000
419
0
0
SE
I ran sar and found that my OpenServer 5.0.5 was at 95-100% in %usr.
Gosh! I thought to myself, and also ran cpqmon (a nifty tool from Compaq that monitors CPU/IDA/disk load and more).
It showed that the CPU was doing way more than it should. It was constantly at 95-100% and had been for several days. I try to check the server's condition daily, but my primary feedback is the users' complaints.

These are my questions:
When I ran ps -ef to look at the process, it listed 09-00:22:59 as the CPU time. What does that mean? That it's been using CPU for 9 days and 22:59 minutes?

Are there any good ways to notify me if such things were to happen again? Some automatic (script?) way to gather system information and alert me. Perhaps a mail to root when the server is overloaded.
We use snmpd on the machine to (simply) listen on link up/down traps. Do I have to configure something on the server to report disk usage, cpu load and such?
We use snmpc to gather information and email/page the tech crew when servers go up or down. I'm not too familiar with SNMP or the program itself, and I haven't found where I can gather hardware information using it. What is needed to do so?

any input is welcome.
thanks in advance. /Sören
 
Nostradamus,

Yes, the TIME of 09-00:22:59 is in
dd-hh:mm:ss format. It means the process
has used 9 days, 0 hours, 22 minutes and
59 seconds of CPU time.
You didn't mention what the process was,
however, so we cannot give you advice on
what might be causing it. In general,
processes shouldn't run up that much CPU
time unless they are thrashing for some
reason. There are of course exceptions,
such as the X-server. It is not uncommon
for your X-server to show times like this.
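
For anyone who wants to compare that TIME value against a limit in a script, here is a minimal sketch of turning it into seconds. The function name cputime_to_seconds is made up for illustration, and the sketch assumes the value arrives as [dd-]hh:mm:ss or mm:ss.

```shell
#!/bin/sh
# cputime_to_seconds: hypothetical helper, not part of ps.
# Converts a TIME value such as 09-00:22:59 into a number of seconds.
cputime_to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $1
        # An optional "dd-" prefix carries the day count.
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
        else if (n == 2) secs = p[1] * 60 + p[2]
        else             secs = p[1]
        print days * 86400 + secs
    }'
}

cputime_to_seconds 09-00:22:59    # prints 778979
```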

There is no pre-built utility to
monitor and alert you based on things like
cpu time. You would have to write a
shell script to look for run-away
processes, and do something appropriate
like fire off a uucico to page someone
or call a modem, etc...
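
A rough sketch of such a runaway check could look like the following. The name find_runaways is hypothetical, the sample lines are invented, and it assumes the local ps accepts the POSIX -o option so that each input line is "PID TIME COMMAND"; if it does not, the columns of ps -ef would have to be picked apart instead.

```shell
#!/bin/sh
# find_runaways: hypothetical helper. Reads "PID TIME COMMAND" lines on stdin
# and prints the lines whose CPU time is at least $1 seconds.
find_runaways() {
    awk -v limit="$1" '{
        days = 0; t = $2
        # TIME is assumed to be [dd-]hh:mm:ss or mm:ss.
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
        else if (n == 2) secs = p[1] * 60 + p[2]
        else             secs = p[1]
        secs += days * 86400
        if (secs >= limit) print
    }'
}

# Invented sample input standing in for: ps -e -o pid= -o time= -o args=
find_runaways 86400 <<'EOF'
  312 00:00:04 /etc/cron
 4711 09-00:22:59 mumps
  998 12:30:00 backup
EOF
```

With a limit of 86400 seconds (one day of CPU time), only the 4711 line is printed. The output could then be piped to mailx whenever it is not empty.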

Enjoy

Caldera Support
 
The process that stole CPU was part of our MUMPS database, so it's no ordinary SCO process. I haven't installed X Windows and probably never will. Anyway, the problem has been corrected.

Rather annoying, though, that it had been stealing CPU time for 9 days without me knowing it. A monitor utility would come in handy.

I've written VERY simple scripts to monitor users and such. What would a CPU-monitor shell script look like?
I want it to mail root whenever the load stays at 90% or more for more than 3 hours, or similar.

any tips on this or snmp monitoring would be helpful. /Sören
 
What does the output of sar look like on SCO? I work primarily on AIX and haven't been on a SCO box in years, but I will be glad to set up a script (or at least get you started).
 
This is what the sar output can look like. The logs usually continue (at least they do for me) at 20-minute intervals after 08:00 (since users usually drop in by then).

# sar

SCO_SV server 3.2v5.0.5 i80386 12/20/2001

00:00:00    %usr    %sys    %wio   %idle (-u)
01:00:01       0       2       3      95
02:00:00       0       2       2      95
03:00:00       0       1       2      97
04:00:00       0       0       0     100

please let me know if you need anything else...
Also, please explain exactly how you do what you do. I really want to learn shell scripting, but I usually fail :) /Sören
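
Since sar already keeps this history, one possible approach is to count how many of the most recent samples are above the limit. The following is only a sketch: busy_run is a made-up name, the sample figures are invented, and it assumes the layout above, with a timestamp in the first column and a whole-number %usr in the second.

```shell
#!/bin/sh
# busy_run: hypothetical helper. Reads sar -u output on stdin and prints how
# many of the most recent consecutive samples have %usr at or above $1.
busy_run() {
    awk -v limit="$1" '
        # Only data rows: hh:mm:ss in column 1, a whole number in column 2.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 ~ /^[0-9]+$/ {
            if ($2 >= limit) run++; else run = 0
        }
        END { print run + 0 }'
}

# Invented sample data in the same layout as the sar output above.
busy_run 90 <<'EOF'
SCO_SV server 3.2v5.0.5 i80386 12/20/2001

00:00:00 %usr %sys %wio %idle (-u)
08:00:00 12 2 3 83
08:20:00 96 2 1 1
08:40:00 98 1 1 0
09:00:00 97 2 1 0
EOF
```

For the sample data this prints 3. With 20-minute samples, a run of 9 would correspond to roughly three hours, and on a live system the input would come from something like sar | busy_run 90.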
 
I believe this will work for you:

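---------------Begin Script---------------

#!/bin/ksh
# Mail root when %usr has been at 90 or more for 3 hours or longer.

# Take one 1-second sample and keep the %usr column of the last data row.
usage=$(sar 1 | awk '$2 ~ /^[0-9]+$/ {u = $2} END {print u}')
curtime=`date +%H`
curtime=${curtime#0}    # drop a leading zero so 08 and 09 are not read as octal

case ${usage} in
9[0-9]|100)
    if [ -s /tmp/usagemon.out ]
    then
        origtime=`cat /tmp/usagemon.out`
        # Add 24 and take the remainder so the count survives midnight.
        ((totaltime=(curtime-origtime+24)%24))
        if [ "${totaltime}" -ge 3 ]
        then
            echo "WARNING: CPU usage has reached ${usage} for `hostname`!" | mailx root@domain.com
        fi
    else
        echo ${curtime} > /tmp/usagemon.out
    fi
    ;;
*)
    # Usage is back under 90, so forget when the busy period started.
    rm -f /tmp/usagemon.out
    ;;
esac

---------------End Script---------------

Here is what's happening:

First I take one sar sample and store the %usr figure in a variable called usage. The awk filter keeps only rows whose second column is a number, so the header and blank lines are skipped.

usage=$(sar 1 | awk '$2 ~ /^[0-9]+$/ {u = $2} END {print u}')

Next I set the current hour in a variable called curtime, which is used to calculate how long CPU usage has been at or above 90%. The leading zero is stripped so that 08 and 09 cannot be mistaken for octal numbers in the arithmetic further down.

curtime=`date +%H`
curtime=${curtime#0}

Then I take the value of usage and use it in a case statement. If it is 90 or more, we check whether a file, /tmp/usagemon.out (the name is purely arbitrary), exists and is not empty.
If not, it is created by echoing the value of the curtime variable into it.

If the file is there, its contents are read into the origtime variable and used for the calculation. Adding 24 and taking the remainder keeps the difference correct when the busy period runs past midnight.

If the totaltime value is equal to or greater than 3 (representing 3 hours), a message is sent to root.

origtime=`cat /tmp/usagemon.out`
((totaltime=(curtime-origtime+24)%24))
if [ "${totaltime}" -ge 3 ]
then
echo "WARNING: CPU usage has reached ${usage} for `hostname`!" | mailx root@domain.com
fi

If usage is below 90, the file is removed so that the next busy period starts counting from scratch.

This should be run as a cron job, as often as you need it to check CPU usage.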
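
As an illustration, a crontab entry along these lines would run the check every 20 minutes. The path /usr/local/bin/usagemon is only an assumed location for the saved script, and the minutes are listed one by one because older cron versions may not accept step syntax.

```
# minute    hour  day-of-month  month  day-of-week  command
0,20,40     *     *             *      *            /usr/local/bin/usagemon
```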

Hope this helps!


 