Rogue Process Reporting

Status
Not open for further replies.

gringomike

Technical User
Aug 6, 2003
148
GB
Hi all,

I'm looking for a script that will report on rogue processes.
I have 10 jobs (job1, job2, job3, etc.) that log in and log out together at various times of the day. The problem is that they don't always log out as expected, and this causes problems for later processing.
These jobs are started by running a script (start_jobs.ksh) and they write a lock file to a directory (/processing/jobs/lock) to prevent multiple instances of the same job from running at the same time.

I need to be able to run a script that will report which of these jobs are still running when they shouldn't be and which jobs have left a lock file behind when they shouldn't have.

I have the following written so far, but I still think I'm some way from the final solution.

LOGGED_IN=`ptree $JOB |grep start_jobs.ksh`
JOB_INFO=`ps -ef |grep JOB |awk '{print $2, $11}'`
#JOB_NAME=`ps -ef |grep JOB |awk '{print $11}'`
#JOB_ID=`ps -ef |grep JOB | awk '{print $2}'`

for JOB in `ls -1 /processing/jobs/lock`
do
echo "JOB_INFO"

if [ $LOGGED_IN ]
then
echo "$JOB is logged in as expected"
else
echo "$JOB is a rogue process and should be terminated. Please remove the lock file also"

fi

done


If any of this doesn't make sense, I'll try to explain things some more!

Thanks for your help in advance!

GM

 
Try this :
[tt]
for JOB in `ls -1 /processing/jobs/lock`
do
echo "$JOB_INFO"
if ptree $JOB |grep -c start_jobs.ksh
then
echo "$JOB is logged in as expected"
else
echo "$JOB is a rogue process and should be terminated. Please remove the lock file also"
fi
done
[/tt]

Jean Pierre.
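A note on the `if` in the loop above: the shell tests the exit status of the whole pipeline, which is the exit status of its last command, `grep`. grep exits 0 when at least one line matches, so `-c` is not needed for the test; its only visible effect is that the match count is printed into the report. A minimal sketch of the same test with grep's output discarded; the function name `started_by_start_jobs` and the sample ptree-style lines are made up for illustration:

```shell
# Hypothetical helper: succeeds (exit 0) if any line on stdin mentions
# start_jobs.ksh and fails (exit 1) otherwise. grep's output goes to
# /dev/null, so nothing extra is printed into the report.
started_by_start_jobs() {
    grep 'start_jobs\.ksh' >/dev/null
}

# Invented ptree-style lines for JOB1 (PIDs and paths are made up).
sample='29481 /bin/ksh -x /opt/scripts/start_jobs.ksh
29502 /usr/bin/expect -f /opt/jobs/job_logon.exp JOB1'

if echo "$sample" | started_by_start_jobs
then
    echo "JOB1 is logged in as expected"
else
    echo "JOB1 is a rogue process and should be terminated"
fi
```

On older systems where grep lacks `-q`, redirecting to /dev/null like this gives the same silent true/false result.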
 
Great - Thanks Jean Pierre!

I wasn't so far off after all!

GM
 
Bad news!

I'm afraid I was too hasty when I said this solved the problem :-( It works fine when the jobs are logged out (and rogue sessions are running), but when they are logged in it reports the following:

29085 SLAVEI
29086 SLAVEX
29078 SLAVEM
29081 SLAVEQ
29083 SLAVE2
29080 SLAVEP
29076 SLAVEK
29084 SLAVEB
29077 SLAVEL
29079 SLAVEN
29082 SLAVER
ptree: cannot find JOB0 passwd entry
JOB0 is a rogue process. This process should be terminated and the lock file should be removed.
29085 JOBI
29086 JOBX
29078 JOBM
29081 JOBQ
29083 JOB2
29080 JOBP
29076 JOBK
29084 JOBB
29077 JOBL
29079 JOBN
29082 JOBR
ptree: cannot find JOB1 passwd entry
JOB1 is a rogue process. This process should be terminated and the lock file should be removed.
29085 JOBI
29086 JOBX
29078 JOBM
29081 JOBQ
29083 JOB2
29080 JOBP
29076 JOBK
29084 JOBB
29077 JOBL
29079 JOBN
29082 JOBR
ptree: cannot find JOB2 passwd entry
JOB2 is a rogue process. This process should be terminated and the lock file should be removed.
29085 JOBI
29086 JOBX
29078 JOBM
29081 JOBQ
........... etc. In effect, it reports that every "JOB" is a rogue session!

Any other ideas?

Thanks!

GM
 
Sorry, in the first block of output, these lines:

29085 SLAVEI
29086 SLAVEX
29078 SLAVEM
29081 SLAVEQ
29083 SLAVE2
29080 SLAVEP
29076 SLAVEK
29084 SLAVEB
29077 SLAVEL
29079 SLAVEN
29082 SLAVER

should read:

29085 JOBI
29086 JOBX
29078 JOBM
29081 JOBQ
29083 JOB2
29080 JOBP
29076 JOBK
29084 JOBB
29077 JOBL
29079 JOBN
29082 JOBR


Thanks!
 
Try this modification
[tt]
if ptree $JOB 2>&1 |grep -c start_jobs.ksh
[/tt]

Jean Pierre.
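Why the `2>&1` helps: the redirection is applied to ptree before the pipe is connected, so ptree's error messages (such as `cannot find JOB0 passwd entry`) go down the pipe into grep instead of straight to the terminal. A small demonstration with a stand-in function; `fake_ptree` is hypothetical and only imitates that error message:

```shell
# Stand-in for a command that writes an error message to stderr,
# imitating ptree's "cannot find ... passwd entry" complaint.
fake_ptree() {
    echo "ptree: cannot find $1 passwd entry" >&2
    return 1
}

# Without 2>&1 the message bypasses grep. Here it is discarded with
# 2>/dev/null to keep the demo quiet; normally it would reach the
# terminal. grep sees no input and counts 0 matching lines.
without=$(fake_ptree JOB0 2>/dev/null | grep -c 'passwd entry')

# With 2>&1 the message is sent into the pipe, so grep sees it.
with=$(fake_ptree JOB0 2>&1 | grep -c 'passwd entry')

echo "without 2>&1: $without"   # prints "without 2>&1: 0"
echo "with 2>&1: $with"         # prints "with 2>&1: 1"
```

The order matters: `2>&1 | grep` merges stderr into the pipe, whereas `| grep ... 2>&1` would only redirect grep's own stderr.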
 
Thanks for the reply!

This only seems to check if the lock file is present and if so, reports that the job is logged in as expected. I need the script to check for the process and the lock file. If either is not present I would like the script to report it as a problem.

It's looking better though! :)

Any other ideas?

Thanks

GM
 
I never use ptree (it's not available on my system).
Please give me the output and exit status of a ptree command:
- for an existing job
- for an unknown job

ptree job
echo $?

Jean Pierre.
 
Another approach:
[tt]
tmp=/tmp/job.$$
# '[s]tart_jobs.ksh' keeps the grep command itself out of the list
ps -ef | grep '[s]tart_jobs.ksh' | awk '{print $11}' | sort >$tmp.ps
ls -1 /processing/jobs/lock | sort >$tmp.ls
echo "Jobs running without lock file:"
comm -23 $tmp.ps $tmp.ls
echo "Lock files without running job:"
comm -13 $tmp.ps $tmp.ls
echo "Jobs logged in as expected:"
comm -12 $tmp.ps $tmp.ls
rm -f $tmp.*
[/tt]

Hope this helps.
PH.
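How the `comm` calls split the result: `comm` compares two sorted files and prints three columns, namely lines only in the first file, lines only in the second, and lines in both. `-23` suppresses columns 2 and 3 (leaving running jobs with no lock file), `-13` leaves lock files with no running job, and `-12` leaves the matches. Both inputs must be sorted with the same collation, which is why the ps list goes through `sort`. A self-contained demonstration with made-up job names:

```shell
# Made-up data: jobs found running, and lock files found on disk.
tmp=/tmp/commdemo.$$
printf 'JOB1\nJOB2\nJOB3\n' | sort > "$tmp.ps"
printf 'JOB2\nJOB3\nJOB4\n' | sort > "$tmp.ls"

only_ps=$(comm -23 "$tmp.ps" "$tmp.ls")   # running, no lock file
only_ls=$(comm -13 "$tmp.ps" "$tmp.ls")   # lock file, nothing running
both=$(comm -12 "$tmp.ps" "$tmp.ls")      # running with a lock file

echo "Jobs running without lock file: $only_ps"    # JOB1
echo "Lock files without running job: $only_ls"    # JOB4
echo "Jobs logged in as expected: $(echo $both)"   # JOB2 JOB3

rm -f "$tmp.ps" "$tmp.ls"
```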
 
Hi PH,

Good approach, but I am not sure that the process running 'start_jobs.ksh' is the process that creates the lock file.
If it is a child process, your script won't work, but the approach is worth keeping.
The $tmp.ps file must be built from the 'ptree' results.

Jean Pierre.
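One way to cover the case raised here (the lock-writing process being a grandchild rather than a direct child of start_jobs.ksh) without ptree is to walk the parent chain from ps output. A sketch under assumptions: the input is `pid ppid args` lines as printed by `ps -e -o pid= -o ppid= -o args=`, the job processes are the `job_logon.exp` commands with the job name as their last argument (as in the ptree output gringomike posts below), and `check_ancestry` is a hypothetical name:

```shell
# Hypothetical helper: reads "pid ppid args..." lines on stdin and, for
# every job_logon.exp process, reports whether start_jobs.ksh is found
# anywhere among its ancestors (parent, grandparent, ...).
# Live usage: ps -e -o pid= -o ppid= -o args= | check_ancestry
check_ancestry() {
    awk '
    {
        parent[$1] = $2
        if ($0 ~ /start_jobs\.ksh/) starter[$1] = 1
        # Assumption: the job name is the last argument of the expect command.
        if ($0 ~ /job_logon\.exp/) job[$1] = $NF
    }
    END {
        for (pid in job) {
            p = parent[pid]; found = 0; steps = 0
            # Climb until PID 1 (init) or an unknown PID; the step limit
            # guards against a malformed parent loop in the input.
            while (p != "" && p + 0 > 1 && steps++ < 100) {
                if (p in starter) { found = 1; break }
                p = parent[p]
            }
            print job[pid], (found ? "ok" : "rogue")
        }
    }' | sort
}
```

A rogue job, whose start_jobs.ksh has gone away, is normally re-parented to PID 1, so the walk stops immediately and reports it as `rogue`. ps may truncate long command lines, in which case the job name in the last field could be cut off.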
 
Salut Jean-Pierre.
Like you, I don't have access to the ptree command.
Still waiting for more info from gringomike...
 
Hi,

Thanks for the replies!

The ptree command displays the following output if it is passed the PID of an active job:

# /usr/bin/ptree <PID_of_JOB1>

29480 <parent_path>/p_ctmag -e /<parent_path>/ctm -i INETD
29481 /bin/ksh -x <path_to_script>/start_jobs.ksh
29502 <path_to_expect>/expect -f <path_to_job>/job_logon.exp JOB1
29514 ksh
29535 /bin/ksh /<child_path>/run_script <region>


If it is passed the PID of a rogue job, it displays the following output:

# /usr/bin/ptree <PID_of_rogue_JOB1>

29502 <path_to_expect>/expect -f <path_to_job>/job_logon.exp JOB1
29514 ksh


I'm using "ptree" to make sure each running "JOB" has the "parent" process "start_jobs.ksh" running. If not, it is a rogue and I would like the script to report it as such. It is each individual "JOB" that writes the lock file to the "/processing/jobs/lock" directory.

I hope this is a clear explanation!

Thanks!

GM
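The `cannot find JOB0 passwd entry` messages earlier in the thread fit this picture: ptree is given a PID here, but the earlier loops handed it the lock-file names, and ptree apparently looked those up as user names. A hedged sketch of turning a job name into the PID of its `job_logon.exp` process, assuming `ps -e -o pid= -o args=` output and that the job name is the last argument of that command; `job_pid` is a hypothetical helper and the sample lines are invented:

```shell
# Hypothetical helper: $1 is a job name such as JOB1. Reads
# "pid args..." lines on stdin and prints the PID of each expect
# process running job_logon.exp whose last argument is exactly $1.
job_pid() {
    awk -v job="$1" '$0 ~ /job_logon\.exp/ && $NF == job { print $1 }'
}

# Invented sample lines; on a live system they would come from
#   ps -e -o pid= -o args=
sample='29481 /bin/ksh -x /opt/scripts/start_jobs.ksh
29502 /usr/bin/expect -f /opt/jobs/job_logon.exp JOB1
29602 /usr/bin/expect -f /opt/jobs/job_logon.exp JOB12'

echo "$sample" | job_pid JOB1    # prints 29502 (JOB12 is not matched)
```

The PID found this way could then be passed to ptree, e.g. `ptree "$pid" 2>&1 | grep start_jobs.ksh >/dev/null`; that combination is untested against Solaris ptree.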
 
Try this untested script (based on the ps command; it assumes 'job_logon.exp' is always a direct child of 'start_jobs.ksh'):
[tt]
tmp=/tmp/job.$$
# Collect both kinds of process first and match them in END, so the
# result does not depend on the order in which ps lists them.
ps -ef | awk '
/\/start_jobs\.ksh/ { start_pids[$2] = "" }
/\/job_logon\.exp/  { job[$2] = $NF; parent[$2] = $3 }
END { for (pid in job) if (parent[pid] in start_pids) print job[pid] }
' | sort > $tmp.ps

ls -1 /processing/jobs/lock | sort >$tmp.ls
echo "Jobs running without lock file:"
comm -23 $tmp.ps $tmp.ls
echo "Lock files without running job:"
comm -13 $tmp.ps $tmp.ls
echo "Jobs logged as expected:"
comm -12 $tmp.ps $tmp.ls

rm -f $tmp.*
[/tt]

In the ps -ef output:
$2 = pid
$3 = parent pid

Jean Pierre.
 
Thanks a lot!

The examples from you both work just as I would have hoped!

The only modification I might need would be to allow the script to handle lock files in multiple locations. For example, not all lock files are in /processing/jobs/lock; they might also be in /processing/jobs_new/lock, /processing/jobs_running/lock, /processing/jobs_complete/lock, etc.

Could I use a "for" loop to read a list of lock file locations and report which jobs are logged in to which location?

Perhaps a modification of PHV's script, something like this:

LOCATION=`cat /processing/location_list.txt`

for JOB_LOC in $LOCATION
do
tmp=/tmp/job.$$
ps -ef | grep start_jobs.ksh | awk '{print $11}' | sort >$tmp.ps
ls -1 /processing/$JOB_LOC/lock >$tmp.ls
echo "Jobs running without lock file:"
comm -23 $tmp.ps $tmp.ls
echo "Lock files without running job:"
comm -13 $tmp.ps $tmp.ls
echo "Jobs logged as expected:"
comm -12 $tmp.ps $tmp.ls
done


Is there a better way to do this, e.g. reading the list of lock file locations from memory rather than from a text file?

Thanks for all your help!

GM
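One way to drop the text file is to keep the locations in a shell variable and loop over that. Two further changes seem worthwhile in a multi-directory version: run ps only once, before the loop, and decide "running without a lock file" against all directories together; otherwise every job whose lock sits in one directory would be reported as missing its lock in all the others. A sketch under those assumptions; `LOCK_DIRS` and `report_locations` are hypothetical names, and the directory list is the one given in the post:

```shell
# Hypothetical list of lock directories kept in a variable rather than
# a text file (paths taken from the post; whitespace-separated, so the
# paths themselves must not contain spaces).
LOCK_DIRS="/processing/jobs/lock /processing/jobs_new/lock
/processing/jobs_running/lock /processing/jobs_complete/lock"

# report_locations RUNNING_FILE DIR...
# RUNNING_FILE holds the names of running jobs, one per line, sorted.
# For each DIR, print which running jobs have their lock file there and
# which lock files there have no running job; finally, print running
# jobs that have no lock file in any of the directories.
report_locations() {
    running=$1; shift
    : > "$running.all"
    for dir in "$@"; do
        ls -1 "$dir" 2>/dev/null | sort > "$running.one"
        cat "$running.one" >> "$running.all"
        echo "=== $dir ==="
        echo "Jobs logged in as expected:"
        comm -12 "$running" "$running.one"
        echo "Lock files without running job:"
        comm -13 "$running" "$running.one"
    done
    sort -u "$running.all" > "$running.sorted"
    echo "=== all locations ==="
    echo "Jobs running without a lock file anywhere:"
    comm -23 "$running" "$running.sorted"
    rm -f "$running.one" "$running.all" "$running.sorted"
}

# Usage, assuming $tmp.ps was built once (outside any loop) by one of
# the ps pipelines above:
#   report_locations "$tmp.ps" $LOCK_DIRS
```

`$LOCK_DIRS` is deliberately left unquoted in the call so the shell splits it into one argument per directory.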
 