Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

stored proc to check for hanging jobs and send email

Status
Not open for further replies.

mflancour

MIS
Apr 23, 2002
379
US
Any idea's on how to do this?
 
I use a tool called SQL Diagnostics by Idera to do that. It can also send me email to report blocking, High CPU usage to just name a few. It wasn't that expensive either.
If you can't look at 3rd party tools you could enter the data from sp_who2 into a temp table and look for suspended processes. Then send an e-mail if you find any.

- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
Thank you....Can you elaberate a bit on the second option?
 
sure, what verion of SQL server are you using?

You would want something like this in a procedure.

Code:
CREATE table #tmpUsers(
spid int,
STATUS varchar(50),
loginname varchar(50),
hostname varchar(50),
blk varchar(10),
dbname varchar(50),
cmd varchar(30),
CPUTIME varchar(255),
DISKIO varchar(255),
LASTBATCH Varchar(30),
ProgramName Varchar(255),
SPID2 int,
requestid int)

INSERT INTO #tmpUsers EXEC SP_WHO2
SELECT * FROM #tmpUsers
DROP TABLE #tmpUsers
Then you can query the temp table for what every you wanted.
High CPU Time, blocking, SPIDS in a suspended status.
You can also use this function to get the SQL statement of the problem SPID.
Code:
declare @handle binary(20)
select @handle = sql_handle 
from master.dbo.sysprocesses where spid = [b]YOUR SPID HERE[/b]
select * from ::fn_get_sql(@handle)

If this is for SQL 2005 you can email your results with
Code:
EXEC msdb.dbo.sp_send_dbmail

- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
Cool...

To make sure I understand, at what point will the Status be 'suspended'? I mean, how would it know if a job is hanging? Just by run duration or some error? If the sp for a job is stuck in some sort of infinite loop, how would this be reflected?

I ran across the following which, from what I can understand, I might be able to use based on average job run time. (ie. if latest record for a job is currently running and the run time is greater than 200% of average run time, send an e-mail)...although the table currently only holds one day of job history, so might be a moot point. (any way to have it retain a set number of days worth of history?)(is this even on track with checking for hanging jobs?) Sorry, not real clear on a lot of things.

SELECT Job_ID = j.Job_ID,
Job_Status = h.Run_Status,
Run_Date = h.Run_Date,
Run_Time = h.Run_Time,
Job_Name = convert(char(80),j.Name),
Step_Name = h.Step_Name
FROM MSDB.dbo.SysJobHistory AS h
INNER JOIN MSDB.dbo.SysJobs AS j ON (h.Job_ID = j.Job_ID)
INNER JOIN MSDB.dbo.SysCategories AS c ON (j.Category_ID = c.Category_ID)
--WHERE h.Run_Status = 4
 
A job could be suspended for any number of reasons. You can find out the reason for the job being suspended from the last wait type field in the sysprocesses table. You would need the SPID for the JOB. I like to use the CPU TIME though. I also use run time is greater than 100%.

- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
Lost you a bit on that last post...

How would sql server know to list a job as suspended based on time if there's no real error?

How do I get a job name from the spid or some other field in these tables?

what is wait type? The definition is simply "reserved"

When you say you use 100%... what are you comparing?

I apologize for the simplicity of these questions, but something isn't quit clicking.
 
How would sql server know to list a job as suspended based on time if there's no real error?
Any active process can become suspended for any number of reasons. Most common is it is being blocked.

How do I get a job name from the spid or some other field in these tables?
This part is not easy because the ProgramName will show up as SQLAgent. I've always had to use profiler for that step.

what is wait type? The definition is simply "reserved"
Wait type is a field in sysprocesses that tells you the reason you spid has been suspended.

When you say you use 100%... what are you comparing?
I always compare to the last run time.

I hope this info helps. As you can see it can be very difficult to find hung jobs or processes. That's why I bought SQL Diagnostics.


- Paul
- If at first you don't succeed, find out if the loser gets anything.
 
I knew someone (I haven't done it myself) who created a job that monitored all jobs. It created a table to keep all jobs in and had an Average Time column for each job. If the job went over or under that Average time (based on the total time of each job first 3 runs added together and divided by 3), then it emailed the DBA team with an alert. The job runs every 15 minutes or so on the Production box and apparently works well.

He pulled columns from the sysjobs, sysjobhistory and sysjobschedules to get it to work.



Catadmin - MCDBA, MCSA
"No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?"
-- Xena, "Been There, Done That"
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top