This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.

You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Alerts for SQL Agent Jobs Missing thier run - due to downtime.

dessalil over 7 years ago

All,

I am new to DPA, hence would appreciate if someone could help.

Consider that we have a SQL Server, which has a lot of jobs scheduled.

Lets say we have a cluster failover or some downtime on this machine due to which some Jobs dint kick off during their scheduled time.

How can we create an alert of report for the same? Alert is preferred.

Thanks,

0 gcp over 7 years ago

You can set up a custom alert such as "Custom SQL Alert - Multiple Numeric Return" using something like this:
SELECT
    JOBS.NAME AS [Job Name]
    ,DATEDIFF(minute, MAX(msdb.DBO.AGENT_DATETIME(RUN_DATE, RUN_TIME)), getdate() )
FROM msdb.dbo.SYSJOBS JOBS
LEFT OUTER JOIN msdb.dbo.SYSJOBHISTORY JH
    ON JOBS.job_id = JH.job_id
        WHERE JH.step_id = 0
GROUP BY JOBS.name
This just tells you the job name and how long (minutes) since the job last ran (not whether it was successful). Set the alert threshold to a suitable value to pick up jobs that haven't run in the relevant period.
This SQL is not fully tested, so please verify the results before you use it in Prod
Cancel
Vote Up 0 Vote Down

Cancel
0 fquintero over 7 years ago in reply to gcp

This Query return count JOB Fail , I did create metric custom
yuo try in dev
select count(*) as JOBFail
from (
SELECT distinct name AS [Job Name]

         ,CASE WHEN enabled=1 THEN 'Enabled'
               ELSE 'Disabled'
          END [Job Status]
         ,CASE WHEN SJH.run_status=0 THEN 'Failed'
                     WHEN SJH.run_status=1 THEN 'Succeeded'
                     WHEN SJH.run_status=2 THEN 'Retry'
                     WHEN SJH.run_status=3 THEN 'Cancelled'
               ELSE 'Unknown'
          END [Job Outcome]
FROM msdb..sysjobhistory SJH
JOIN msdb..sysjobs SJ ON SJH.job_id=sj.job_id
WHERE step_id=0
and SJH.run_status=0
AND    DATEADD(S,
(run_time/10000)*60*60 /* hours */
+((run_time - (run_time/10000) * 10000)/100) * 60 /* mins */
+ (run_time - (run_time/100) * 100) /* secs */,
CONVERT(DATETIME,RTRIM(run_date),113)) >= DATEADD(d,-1,GetDate()) )as s
Cancel
Vote Up 0 Vote Down

Cancel
0 gcp over 7 years ago in reply to fquintero

That does give you a count of failed jobs, provided it is a single step job, or they actually catch success/failure for each step.
If the 'On failure action' is defaulted to 'Go to the next step' then you'll never see an error if a second or subsequent step fails.
I have a vendor (*cough* airwatch *cough*) job that fails in step 9 of 51, but still shows overall as succeeded - because they only check the status of the last step! The standard DPA Administrative Alert for SQL Server Job Failure does actually flag it as failed, which is how I found out.
If I change to "WHERE step_id>=0 " in your SQL it does count the failure, but otherwise it does not.
This also won't help in the original question to determine if a job has NOT run, as that is not a failure.
I got caught a long time ago with a backup job that didn't fail. It was showing as running, but the scheduler had hung and we didn't get a backup for way too long.
After that we added checks for how long since the last successful backup job ran as well.
Cancel
Vote Up 0 Vote Down

Cancel