This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Alerts for SQL Agent Jobs Missing thier run - due to downtime.

All,

I am new to DPA, hence would appreciate if someone could help.

Consider that we have a SQL Server, which has a lot of jobs scheduled.

Lets say we have a cluster failover or some downtime on this machine due to which some Jobs dint kick off during their scheduled time.

How can we create an alert of report for the same? Alert is preferred.

Thanks,

SD

  • You can set up a custom alert such as "Custom SQL Alert - Multiple Numeric Return" using something like this:

    SELECT

        JOBS.NAME AS [Job Name]

        ,DATEDIFF(minute, MAX(msdb.DBO.AGENT_DATETIME(RUN_DATE, RUN_TIME)), getdate() )

    FROM msdb.dbo.SYSJOBS JOBS

    LEFT OUTER JOIN msdb.dbo.SYSJOBHISTORY JH

        ON JOBS.job_id = JH.job_id

            WHERE JH.step_id = 0

    GROUP BY JOBS.name

    This just tells you the job name and how long (minutes) since the job last ran (not whether it was successful). Set the alert threshold to a suitable value to pick up jobs that haven't run in the relevant period.

    This SQL is not fully tested, so please verify the results before you use it in Prod emoticons_happy.png

  • This Query return count JOB Fail , I did create metric custom

    yuo try in dev

    select count(*) as JOBFail

    from (

    SELECT  distinct name AS [Job Name]

           

             ,CASE WHEN enabled=1 THEN 'Enabled' 

                   ELSE 'Disabled' 

              END [Job Status]

             ,CASE WHEN SJH.run_status=0 THEN 'Failed'

                         WHEN SJH.run_status=1 THEN 'Succeeded'

                         WHEN SJH.run_status=2 THEN 'Retry'

                         WHEN SJH.run_status=3 THEN 'Cancelled'

                   ELSE 'Unknown' 

              END [Job Outcome]

    FROM msdb..sysjobhistory SJH 

    JOIN msdb..sysjobs SJ  ON  SJH.job_id=sj.job_id 

    WHERE  step_id=0 

    and SJH.run_status=0

    AND    DATEADD(S, 

      (run_time/10000)*60*60 /* hours */ 

      +((run_time - (run_time/10000) * 10000)/100) * 60 /* mins */ 

      + (run_time - (run_time/100) * 100)  /* secs */, 

      CONVERT(DATETIME,RTRIM(run_date),113)) >= DATEADD(d,-1,GetDate()) )as s

  • That does give you a count of failed jobs, provided it is a single step job, or they actually catch success/failure for each step.

    If the 'On failure action' is defaulted to 'Go to the next step' then you'll never see an error if a second or subsequent step fails.

    I have a vendor (*cough* airwatch *cough*) job that fails in step 9 of 51, but still shows overall as succeeded - because they only check the status of the last step! The standard DPA Administrative Alert for SQL Server Job Failure does actually flag it as failed, which is how I found out.

    If I change to "WHERE  step_id>=0 " in your SQL it does count the failure, but otherwise it does not.

    This also won't help in the original question to determine if a job has NOT run, as that is not a failure.

    I got caught a long time ago with a backup job that didn't fail. It was showing as running, but the scheduler had hung and we didn't get a backup for way too long.

    After that we added checks for how long since the last successful backup job ran as well.