You can set up a custom alert such as "Custom SQL Alert - Multiple Numeric Return" using something like this:
JOBS.NAME AS [Job Name]
,DATEDIFF(minute, MAX(msdb.DBO.AGENT_DATETIME(RUN_DATE, RUN_TIME)), getdate() )
FROM msdb.dbo.SYSJOBS JOBS
LEFT OUTER JOIN msdb.dbo.SYSJOBHISTORY JH
ON JOBS.job_id = JH.job_id
WHERE JH.step_id = 0
GROUP BY JOBS.name
This just tells you the job name and how long (minutes) since the job last ran (not whether it was successful). Set the alert threshold to a suitable value to pick up jobs that haven't run in the relevant period.
This SQL is not fully tested, so please verify the results before you use it in Prod
This Query return count JOB Fail , I did create metric custom
yuo try in dev
select count(*) as JOBFail
SELECT distinct name AS [Job Name]
,CASE WHEN enabled=1 THEN 'Enabled'
END [Job Status]
,CASE WHEN SJH.run_status=0 THEN 'Failed'
WHEN SJH.run_status=1 THEN 'Succeeded'
WHEN SJH.run_status=2 THEN 'Retry'
WHEN SJH.run_status=3 THEN 'Cancelled'
END [Job Outcome]
FROM msdb..sysjobhistory SJH
JOIN msdb..sysjobs SJ ON SJH.job_id=sj.job_id
(run_time/10000)*60*60 /* hours */
+((run_time - (run_time/10000) * 10000)/100) * 60 /* mins */
+ (run_time - (run_time/100) * 100) /* secs */,
CONVERT(DATETIME,RTRIM(run_date),113)) >= DATEADD(d,-1,GetDate()) )as s
That does give you a count of failed jobs, provided it is a single step job, or they actually catch success/failure for each step.
If the 'On failure action' is defaulted to 'Go to the next step' then you'll never see an error if a second or subsequent step fails.
I have a vendor (*cough* airwatch *cough*) job that fails in step 9 of 51, but still shows overall as succeeded - because they only check the status of the last step! The standard DPA Administrative Alert for SQL Server Job Failure does actually flag it as failed, which is how I found out.
If I change to "WHERE step_id>=0 " in your SQL it does count the failure, but otherwise it does not.
This also won't help in the original question to determine if a job has NOT run, as that is not a failure.
I got caught a long time ago with a backup job that didn't fail. It was showing as running, but the scheduler had hung and we didn't get a backup for way too long.
After that we added checks for how long since the last successful backup job ran as well.