Description
This alert will look for jobs that have failed since the last time this alert executed. The #FREQUENCY# variable provides the number of minutes between each run of the alert (specified in the Execution Interval of the alert definition). If the alert executes once every 10 minutes and is now running at 10:00 am, the alert will look for jobs that have failed since 9:50 (10 minutes ago). The email message will contain the Job Name, Step ID, Step Name and Error Message separated by colons. The 1 at the end of the alert is used to trigger the threshold in Ignite, note the 1 as the min value for the high threshold.
Alert Definition
To create the alert, click on Alerts > Manage Alerts tab and create a Custom Alert of type Custom SQL Alert - Multiple Numeric Return. Configure the alert with values similar to these:
Alert Name: Job Failures
Execution Interval: 10 minutes
Notification Text: The following SQL Server jobs have failed.
SQL Statement:
select LEFT(t2.name + ': ' + convert(varchar,t1.step_id) + ': ' + t1.step_name + ': ' + t1.message, 425) AS "JOB NAME: STEP ID: STEP NAME: MESSAGE", 1
FROM msdb..sysjobhistory t1 JOIN msdb..sysjobs t2 ON t1.job_id = t2.job_id
WHERE t1.run_status = 0
AND t1.step_id != 0
AND DATEADD(SECOND,
(run_duration/10000*3600 + (run_duration/100)%100*60 + run_duration%100) + (run_time/10000*3600 + (run_time/100)%100*60 + run_time%100),
CAST(STR(run_date, 8, 0) AS DATETIME))
>= dateadd (n,-#FREQUENCY#,current_timestamp)
Execute Against: Monitored Instance
Units: Empty
High Threshold: Min 1, Max empty
NOTE: The ALERT_NOTIFICATION_TRIGGER should be updated in advanced options to STATUS_NOT_NORMAL. For instructions on how to accomplish this, please refer to this knowledge base article. Alert Notifications - Confio Article 1539