Introduction
DPA's base feature set includes alerts for SQL Server, but none are enabled by default. Almost all SQL Server environments that are using DPA could benefit from a base set of alerts this article includes. The attached zip file contains 25 alerts that can be imported into your environment. Some of them are standard DPA alerts and some are custom alerts that may be important to you (and will get added to DPA eventually).
Importing the Alerts
To import these alerts, download the attachment and copy to your DPA server. Also, rename the file give it a ".zip" file extension instead of ".sql". Within DPA, click on Options > Import Custom Entities and select the file you downloaded. The alerts as well as an alert rule for All SQL Servers will be imported. On the "Select Entities" page you can uncheck specific alerts that you do not need. On the "Review" page you will see the new alerts that will be imported, but also check the "Existing Entities" tab for any conflicts. You can choose to overwrite those conflicts if needed.
Imported Alerts
Many alerts that get imported will have a SQL Server rule attached to them. This means the alert will execute for all SQL Servers being monitored by DPA, but this can be adjusted as needed. None of the alerts have contacts tied to them, so they will not alert you until you finish the steps in the next section.
Next Steps
All alerts will need to be edited to provide the following information
- Set the name of the alert and execution interval as needed
- Optionally provide something for the Notification Text field
- Choose the instances the alert should execute against. Some of the alerts include a rule to execute against all SQL Server instances.
- Provide contacts for each threshold level
- Check the threshold definitions and adjust as needed for your environment
- Adjust other settings as needed
Included Alerts
- AG Failover
- only execute this on SQL Servers in an Availability Group, no instances are selected by default
- will evaluate to High (only threshold that is meaningful) when an AG has failed over
- AG Status
- only execute this on SQL Servers in an Availability Group, no instances are selected by default
- if the status of an AG changes, this alert will change to High (Not Healthy) or Medium (Partially Healthy)
- Backup Frequency Full, Differential, TLOG
- only applicable if using SQL Server backups
- determines the amount of time since a full, differential or transaction log backup has occurred
- default alert threshold are 7 days for full backups, 1 day for differentials and 1 hour for transaction logs
- adjust the thresholds as needed, e.g. if you perform tlog backups every 15 min, set the time units value to 15 or 30 min
- master, tempdb, model and msdb are excluded by default
- Blocking Sessions
- a custom alert that watches for blocking sessions lasting more than 30 seconds by default
- adjust and add thresholds as needed for this alert
- Database Corruption
- a custom alert that watches the msdb.suspect_pages table for event types of 1, 2, 3
- the default threshold for High is 1 page, i.e. there is at least 1 page corrupted in a database
- Database Job Failure
- SQL Server and Azure SQL MI alert that reports jobs that have failed
- Database Offline
- a custom alert that watches for databases that go offline
- if you have databases that are routinely offline, you can edit the SQL Statement and exclude them
- Error Log - Severity 17 through 25
- searches the SQL Server error log for error message with severity 17 through 25
- if the SQL Server error log is larger than 200MB, this alert will fail, so adjust as needed
- Instance Availability
- determines if a monitored database instance is available by attempting to connect to the instance from DPA
- by default, this alert will retry 3 times before triggering the alert
- the behavior is governed by 2 DPA parameters named ALERT_CONNECTION_ATTEMPTS (3) and ALERT_CONNECTION_INTERVAL (90)
- this means that when an instance goes down, it must be down for 3 * 90 = 270 seconds before you will get alerted
- it is suggested to change these values to 2 for connection attempts and 5 seconds for connection interval
- the alert will then let you know within 2 * 5 = 10 seconds
- Long Running Query
- a custom alert that looks for queries that have been executing for more than 300 seconds (default thresholds)
- Plan Change and Query Runs Longer
- a custom alert that watches for plans that have changed for a query, and the query performs worse, i.e. runs longer
- by default, if average query execution goes up by 50% (Medium) or over 100% (High) this alert will trigger
- not assigned to any instances by default
- review first 3 lines of the custom alert script for options
- SQL Agent Offline
- a custom alert that uses sys.dm_server_services to detect the status of the SQL Server Agent service
- SQL Server Deadlocks
- detects the number of deadlocks that have happened within the last 10 minutes
- alert levels are 1-3(Low), 3-6(Medium) and 6+(High), so adjust these as needed
- also note that DPA has pages that also track the number and details of deadlocks, but it's not on by default
- VMware vMotion or Power Off Events
- if you are using VMware under your database servers, this alert will let you know when a vMotion or Power Off event occurs
- Wait Time Anomaly Detection
- detects abnormally high wait times in the most recent hour
- uses a machine learning algorithm to calculate normal wait times as well as Medium and High levels
- works well against instances that have standard loads, does not work well on instances with sporadic loads like dev/test, reporting servers, data warehouse, etc
Let me know in the comments if you feel there are other standard alerts that should be included for SQL Server.