When you take time off for a vacation, or even plan to be out of the office for a day, you want a way to avoid those dreaded phone calls and messages about server downtime and application issues.
False Alerts: Reasons Why You Get Them and How to Avoid Them
There are many reasons why your system may trigger alerts more frequently than normal. According to this recent post, many admins get “spam” alerts. Here are a few common causes:
- Metrics that fluctuate frequently, such as CPU or memory utilization, can trigger alerts more often than most other system components.
- You can get “spam” alerts from servers that are not in production or switches that have been decommissioned.
- If your polling cycles aren’t tuned to the right level of granularity, you might get a flood of alerts that will fill your inbox.
- Not properly tuning threshold levels can lead to a sudden spike in alerts.
These are all valid reasons why you might receive a false or "spam" alert. But what if a false alert is triggered while you're out of the office? You get the alert, you start making calls, and you ask colleagues for status updates every few minutes to be sure the issue is resolved. When you come across such alerts, you tend to ask yourself a few questions: Why do I get hundreds of alerts a day when things are running smoothly? Why am I getting an alert in the middle of the night? How do I optimize server functionality so I'm not bothered constantly?
Here are some ways to avoid these issues:
- Set up alerts only for components that will really impact your users or your business.
- Establish well-defined threshold settings. This way you can control the kind of alerts you receive during the day and ensure that you’re not bothered after work hours.
- Set the right dependencies between monitored nodes, so that a single upstream failure doesn’t raise an alert for every device behind it. This significantly lowers the number of alerts you receive.
- Define teams to look at specific alerts. This way you can forward issues to the right teams based on the severity of the alert.
- Understand baseline trends to set more realistic thresholds.
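To make the dependency and team-routing ideas above more concrete, here is a minimal Python sketch. It does not show how any particular monitoring product works. The node names, the `DEPENDS_ON` map, the `ROUTES` table, and both functions are hypothetical and only illustrate the logic: drop alerts for nodes whose upstream parent is already alerting, then forward what remains based on severity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    node: str
    severity: str  # "warning" or "critical"


# Hypothetical topology: each node maps to the parent it depends on.
# Assumed to contain no cycles.
DEPENDS_ON = {
    "web-01": "switch-a",
    "web-02": "switch-a",
    "switch-a": "core-router",
}

# Hypothetical routing table: severity -> team that should receive the alert.
ROUTES = {"critical": "on-call", "warning": "ops-queue"}


def suppress_dependents(alerts):
    """Keep only alerts whose upstream ancestors are not alerting themselves."""
    alerting_nodes = {alert.node for alert in alerts}
    kept = []
    for alert in alerts:
        parent = DEPENDS_ON.get(alert.node)
        suppressed = False
        # Walk up the dependency chain looking for an alerting ancestor.
        while parent is not None:
            if parent in alerting_nodes:
                suppressed = True
                break
            parent = DEPENDS_ON.get(parent)
        if not suppressed:
            kept.append(alert)
    return kept


def route(alerts):
    """Pair each alert with the team responsible for its severity."""
    return [(ROUTES.get(alert.severity, "ops-queue"), alert) for alert in alerts]
```

With this topology, if `switch-a`, `web-01`, and `web-02` all alert at once, only the `switch-a` alert survives `suppress_dependents`, so one notification reaches the on-call team instead of three.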
Determine What to Monitor and Why
Most admins have to monitor hundreds of servers and applications. This means you’re probably dealing with plenty of alerts. Under these circumstances you’ll have to determine a few things.
- Go over each metric and decide whether you really need to monitor it. If you have no defined response to an alert on that metric, you probably don’t.
- Talk to your business groups to understand what the business impact of an issue will be. This will give you a sense of how the metrics you monitor relate to the overall business.
- In the process, you’ll learn what they really care about and which applications they consider critical enough to monitor.
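One rough way to run that review is to keep your alert definitions in a simple list and flag every metric that lacks a defined response or a known business impact. The sketch below is only an illustration. The `audit_alerts` function and the `metric`, `response`, and `business_impact` field names are made up for this example and are not part of any product.

```python
def audit_alerts(alert_definitions):
    """Split metrics into those worth alerting on and those needing review.

    Each definition is a dict with the hypothetical keys "metric",
    "response", and "business_impact". A metric is kept only when both
    a response and a business impact have been written down.
    """
    keep, review = [], []
    for definition in alert_definitions:
        has_response = bool(definition.get("response"))
        has_impact = bool(definition.get("business_impact"))
        target = keep if has_response and has_impact else review
        target.append(definition["metric"])
    return keep, review
```

Anything that lands in the review list is a candidate for removal, or at least for a conversation with the team that owns the application.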
Statistical Thresholds: A Better Way to Set Baseline Values
SolarWinds Server & Application Monitor (SAM) takes threshold-based alerting to a new level. One of the new features in version 6.0 is alerting based on statistical thresholds. Normally, you would have to monitor applications for several weeks to learn the right baseline for setting warning and critical thresholds. With the new Server & Application Monitor, threshold values can be calculated and assigned automatically. By “automatically,” I mean that SAM collects the data from the last 7 days (you have the option to change this setting) and determines the baseline values. You can then select your work hours, nights, and weekends. Based on the time of day, SAM calculates baseline data for both day and night system performance. The option to set threshold values manually is still available.
In short, working with statistical thresholds involves:
- Applying thresholds to templates, individual component monitors, and applications.
- Understanding baseline statistics using standard deviation calculation for day and night system performance.
- Gaining statistical insights into the performance metrics and how they vary over time. Look at how stats are collected for higher and lower threshold values of each metric.
- Looking at baseline details before setting the right threshold values.
- Setting the right threshold values using the built-in baseline calculator that calculates and applies the recommended threshold values for warning and critical stages for a specific metric.
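The exact formula SAM uses isn't spelled out here, so treat the following Python sketch as an illustration of the general idea, not as the product's implementation. It assumes a warning threshold at the mean plus two standard deviations and a critical threshold at the mean plus three, and it assumes work hours of 8:00 to 18:00 on weekdays. The function names and all of those numbers are assumptions chosen for the example.

```python
from datetime import datetime
from statistics import mean, pstdev


def is_work_hours(timestamp, start_hour=8, end_hour=18):
    """Assumed definition of 'day': Monday to Friday, 8:00 to 18:00."""
    return timestamp.weekday() < 5 and start_hour <= timestamp.hour < end_hour


def baseline_thresholds(samples, warn_sigmas=2.0, crit_sigmas=3.0):
    """Compute day and night baselines from (datetime, value) samples.

    Thresholds are mean + k * standard deviation. The multipliers are
    assumptions for illustration, not values taken from SAM.
    """
    buckets = {"day": [], "night": []}
    for timestamp, value in samples:
        key = "day" if is_work_hours(timestamp) else "night"
        buckets[key].append(value)

    result = {}
    for name, values in buckets.items():
        if len(values) < 2:
            continue  # not enough data for a meaningful baseline
        mu = mean(values)
        sigma = pstdev(values)
        result[name] = {
            "mean": mu,
            "stddev": sigma,
            "warning": mu + warn_sigmas * sigma,
            "critical": mu + crit_sigmas * sigma,
        }
    return result
```

Feeding this function a week of CPU samples gives separate warning and critical values for work hours and for nights and weekends, which is the same split described above. A production version would also need to handle gaps in the data and metrics where low values, not high ones, are the problem.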
At some point, you will have to deal with “spam” alerts. The best way to go about it is to strike a balance between monitoring your application usage and setting the right threshold values. We believe that with the new Server & Application Monitor, you can adjust thresholds more dynamically and keep those alerts to a minimum.
Feel free to sign up and download the SAM 6.0 release candidate now to experience all the exciting new features.