In previous weeks, I have talked about running a well managed network and about monitoring services beyond simple up/down reachability states. Now it's time to talk about coupling alerting with your detailed monitoring.
You may need to have an alert sent if an interface goes down in the data center, but you almost certainly don't want an alert if an interface goes down for a user's desktop. You don't need (or want) an alert for every event in the network. If you receive alerts for everything, it becomes difficult to find the ones that really matter in the noise. Unnecessary alerts train people to ignore all alerts, since those that represent real issues are (hopefully) few. Remember the story of the boy who cried wolf? Keep your alerts useful.
Useful alerts help you to be proactive and leverage your detailed monitoring. Alerts can help you be proactive by letting you know that a circuit is over some percentage of utilization, a backup interface has gone down, or a device is running out of disk space. These help you to better manage your network by being proactive and resolving problems before they become an outage, or at least allowing you to react more quickly to an outage. It's always nice to know when something is broken before your users, especially if they call and you can tell them you are already working on it.
What's your philosophy on alerts? What proactive alerts have helped you head off a problem before it became an outage?
I'm a network engineer working primarily with R&S and Wi-Fi. You can follow me on twitter at @scottm32768 or on my blog at www.mostlynetworks.com. I hold multiple IT certifications, including the Solarwinds Certified Professional (SCP2532).
SolarWinds solutions are rooted in our deep connection to our user base in the THWACK® online community.
More than 150,000 members are here to solve problems, share technology and best practices, and directly
contribute to our product development process.
Learn more today by joining now.