Hello,
Has anyone created an alert that monitors a specific node and fires if SolarWinds determines its Availability is less than 100%?
Thank you,
Mike
That's an interesting question, but you'd need to provide more details about what exactly you'd like to see as an end result.
The calculation of Availability for a node really only makes sense over time. Every single time SolarWinds polls for availability on a node, the result is binary: in the simple example of ICMP, you either receive an echo or you don't. SolarWinds will record that as either 100 or 0, respectively, in the database.
So, to say: "Alert me when the availability of a node is < 100", what you're really asking for is an alert when any single ICMP packet doesn't make it back to the Polling Engine within 2,500ms of the poll (default ICMP timeout). In modern networks, that can be a pretty noisy alert!
On the other hand, if you say: "Alert me when the average availability of a node is < 100 for the last XX time", you start to get what most would consider more meaningful data. You now have visibility into nodes that are showing packet loss during polling, but not at a sustained level that would trigger a node down event.
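To make the difference concrete, here is a minimal Python sketch of the "average over the last XX time" idea. It is only an illustration of the arithmetic, not how Orion evaluates alerts: the `PollResult` structure, the function names, and the sample data are all made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PollResult:
    """One availability poll (hypothetical structure, not the Orion schema)."""
    timestamp: datetime
    availability: int  # 100 if the echo came back, 0 if it did not


def average_availability(samples, now, window):
    """Mean availability of samples inside [now - window, now], or None if empty."""
    recent = [s.availability for s in samples if now - window <= s.timestamp <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


def should_alert(samples, now, window, threshold=100.0):
    """True when the windowed average drops below the threshold."""
    avg = average_availability(samples, now, window)
    return avg is not None and avg < threshold


now = datetime(2024, 1, 1, 12, 0, 0)
# 30 polls at the default 120-second spacing, two of which got no echo back.
samples = [
    PollResult(now - timedelta(seconds=120 * i), 0 if i in (5, 17) else 100)
    for i in range(30)
]

avg = average_availability(samples, now, timedelta(hours=1))
print(f"{avg:.1f}")                                       # 93.3
print(should_alert(samples, now, timedelta(hours=1)))     # True
```

Two missed echoes out of 30 polls pull the hourly average down to about 93%, even though the node would never have been marked 'Down'.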
Quick Note: If you are unaware, SolarWinds will send 1 ICMP packet every 120 seconds (assuming default settings). If that packet returns, the node is considered 'Up' (green), and another packet will kick off in 120 seconds. If that original packet fails to return, the node is placed in 'Warning' (yellow) status and SolarWinds will begin "fast polling" for another 120 seconds. During the fast polling cycle, ICMP packets are sent every 10 seconds. If any of those packets return, the node is marked 'Up' again and the next poll will be in 120 seconds. However, if they all fail then the node is marked 'Down' (red) and the next poll will be in 120 seconds. Therefore it's entirely possible that a network could be experiencing some weirdness causing packet loss (looking at you CBQoS) that isn't readily visible as nodes are never marked 'Down' since they don't hit the sustained 120+ seconds of failed packets to flip their status.
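The polling cycle described in that note can be sketched as a small simulation. This is a rough model built only from the defaults quoted above (120-second polls, 10-second fast polls for 120 seconds); the `simulate` function and the `reachable` callback are hypothetical names for illustration, and the handling of a failed poll while a node is already 'Down' is my assumption.

```python
POLL_INTERVAL = 120  # seconds between normal polls (default quoted above)
FAST_INTERVAL = 10   # seconds between fast polls
FAST_WINDOW = 120    # how long fast polling lasts


def simulate(reachable, duration):
    """Return (time, status) tuples for each status decision.

    `reachable(t)` is a hypothetical callback: True if an echo sent at
    second t would come back.
    """
    history = []
    status = "Up"
    t = 0
    while t <= duration:
        if reachable(t):
            status = "Up"
            history.append((t, status))
            t += POLL_INTERVAL
        elif status == "Down":
            # Assumption: a node that is already Down stays Down on a failed poll.
            history.append((t, status))
            t += POLL_INTERVAL
        else:
            history.append((t, "Warning"))
            fast_times = range(t + FAST_INTERVAL, t + FAST_WINDOW + 1, FAST_INTERVAL)
            recovered = next((ft for ft in fast_times if reachable(ft)), None)
            if recovered is not None:
                status = "Up"
                history.append((recovered, status))
                t = recovered + POLL_INTERVAL
            else:
                status = "Down"
                history.append((t + FAST_WINDOW, status))
                t += FAST_WINDOW + POLL_INTERVAL
    return history


# A 50-second blip: the node goes to Warning but is never marked Down.
blip = simulate(lambda t: not (100 <= t < 150), 600)
print(blip[:3])  # [(0, 'Up'), (120, 'Warning'), (150, 'Up')]

# A 5-minute outage: every fast poll fails, so the node is marked Down.
outage = simulate(lambda t: not (100 <= t < 400), 600)
print(outage[:3])  # [(0, 'Up'), (120, 'Warning'), (240, 'Down')]
```

The first run is the interesting one: the node was unreachable for almost a minute, yet a status-based 'Node Down' alert would never see it.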
OK, so all that being said, what's the "best" way? Well, it really depends on what you'd personally like to see your product do. There's a ridiculous amount of data pulled by these products and some of us have made careers out of taking that data and manipulating it into elegant and effective solutions. But, one of the great things about SolarWinds is you don't have to get that complex either.
So, if you want an alert notification when a node stops responding to ICMP, look for a 'Node Down' alert where the trigger condition is based on Status, not availability.
If you want an alert notification when a node stops responding in unusual bursts that never hit a 'Down' condition, look for a 'Packet Loss' alert where the trigger is based on % packet loss, which is, in a roundabout way, the metric behind availability.
If you want simple visibility into nodes that could be causing problems but not alerting, I'd highly recommend creating a report to show average availability metrics and have that emailed to you, or put into a NOC display. This lets your on-call staff keep sleeping and avoids the "boy who cried wolf" situation with your alerting tool.
That's a big wall of text; if you've made it this far, kudos! Please feel free to reply with any follow-up questions that the community may be able to assist with.
PS - If you're new to alerting with the Orion Platform, I highly recommend the following:
SolarWinds Network Performance Monitor Training: Managing Existing Alerts - YouTube
There are about 4-5 videos covering alerting in that playlist, and the rest of the videos are a great resource if you're new to the product as well.