Sounds like your node down alert is too sensitive. Change it to alert only after 2 or more consecutive failed polls, and play about with it from there.
Is the node over a WAN link, maybe? Is it polled via agent, SNMP or WMI?
Also, node down is a P1? Wow, you must have a great team.
Are your apps not load balanced across multiple servers? If they are, I would change it to a P2, as it would just be a dip in performance.
I know that's not a quick fix, what with SLAs and all that.
Hope you manage to sort it. Nothing worse than getting engineers up so early for false readings.
The way nodes get marked down already gives a lot of room for minor packet loss. It is probably important to understand exactly how that process works, if you aren't familiar.
SolarWinds pings a node. If it gets a response, the node is up and it waits until the next polling cycle to ping again. If it gets no answer to the ping, the node is marked as in warning and it goes into fast polling mode, where it sends a ping every 10 seconds for the duration of the warning period (120 seconds by default; you can set this in your polling settings). Packet loss is measured as the percentage of the last 10 pings that don't get answered, so it will always be a multiple of 10 and it can miss small intermittent loss situations. If Orion gets any responses during that warning period, the node stays in warning status until the packet loss stops. If it goes the entire warning period with no responses, the node is marked down and could potentially trigger your alert.
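If it helps to see that logic laid out, here is a rough Python sketch of the process as I described it. It is only a model for reasoning about it, not how Orion is actually implemented; the function names and the list-of-booleans input are made up for illustration, and only the 10 second and 120 second figures come from the description above.

```python
FAST_POLL_INTERVAL = 10   # seconds between pings in fast polling mode
WARNING_PERIOD = 120      # default warning period in seconds


def packet_loss_percent(ping_results):
    """Percent of the last 10 pings that went unanswered.

    ping_results is a list of booleans, True meaning the ping was answered.
    With a full window of 10 pings the result is always a multiple of 10.
    """
    window = ping_results[-10:]
    if not window:
        return 0
    lost = sum(1 for answered in window if not answered)
    return 100 * lost // len(window)


def status_after_missed_ping(fast_poll_replies,
                             warning_period=WARNING_PERIOD,
                             interval=FAST_POLL_INTERVAL):
    """Status of a node after its regular ping went unanswered.

    fast_poll_replies holds the results of the fast-poll pings sent so far
    (True = answered). The node only goes Down once a full warning period
    of fast polls has passed with no reply at all.
    """
    pings_in_period = warning_period // interval
    window = fast_poll_replies[:pings_in_period]
    if len(window) >= pings_in_period and not any(window):
        return "Down"
    # Either the warning period is not over yet, or at least one ping was
    # answered, so the node stays in warning rather than going down.
    return "Warning"
```

With the defaults that works out to 12 unanswered fast polls in a row before the node is marked down, and a single reply anywhere in that window keeps it in warning.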
So if you have an unusual amount of packet loss and want to cut back the alerting, you can do a couple of things: extend the warning period, or set your Node Down alerts to only trigger if the device stays down for a few more minutes.
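To get a feel for how those two knobs add up, here is a back-of-the-envelope Python sketch. The function and parameter names are hypothetical, and it assumes the alert fires as soon as its delay condition is met, ignoring how often the alert itself is evaluated.

```python
WARNING_PERIOD = 120  # default warning period in seconds


def seconds_until_alert(warning_period=WARNING_PERIOD, alert_delay=0):
    """Seconds from the first missed ping until a Node Down alert could fire,
    assuming the node never answers again."""
    return warning_period + alert_delay


def should_alert(seconds_down, alert_delay):
    """True once the node has been down for at least the alert delay."""
    return seconds_down >= alert_delay
```

For example, keeping the default warning period and making the alert wait 5 minutes means nobody gets paged until the node has been silent for roughly 7 minutes.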