alert not working

suthomas1 over 5 years ago

Hi,

We are setting up SolarWinds.

As part of our tests, we created a "server down" alert scoped to priority P3 and device type "server".

One of the test servers was then assigned priority P3 and device type "server". When this server was rebooted, no server down alert was received.

Please help to solve this.

Top Replies

0 adatole over 5 years ago

Can you take a screenshot of the alert trigger page? There are a few details I'd like to see.

0 RichardLetts over 5 years ago

How long did the server take to reboot?
What is the polling cycle?
What is the Alert Check Frequency?
What are your alert rules?
Here is a little graphic assuming that the polling period is 60 seconds (the default is 30 seconds, but I'm tired and this is graphic art and I can't figure out how to draw this at 30 second ticks.)
The server rebooted in 3 minutes.
The alert is checked every 10 minutes.
It's using the ICMP ping poller.
Starting the poller at 00:00:
A little after 00:04 the server goes down for a reboot.
At 00:05 the ICMP ping poller fails to get a ping reply. The node status goes to Warning (Node Warning event) and the poller goes into rapid ping mode (pinging every 10 seconds for up to 120 seconds).
At 00:07 the ICMP ping poller has failed to get a ping reply for 120 seconds and flags the node as down (Node Down event).
A little before 00:08 the server comes back.
At 00:08 the ICMP ping poller gets a ping reply and flags the node as up (Node Up event).
At 00:11 the alert check runs, and the node is up, so no alert actions are triggered.
(leads to questions like 'why is my alert not working')
Had the alert check run between 00:07 and 00:08, chances are you would have gotten a node down alert, assuming a properly configured alert.
(this leads to questions like 'why does my alert work intermittently')
You might want to add a check for the node rebooting to catch this kind of event.
To troubleshoot this issue, as Leon says, he needs a picture of the alert, plus the reboot duration, the events for the node, and the polling frequency.
[Edits: fixed graphic for an actual 10 minute alert check frequency; Fixed again for the fast ping period. Graphic art is not my strong point]
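For anyone who wants to play with the timeline above, here is a minimal Python sketch of it. This is a toy model and not how Orion is actually implemented: it assumes regular polls run at exact multiples of the polling interval, that the node is flagged down after a fixed rapid ping window, and that it is flagged up at the next regular poll. The function names, the outage times (00:04:10 to 00:07:50), and the alert check offset are all made up for illustration.

```python
import math


def down_window(outage_start, outage_end, poll_interval=60, fast_ping_window=120):
    """Return (warning, down, up) event times in seconds, or None if the
    node is never flagged down. Toy model, not Orion's real logic."""
    # Regular polls are assumed to run at multiples of poll_interval.
    warning = math.ceil(outage_start / poll_interval) * poll_interval
    if warning >= outage_end:
        return None  # the node was back before any poll missed a reply
    down = warning + fast_ping_window
    if down >= outage_end:
        return None  # the node answered during rapid ping mode
    # Assumption: the node is flagged up at the first regular poll after it returns.
    up = math.ceil(outage_end / poll_interval) * poll_interval
    return warning, down, up


def alert_fires(window, alert_interval=600, alert_offset=60):
    """True if a periodic alert check lands while the node is flagged down."""
    if window is None:
        return False
    _, down, up = window
    # Index of the first alert check at or after the node down event.
    k = max(0, math.ceil((down - alert_offset) / alert_interval))
    return alert_offset + k * alert_interval < up


# Outage from 00:04:10 to 00:07:50 (hypothetical values matching the timeline).
window = down_window(250, 470)
print(window)                                 # (300, 420, 480)
print(alert_fires(window))                    # False: checks at 00:01 and 00:11
print(alert_fires(window, alert_offset=450))  # True: a check at 00:07:30
```

The same outage either triggers the alert or not depending only on where the alert check happens to fall, which is the "works intermittently" effect.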

0 RichardLetts over 5 years ago in reply to RichardLetts

SoapBox...
In a different alert engine design the events could bubble up to alerts immediately, but this design is inherently glitchy (i.e. very timing sensitive) and not predictable over short time periods.
Node Warning events are significant in a well-run network: they are the first indication of some packet loss in your network. If you are getting these, you ought to spend some time looking at why.
If you shorten the alert check interval (the Alert Check Frequency setting) to 1 minute, then you would catch all node down events with a duration of longer than 2 minutes. However, you will increase the load on the database for any expensive alert definitions (usually advanced SQL).
...Soapbox
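To put a rough number on the timing sensitivity, here is a small Python sketch. It is only back-of-the-envelope arithmetic under one assumption: the alert check runs at a fixed interval with a random phase relative to the moment the node is flagged down. The function name is made up, and the 60 second down window is taken from the example timeline (node flagged down at 00:07, up at 00:08).

```python
from fractions import Fraction


def catch_probability(down_window_seconds, alert_interval_seconds):
    """Chance that a periodic alert check with a random phase lands inside
    the window during which the node is flagged down."""
    if down_window_seconds <= 0:
        return Fraction(0)
    # A window of length W is hit by a check every A seconds with
    # probability W / A, capped at 1 once the window is at least A long.
    return min(Fraction(1), Fraction(down_window_seconds, alert_interval_seconds))


# Node flagged down for 60 seconds, as in the example timeline.
print(catch_probability(60, 600))  # 1/10 with a 10 minute alert check
print(catch_probability(60, 60))   # 1 with a 1 minute alert check
```

Under that assumption, a 10 minute check catches a one-minute down window only about one time in ten, while a 1 minute check always catches it.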

0 vinay.by over 5 years ago

Hi suthomas1,
Got you. To keep it simple, why don't you create a 'node reboot' alert rather than 'server down'? You should have one among the default alerts that come with your installation. Check its trigger condition and create a similar one in your environment for this particular scenario.
What Leon and Richard are talking about is interlinked: whether a server down alert fires depends on several things configured in Orion in your environment, such as the polling interval, your alert conditions, and how frequently the alert checks the status.

0 suthomas1 over 5 years ago in reply to vinay.by

Thanks all for the response.
The server type is referenced in the node under "device type", which should trigger an up/down alert.
As for using the "node reboot" alert: instead of just getting a reboot alert, we should be able to know when the node goes DOWN and comes back UP.
Also, can someone help with how the "node name" and "IP address" can be displayed in the email body, based on the above inputs?
Please help.

0 RichardLetts over 5 years ago

FGA: Please follow the standard litany when giving a problem report.
What makes you think the alert is not working?
What is in the event log for the alert?
What are the alert conditions?
You need to use the Node reboot alert if you want to know when a node reboots. If the node reboots quickly, then a node down alert will not trigger, for the reason I described. If that is the reason your alert did not trigger, then no amount of arguing is going to change how the product works and get the alerts working. (There are good reasons for NOT using an OR condition, i.e. node down OR node rebooted, because the clear condition will be a problem.)
Is the alert writing to the event log when it triggers?
Is it that the email is not arriving?

0 suthomas1 over 5 years ago in reply to RichardLetts

Thanks again.
We wanted node up and down alerts instead of just node reboot.
We will check the frequency to determine this.
The emails are arriving, but the node name seems to be missing. As you can see in the snapshot above, the node name is not populated in the ${Node} field.
Are we missing anything needed for the node name to be included in the alert?

0 sum_giais over 5 years ago in reply to suthomas1

Use 'Insert Variable' to help you get the appropriate variable names for this. To display the node name, I believe it's ${Nodename}.
It's always good to double check these by using 'Insert Variable'.
Did you get the alert to trigger as you expected? If not, it would be helpful to post a screenshot of your alert scope and trigger conditions here, as folks above mentioned.
