8 Replies Latest reply on Feb 10, 2019 9:29 AM by sum_giais

    alert not working

    suthomas1

      Hi,

       

      We are setting up solarwinds.

      As part of our tests, we created a "server down" alert with priority P3 assigned and device type "server".

       

      One of the test servers was then assigned priority P3 and device type "server". When this server was rebooted, no server down alerts were received.

       

      Please help to solve this.

        • Re: alert not working
          Leon Adato

          Can you take a screenshot of the alert trigger page? There are a few details I'd like to see.

          • Re: alert not working
            RichardLetts

            How long did the server take to reboot?

            What is the polling cycle?

            What is the Alert Check Frequency?

            What are your alert rules?

             

            Here is a little graphic assuming that the polling period is 60 seconds (the default is 30 seconds, but I'm tired and this is graphic art and I can't figure out how to draw this at 30 second ticks.)

            The server rebooted in 3 minutes

            The alert is checked every 10 minutes

            It's using the ICMP ping poller.

            Starting the poller at 00:00 minutes:

            A little after 4 minutes, the server goes down for a reboot.

            At 00:05 the ICMP ping poller fails to get a ping reply. The node status goes to Warning (Node Warning event) and the poller goes into rapid ping mode (pings every 10 seconds for up to 120 seconds).

            At 00:07 the ICMP ping poller has failed to get a ping reply for 120 seconds and flags the node as down (Node Down event).

            A little before 00:08 the server comes back up.

            At 00:08 the ICMP ping poller gets a ping reply and flags the node as up (Node Up event).

            At 00:11 the alert check runs, and the node is up, so no alert actions are triggered.

            (leads to questions like 'why is my alert not working')

             

            Had the alert check run between 00:07 and 00:08, chances are you would have gotten a node down alert, assuming a properly configured alert.

            (this leads to questions like 'why does my alert work intermittently')
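            The timeline above can be sketched as a small simulation. This is only a rough Python model of the behavior described in this post, not how the Orion poller is actually implemented. The function name `simulate`, the exact down/up times (04:30 and 07:45), and the 1-minute offset of the alert check are assumptions chosen to match the graphic.

```python
def simulate(down_at, up_at, poll=60, fast_ping=10, down_after=120,
             alert_every=600, alert_offset=60, horizon=900):
    """Step through time in seconds; return (events, alert_fired).

    All parameters are illustrative assumptions taken from the example
    in this thread, not values read from a real Orion installation.
    """
    status = "Up"
    warn_since = None
    events = []
    fired = False
    for t in range(horizon + 1):
        reachable = not (down_at <= t < up_at)
        # Regular polls run every `poll` seconds; rapid pings only while Warning.
        regular_poll = t % poll == 0
        rapid_ping = status == "Warning" and (t - warn_since) % fast_ping == 0
        if regular_poll or rapid_ping:
            if reachable and status != "Up":
                status = "Up"
                events.append((t, "Node Up"))
            elif not reachable and status == "Up":
                status = "Warning"
                warn_since = t
                events.append((t, "Node Warning"))
            elif (not reachable and status == "Warning"
                  and t - warn_since >= down_after):
                status = "Down"
                events.append((t, "Node Down"))
        # The alert check only sees the node status at the moment it runs.
        is_check = t >= alert_offset and (t - alert_offset) % alert_every == 0
        if is_check and status == "Down":
            fired = True
            events.append((t, "Alert triggered"))
    return events, fired


# Server down at 04:30 (270 s), back at 07:45 (465 s); checks at 01:00 and 11:00.
events, fired = simulate(down_at=270, up_at=465)
for t, name in events:
    print(f"{t // 60:02d}:{t % 60:02d} {name}")
print("alert fired:", fired)
```

            Moving the check so that it lands inside the Down window, for example `simulate(270, 465, alert_offset=450)`, makes the same alert fire, which is the intermittent behavior described above.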

             

            You might want to add a check for the node rebooting to catch this kind of event.

             

            To troubleshoot this issue, as Leon says, we need a picture of the alert, along with the reboot duration, the events for the node, and the polling frequency.

             

            [Edits: fixed graphic for an actual 10 minute alert check frequency; Fixed again for the fast ping period. Graphic art is not my strong point]

              • Re: alert not working
                RichardLetts
                SoapBox...

                In a different alert engine design the events could bubble up to alerts immediately, but this design is inherently glitchy (i.e. very timing-sensitive) and not predictable over short time periods.

                 

                Node warning events are significant in a well-run network: they are the first indication of some packet loss in your network. If you are getting these, you ought to spend some time looking at why.

                 

                If you shorten the alert check interval to 1 minute, then you would catch all node down events with a duration of longer than 2 minutes. However, you will increase the load on the database for any expensive alert definitions (usually those using advanced SQL).
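                The trade-off can be put into rough numbers. The sketch below is a back-of-the-envelope Python calculation, not anything taken from the product: it assumes the alert check runs at a fixed interval with a random phase relative to the outage, and `catch_probability` is a hypothetical helper name. The 60-second Down window is the 00:07 to 00:08 window from the earlier example.

```python
def catch_probability(down_window_s, check_interval_s):
    """Chance that at least one periodic alert check lands inside the
    window during which the node is flagged Down.

    Assumes the check phase is uniformly random relative to the outage.
    """
    if check_interval_s <= 0:
        raise ValueError("check interval must be positive")
    if down_window_s < 0:
        raise ValueError("down window cannot be negative")
    return min(1.0, down_window_s / check_interval_s)


# Node flagged Down for about 60 seconds (00:07 to 00:08 in the example).
for interval in (600, 300, 60):
    p = catch_probability(60, interval)
    print(f"check every {interval:>3} s -> {p:.0%} chance of an alert")
```

                Note that the input is the time the node spends flagged as Down, which is shorter than the outage itself because of the polling delay and the rapid ping period.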

                 

                ...Soapbox
              • Re: alert not working
                Vinay BY

                Hi suthomas1,

                 

                Got you. To keep it simple, why don't you create a 'node reboot' alert rather than 'server down'? There should be one among the default alerts that come with your installation; check its trigger condition and create a similar one in your environment for this particular scenario.

                 

                What Leon and Richard are talking about is interlinked: whether a server down alert fires depends on several things configured in your Orion environment, such as the polling interval, your alert conditions, and how frequently the alert checks the status.

                  • Re: alert not working
                    suthomas1

                    Thanks all for the response.

                     

                    The server type is referenced in the node under "device type", which should trigger an up/down alert.

                    As for using the "node reboot" alert: instead of just getting a reboot alert, we need to know when the node goes DOWN and when it comes back UP.

                     

                    Also, can someone help with how the "node name" and "IP address" can be displayed in the email body, based on the inputs above?

                    [Screenshot: alert email config]

                    Please help.

                  • Re: alert not working
                    RichardLetts

                    FGA: Please follow the standard litany when giving a problem report.

                     

                    What makes you think the alert is not working?

                    What is in the event log for the alert?

                    What are the alert conditions?

                     

                    You need to use the Node reboot alert if you want to know when a node reboots. If the node reboots quickly, then a node down alert will not trigger, for the reason I described. If that is the reason your alert did not trigger, then no amount of arguing is going to change how the product works and get the alerts working. (There are good reasons for NOT using an OR condition, i.e. node down OR node rebooted, because the clear condition will be a problem.)

                     

                    Is the alert writing to the event log when it triggers?

                    Is it that the email is not arriving?

                      • Re: alert not working
                        suthomas1

                        Thanks again.

                        We wanted node up and down alerts instead of just node reboot.

                        We will check the frequency settings to determine this.

                         

                        The emails are coming, but the node name seems to be missing. As you can see in the snapshot above, the node name is not populated in the ${Node} field. Are we missing anything here for the node name to be received in the alert?

                          • Re: alert not working
                            sum_giais

                            Use 'Insert Variable' to help you get the appropriate variable names for this, but to display the node name I believe it's ${Nodename}

                             

                            Always good to double check these by using 'Insert Variable'
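                            As a loose analogy for why a slightly wrong variable name leaves the field unfilled, here is a Python sketch using the standard library's `string.Template`. It is not how the Orion alert engine substitutes variables; the node values are made up, and `IP_Address` is only a placeholder name for illustration, so pick the real names through 'Insert Variable'.

```python
from string import Template

# Hypothetical node properties; real names come from 'Insert Variable'.
node_values = {"Nodename": "test-server-01", "IP_Address": "192.0.2.10"}


def render(body, values):
    """Fill ${...} placeholders; unknown names are left as they are."""
    return Template(body).safe_substitute(values)


# ${Node} does not match any known name, so it is not populated.
print(render("Node ${Node} is down", node_values))

# Names that match exactly are replaced with the node's values.
print(render("Node ${Nodename} (${IP_Address}) is down", node_values))
```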

                             

                            Did you get the alert to trigger as you expected? If not, it would be helpful to post a screenshot of your alert scope and trigger conditions here, as folks above mentioned.