14 Replies Latest reply on Aug 29, 2013 6:50 PM by superfly99

    Node in warning - why?

    ttl

         We received an alert early this morning that one of our wireless radios went into Warning. However, there is no indication either in NPM or on the device itself that any of its interfaces lost any packets or had any errors. Where can I find out why this happened?  NPM doesn't give any info as to why this was triggered.

        • Re: Node in warning - why?
          stuartwhyte

          Is there any indication which alert was triggered?  Have a look on your alerts tab to see current alerts, then we can take a look at the condition and work back from there.  Some screenshots would certainly help us resolve this more quickly.

            • Re: Node in warning - why?
              ttl

              Yup I thought of that too. A while ago I discovered that putting the ${AlertName} into the alert text was helpful for debugging.

               

                • Re: Node in warning - why?
                  stuartwhyte

                  Yep, very useful & something I tag on the bottom of all my alerts.

                   

                  Your selection criteria look sound to me. What is the status of the interface in question?

                    • Re: Node in warning - why?
                      ttl

                      It's showing up now and came back Up shortly after going into Warning. The problem is I'm being asked why it sent out an alert about being in Warning and I'm having trouble finding why that happened.

                        • Re: Node in warning - why?
                          stuartwhyte

                          Ok, just so I have this straight, does your alert trigger action (email/event log) have a line that writes interface status using a variable (${status}), meaning the interface had to be in "warning" status at the time of the alert, or is "warning" hard coded meaning that the interface could have been anything except up or unmanaged?  I'm thinking perhaps the interface went into "unknown" status, triggering the alert, but not actually in warning...

                            • Re: Node in warning - why?
                              ttl

                              It actually uses ${Status} in the Subject line, so that's apparently what NPM was reporting.

                                • Re: Node in warning - why?
                                  mr.e

                                  To my knowledge, Orion does not use the ${Status} variable that is in the Subject line. It merely uses it for displaying the status of the interface at the time the alert was generated. I concur with my fellow Thwackers who stated that the problem may be with the alert as it has been configured.


                                  By the way, what does the status in the subject line read?  This might give us an idea of what Orion is reporting.  Also, I must say, when I saw the screenshot you provided, it made me think that Orion briefly detected an "unknown status" for the interface.  So, it alerted on the "unknown status", but very likely, by the time you got a chance to check the device in SW, its status had gone back to Normal.  This seems to me the more likely scenario.

                                   

                                  While on the subject, you might also want to check the "Do not trigger this action until condition exists for more than" setting (also on the Trigger Condition tab).  If this delay is set too low, it may generate false-positive alerts that do nothing but waste your time and cause annoyance.  You (and your teammates) would need to decide how long to wait before generating this alert.

                                   

                                  Lastly, if you have reduced the polling time, it may also lead to false positives.  We had this issue for some of our nodes.

                                   

                                  I hope this helps.

                                    • Re: Node in warning - why?
                                      ttl

                                      newkidd2 wrote:

                                       

                                      To my knowledge, Orion does not use the ${Status} variable that is in the Subject line. It merely uses it for displaying the status of the interface at the time the alert was generated. I concur with my fellow Thwackers who stated that the problem may be with the alert as it has been configured.

                                       

                                        Sorry, but the point of the alert is exactly to tell me what the status of the interface is at the time the alert was generated. If you look at the alert I have configured, you'll see that the criteria match -- the interface is in WARNING, which is not Up or Unknown. I'm kind of at a loss as to what you mean by your statement -- why would you not want the alert to show the status of the interface when the alert was generated, if the criteria match the alert?

                                       

                                      newkidd2 wrote:

                                       

                                      By the way, what does the status in the subject line read?  This might give us an idea of what Orion is reporting.  Also, I must say, when I saw the screenshot you provided, it made me think that Orion briefly detected an "unknown status" for the interface.  So, it alerted on the "unknown status", but very likely, by the time you got a chance to check the device in SW, its status had gone back to Normal.  This seems to me the more likely scenario.

                                       

                                         The Alert subject line says:

                                       

                                      Warning : bridgewave-low-B -

                                       

                                          My question is why was it in Warning.  There is no indication anywhere that it ever went into "Unknown" status. I have received Unknown alerts before from other nodes. If this is in fact an Unknown issue, Solarwinds needs to address an issue with their alerting engine.

                                    • Re: Node in warning - why?
                                      stuartwhyte

                                      Just another thought - what does the Interface Availability resource show for the time period in question?

                                        • Re: Node in warning - why?
                                          ttl

                                          Interface Availability shows 100% for the entire time for all interfaces on the node.

                                            • Re: Node in warning - why?
                                              stuartwhyte

                                              Ok, my theory here is that for whatever reason a single, individual poll was missed for that interface.  Perhaps there was a high traffic load, the device was running high CPU, or something else that meant the response from the query "are you up" was not received.  This put the interface in warning.  The next poll got through ok, and so was reset almost immediately, before going to unknown or down.  Remember a warning status (in general) relates only to the status of the polls, not the status of the actual device.

                                               

                                              What value do you have the "Do not trigger this action until..." set to?  I always have this set to at least one minute, but mostly one minute plus two polling periods.  As mentioned above by mr.e this will take care of a single polling miss by requiring at least two complete polls before firing the alert.
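
The "one minute plus two polling periods" rule of thumb can be sketched as a small calculation plus a check that the condition has held long enough. This is only an illustration of the idea, not how Orion's alerting engine is implemented: the function names `trigger_delay_seconds` and `should_fire`, the 120-second polling interval, and the choice to treat "Up" as the only healthy status are all assumptions made for the example.

```python
def trigger_delay_seconds(poll_interval_s, polls=2, margin_s=60):
    """Hypothetical helper: fixed margin plus a number of polling periods."""
    return margin_s + polls * poll_interval_s


def should_fire(samples, delay_s):
    """Decide whether an alert should fire.

    samples: list of (timestamp_seconds, status) tuples in time order.
    Returns True only if the most recent run of non-"Up" statuses has
    lasted at least delay_s seconds (assumption: "Up" is the only
    healthy status in this sketch).
    """
    if not samples or samples[-1][1] == "Up":
        return False
    last_ts = samples[-1][0]
    start_ts = last_ts
    # Walk backwards to find where the current non-Up run began.
    for ts, status in reversed(samples):
        if status == "Up":
            break
        start_ts = ts
    return last_ts - start_ts >= delay_s


# Assumed 120 s polling interval: 60 s margin + 2 polls.
delay = trigger_delay_seconds(120)

# A single missed poll that clears on the next poll does not fire.
one_miss = [(0, "Up"), (120, "Warning"), (240, "Up")]

# A condition that persists across several polls does fire.
persistent = [(0, "Warning"), (120, "Warning"), (240, "Warning"), (360, "Warning")]

print(delay, should_fire(one_miss, delay), should_fire(persistent, delay))
```

With a delay like this, a one-poll blip of the kind described above would be suppressed, at the cost of learning about a real outage a few minutes later.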

                                              • Re: Node in warning - why?
                                                superfly99

                                                This may be totally wrong but could someone just have done a test and picked that interface for the test?