
Node in warning - why?

   We received an alert early this morning that one of our wireless radios went into Warning. However, there is no indication, either in NPM or on the device itself, that any of its interfaces lost packets or had errors. Where can I find out why this happened? NPM doesn't give any information about why the alert was triggered.

  • Is there any indication which alert was triggered? Have a look at your Alerts tab to see current alerts, then we can look at the condition and work back from there. Some screenshots would certainly help us resolve this more quickly.

  • Yup I thought of that too. A while ago I discovered that putting the ${AlertName} into the alert text was helpful for debugging.

    pastedImage_0.png

  • Yep, very useful & something I tag on the bottom of all my alerts.

    Your selection criteria look sound to me. What is the status of the interface in question?

  • It's showing Up now; it came back Up shortly after going into Warning. The problem is that I'm being asked why it sent out an alert about being in Warning, and I'm having trouble finding out why that happened.

  • OK, just so I have this straight: does your alert trigger action (email/event log) have a line that writes the interface status using a variable (${status}), meaning the interface had to be in "Warning" status at the time of the alert? Or is "Warning" hard-coded, meaning the interface could have been in any status except Up or Unmanaged? I'm thinking perhaps the interface went into "Unknown" status, which triggered the alert, without ever actually being in Warning...

  • It actually uses ${Status} in the Subject line, so that's apparently what NPM was reporting.
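
As a rough illustration of the distinction raised in the previous post, the sketch below renders two hypothetical subject lines: one that substitutes a status variable at alert time and one with "Warning" hard-coded. Python's `string.Template` is used only because it happens to share the `${Name}` syntax; it is not Orion's variable engine, and the alert name and the `render` helper are assumptions made for the example.

```python
from string import Template

# Hypothetical subject templates (not taken from the poster's actual alert).
VARIABLE_SUBJECT = Template("${Status} : ${Caption} - ${AlertName}")
HARDCODED_SUBJECT = Template("Warning : ${Caption} - ${AlertName}")


def render(template, status, caption, alert_name):
    """Fill in a subject template with the values seen at alert time."""
    # Extra keyword arguments that the template does not reference are ignored.
    return template.substitute(Status=status, Caption=caption, AlertName=alert_name)


if __name__ == "__main__":
    # Suppose the interface was briefly in "Unknown" when the alert fired.
    for template in (VARIABLE_SUBJECT, HARDCODED_SUBJECT):
        print(render(template, "Unknown", "bridgewave-low-B", "Interface status alert"))
```

With the variable form, the subject reflects whatever status was reported when the alert fired; with the hard-coded form, it says "Warning" regardless. Appending the alert name, as suggested earlier in the thread, shows which alert definition produced the message.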

  • To my knowledge, Orion does not use the ${Status} variable that is in the Subject line. It merely uses it for displaying the status of the interface at the time the alert was generated. I concur with my fellow Thwackers who stated that the problem may be with the alert as it has been configured.


    By the way, what does the status in the subject line read? This might give us an idea of what Orion is reporting. Also, I must say, when I saw the screenshot you provided, it made me think that Orion briefly detected an "unknown status" for the interface. So it alerted on the "unknown status", but very likely, by the time you got a chance to check the device in SW, its status had gone back to Normal. This seems to me the more likely scenario.

    While on the subject, you might also want to check the "Do not trigger this action until condition exists for more than" setting (also on the Trigger Condition tab). If that delay is set too low, it may generate false-positive alerts that do nothing but waste your time and cause annoyance. You (and your teammates) would need to decide on the ideal time to wait before generating this alert.

    Lastly, if you have reduced the polling interval, that may also lead to false positives. We had this issue with some of our nodes.

    I hope this helps.
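
The trigger-delay idea in the post above can be sketched as a simple check over polled statuses. This is a simplified model under stated assumptions, not Orion's actual evaluation logic: the function name, the "anything other than Up" condition, and the 120-second polling interval are all hypothetical.

```python
def should_trigger(statuses, poll_interval_s, min_duration_s):
    """Return True if the alert condition held long enough to fire.

    statuses        -- status strings, one per poll, oldest first
    poll_interval_s -- seconds between polls
    min_duration_s  -- how long the condition must persist before firing
    """
    consecutive = 0
    for status in statuses:
        # Assumed condition: any status other than "Up" matches.
        consecutive = consecutive + 1 if status != "Up" else 0
        # n consecutive matching polls span (n - 1) polling intervals.
        if consecutive and (consecutive - 1) * poll_interval_s >= min_duration_s:
            return True
    return False


if __name__ == "__main__":
    blip = ["Up", "Up", "Warning", "Up", "Up"]
    outage = ["Up", "Warning", "Warning", "Warning", "Warning"]
    print(should_trigger(blip, 120, 0))      # no delay: a single poll fires
    print(should_trigger(blip, 120, 300))    # 5-minute delay: blip suppressed
    print(should_trigger(outage, 120, 300))  # sustained condition still fires
```

In this model a status that deviates for a single poll fires the alert when there is no delay, but is suppressed once the condition must persist for five minutes, while a sustained problem still triggers.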

  • Just another thought - what does the Interface Availability resource show for the time period in question?

  • newkidd2 wrote:

    To my knowledge, Orion does not use the ${Status} variable that is in the Subject line. It merely uses it for displaying the status of the interface at the time the alert was generated. I concur with my fellow Thwackers who stated that the problem may be with the alert as it has been configured.

      Sorry, but the point of the alert is exactly to tell me what the status of the interface is at the time the alert was generated. If you look at the alert I have configured, you'll see that the criteria match -- the interface is in WARNING, which is not Up or Unknown. I'm kind of at a loss as to what you mean by your statement -- why would you not want the alert to show the status of the interface when the alert was generated, if the criteria match the alert?

    newkidd2 wrote:

    By the way, what does the status in the subject line read? This might give us an idea of what Orion is reporting. Also, I must say, when I saw the screenshot you provided, it made me think that Orion briefly detected an "unknown status" for the interface. So it alerted on the "unknown status", but very likely, by the time you got a chance to check the device in SW, its status had gone back to Normal. This seems to me the more likely scenario.

       The Alert subject line says:

    Warning : bridgewave-low-B -

        My question is why it was in Warning. There is no indication anywhere that it ever went into "Unknown" status. I have received Unknown alerts before from other nodes. If this is in fact an Unknown issue, SolarWinds needs to address an issue with their alerting engine.

  • Interface Availability shows 100% for the entire time for all interfaces on the node.