
How to troubleshoot NPM alerting?

FormerMember

I recently had several nodes go down, and didn't get any alerts from Orion. I checked the Alert Manager on the server. Nothing wrong there. I sent and received a test alert, so it isn't SMTP. I went into active alerts, and despite the node still being down, there isn't an alert for it. What else can I check?

  • Semper Fi.

    Can you take a couple of screenshots of your alerts in Alert Manager? We'll need to know more about the rule to determine the cause.

    Off the top of my head, you could have a Trigger Condition that excludes all possible nodes, an Alert Suppression rule that filters out all possible nodes, or a Trigger Action that doesn't notify you properly (if you've tested using this alert and were able to receive an e-mail that kind of rules out this one). Also, your Time of Day settings may hinder this (unlikely, but hey, it's still a possibility).

    Specifically within Alert Manager, I'd verify that the Trigger Actions include posting something to NPM's Event Log in addition to your e-mail notification.

    If you post screenshots of your "Node Down" alert, give us your trigger condition, alert suppression, and trigger actions, and confirm your Time of Day page is the default (12:00 AM to 11:59 PM with every day selected).

  • FormerMember
    FormerMember in reply to dhanson

    Just for reference, the last time an alert should have been triggered was for a downed APC UPS at 7:30 AM.

    Trigger Condition.png

    Alert Suppression.png

    Time of Day.png

    Alert Action.png

  • The problem is the logic of alert suppression: the alert will be suppressed if it finds condition X, but it will suppress the WHOLE alert.

    You should delete everything in alert suppression and manage more conditions in the trigger, like this:

    Trigger alert when all of the following apply

    Node status is equal to down

    Node name is not equal to SLO-R24.srv.courts-tc.ca.gov

    Machine Type is not equal to Windows 7 workstation
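To illustrate the difference described in this reply, here is a minimal Python sketch. It is not Orion's alerting engine: the `Node` class, both functions, and the sample nodes are hypothetical, and the suppression behavior is modeled only as the reply describes it (one match silences the whole alert). It also assumes the suppression tab held the same two exclusions.

```python
from dataclasses import dataclass

# Exclusion values taken from the suggested trigger condition.
EXCLUDED_NAME = "SLO-R24.srv.courts-tc.ca.gov"
EXCLUDED_TYPE = "Windows 7 workstation"


@dataclass
class Node:
    """Hypothetical stand-in for a monitored node."""
    name: str
    machine_type: str
    status: str


def alerts_with_suppression(nodes):
    """Assumed behavior: if ANY node matches the suppression
    condition, the whole alert is suppressed for every node."""
    if any(n.name == EXCLUDED_NAME or n.machine_type == EXCLUDED_TYPE
           for n in nodes):
        return []
    return [n.name for n in nodes if n.status == "Down"]


def alerts_with_trigger_exclusions(nodes):
    """Exclusions live in the trigger, so they are checked per node."""
    return [
        n.name
        for n in nodes
        if n.status == "Down"
        and n.name != EXCLUDED_NAME
        and n.machine_type != EXCLUDED_TYPE
    ]


# Hypothetical sample data: one UPS is down, one unrelated workstation is up.
nodes = [
    Node("ups-01", "APC UPS", "Down"),
    Node("pc-17", "Windows 7 workstation", "Up"),
]
suppressed = alerts_with_suppression(nodes)
filtered = alerts_with_trigger_exclusions(nodes)
```

In this model the mere presence of a Windows 7 workstation silences the alert for the downed UPS, while the trigger-based version still reports it.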

  • 1. Is it possible Time Zone may have been a factor? Your time of day limitations exclude 12 hours, and if there's a significant timezone difference between the APC and the server, you might not get an alert.

    2. It could be a hang up in your alerting service. Have you received any other alerts since? If not, restart the SolarWinds Alerting Engine.

    3. Do the objects show as down in SolarWinds? If an object's Status isn't "Down", your alert won't trigger.

    4. Why not set your Time of Day restrictions in the E-Mail/Page instead of in the Alert? According to the logic you have deployed here, even if a node dropped outside of your business hours, you won't get any information on it. But if you put the "time of day" within the e-mail notification, you could still record to Event Log when something occurs outside of business hours, just not receive an e-mail on it.

    5. Do you have a stipulation at the bottom of your trigger for "Do not trigger this action until condition exists for more than ____"? It could be that the negative condition hasn't existed long enough for the alert to trigger. For instance, if this was set to something crazy like 12 hours, this might not allow you to trigger.

    What version of NPM are you running? 11.0.1?
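Points 1 and 5 above can be sketched in Python as follows. This is only an illustration of the reasoning, not how Orion evaluates alerts: the function names, the 8:00 AM to 8:00 PM window, and the dates are all hypothetical.

```python
from datetime import datetime, time, timedelta


def in_window(local_time, start, end):
    """True if local_time falls inside the Time of Day window."""
    if start <= end:
        return start <= local_time <= end
    # Window wraps past midnight (for example 8 PM to 8 AM).
    return local_time >= start or local_time <= end


def should_trigger(down_since, now, window_start, window_end, min_duration):
    """The alert fires only inside the window AND once the node has
    been down for at least min_duration."""
    long_enough = (now - down_since) >= min_duration
    return in_window(now.time(), window_start, window_end) and long_enough


# Hypothetical example: node drops at 7:30 AM, window is 8 AM to 8 PM.
down_since = datetime(2024, 1, 1, 7, 30)
window = (time(8, 0), time(20, 0))

before_window = should_trigger(
    down_since, datetime(2024, 1, 1, 7, 35), *window, timedelta(0))
inside_window = should_trigger(
    down_since, datetime(2024, 1, 1, 8, 5), *window, timedelta(0))
long_delay = should_trigger(
    down_since, datetime(2024, 1, 1, 8, 5), *window, timedelta(hours=12))
```

Here `before_window` and `long_delay` are both false: the first because 7:35 AM is outside the window, the second because the node has not yet been down for 12 hours.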

  • mdecima wrote:

    The problem is the logic of alert suppression: the alert will be suppressed if it finds condition X, but it will suppress the WHOLE alert.

    You should delete everything in alert suppression and manage more conditions in the trigger, like this:

    Trigger alert when all of the following apply

    Node status is equal to down

    Node name is not equal to SLO-R24.srv.courts-tc.ca.gov

    Machine Type is not equal to Windows 7 workstation

    mdecima is correct in saying to delete everything out of alert suppression. It never works, and in the latest NPM, I believe this tab has been removed for good!

    But I would modify the above trigger alert to this.

    Trigger alert when all of the following apply

    Node status is equal to down

         Trigger alert when any of the following apply

         Node name is not equal to SLO-R24.srv.courts-tc.ca.gov

         Machine Type is not equal to Windows 7 workstation
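For comparison, the two trigger layouts in this thread can be written as plain boolean expressions in Python. This is a sketch that assumes nested "all"/"any" groups behave like ordinary AND/OR; the function names and sample nodes are hypothetical.

```python
EXCLUDED_NAME = "SLO-R24.srv.courts-tc.ca.gov"
EXCLUDED_TYPE = "Windows 7 workstation"


def flat_all(status, name, machine_type):
    """All three conditions under a single 'all' group."""
    return (
        status == "Down"
        and name != EXCLUDED_NAME
        and machine_type != EXCLUDED_TYPE
    )


def nested_any(status, name, machine_type):
    """Status under 'all', the two exclusions under a nested 'any'."""
    return status == "Down" and (
        name != EXCLUDED_NAME or machine_type != EXCLUDED_TYPE
    )


# A downed UPS triggers under both layouts.
ups_flat = flat_all("Down", "ups-01", "APC UPS")
ups_nested = nested_any("Down", "ups-01", "APC UPS")

# A downed Windows 7 workstation with a different name differs:
# the nested 'any' is satisfied by the name check alone.
pc_flat = flat_all("Down", "pc-17", "Windows 7 workstation")
pc_nested = nested_any("Down", "pc-17", "Windows 7 workstation")
```

Under that assumption the two layouts agree for nodes that match neither exclusion, but the nested "any" version still triggers for a node that matches only one of them, so it is worth testing against a known excluded node before relying on it.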