Problem with alerting on nodes down.

I created a new alert to trigger an email any time we have a Cisco device down for more than one minute. The problem is, when I test the alert I get an email for ANY node that goes down, even if I tell it to test on a monitored Windows server instead of a Cisco device. I even tried setting suppression for Windows nodes to prevent the trigger, but that didn't work. Specific information on the alert configuration follows:

1. I create the alert by checking "Alert me when a node goes down" in the Alert Manager and clicking edit, then giving it a new alert name.

2. My Trigger conditions are set to All of the following, Node is equal to down, Vendor is equal to Cisco. At the bottom, I set Do not trigger this action until condition exists more than 1 minute.

3. On the Suppression conditions tab I first tried Vendor is not equal to Cisco. When that didn't work I set suppression conditions of ANY of the following: Vendor is equal to unknown, Vendor is equal to VMware Inc, and Vendor is equal to Windows. That covers all the Vendor types I have monitored.

4. I configured the trigger action to send an email to me and set the message to contain the variable of the node name.

So, if I use the test alert panel and select a Cisco device as the trigger node, I get an email formatted appropriately with the Cisco node name in the message. If I trigger the alert by selecting a Windows server, it should suppress it. Instead, I still get an email, which sometimes contains the node name and sometimes leaves the node name blank.

Have I misunderstood suppression? Shouldn't it prevent the email from being sent?

Thanks,
Laura

  • You've been caught by the "suppression tab" trap. It gets us all.

    http://thwack.solarwinds.com/message/204905#204905

    Upshot: Don't use the suppress tab. EVER.

    Put all your "if not" conditions in the alert trigger. Which in your case you don't need. Since you explicitly state "Vendor = Cisco" there's no need to indicate "Vendor <> Cisco".

    Finally, the test button doesn't REALLY test. It triggers the alert for the specified node/interface/disk/ham sandwich, but it uses CURRENT values. So often you won't get the right variable population since those values don't exist or don't exist correctly at the time you hit "test".

    A better test method is:

    1. ADD a line to specify a specific node. That way you don't get hammered.
    2. Change your thresholds. If you want to trigger on CPU, set it to be >= zero. If you want to test a down node, set it to "UP". etc.

    NOW your alert will trigger, with all the correct variables, etc. If you want to re-test it, just use the "CLEAR" button in the alert list.

    Hope that helps.

  • Thank you! That was very helpful. I'll redo the alert this afternoon. I also think I'll test it tomorrow before people come in to work by shutting down an unimportant switch and unimportant server rather than relying on their test panel. I've always suspected the test panel isn't doing exactly what it appears to do.

  • I agree with Leon Adato and never use suppression.  Also keep in mind that using Custom Properties can help you to trigger on the nodes you want, without getting ones you don't.  We have a lax way of naming our nodes here and I use custom properties to make things more standard and easier to filter.

    Jim

  • I find it better to "trick" Orion into triggering an alert on a device rather than turning it off directly.  Simply change the polling IP for a device to something that will fail, 1.1.1.1 or similar.  Orion will complain but allow it.  Pings will fail and the alert will trigger, all while the device is unaffected.

  • Suppression is a tricky tool... it must be used very carefully and with much consideration.  It will squash all the alerts if you are not careful... but in my case, with the right attention, it can be the tool that allows you to monitor extra interfaces or nodes without having to adjust your alert conditions (so you don't get the extra alerts).

    Proceed with Caution!

  • I have also been warned that suppression can be very tricky. I stay away from it. Keep in mind that the Test button only tests the trigger or reset action that you defined (email, logging, whatever). It doesn't test the logic of the alert and it doesn't care what node you test against.