
Alert Suppression

Hello all,


I am still confused by Alert Suppression. Let me start with a simple scenario. I have the following hypothetical setup:


Orion > RouterA > RouterB > SwitchA > 25 servers connected to the switch


I want to monitor RouterA, RouterB, SwitchA, and each of the servers connected to the switch, so I will add all 28 of these devices to Orion System Manager. Let's say I have just installed Orion and have not modified the canned alerts, but I do have "Page me when a node goes down" and "Page me when an interface goes down" activated and working correctly. At this point, if a server goes down, I get a page for the server (node) going down, and because the switch interface the server is attached to also goes down, I am paged again. So I know both alerts are working correctly.


Now let's say RouterA fails. How do I set up suppression so I don't get a page for RouterB going down, SwitchA going down, and all 25 servers going down? Do I have to create separate alerts for all 27 devices and suppressions for all 27 devices?


To make it even simpler, let's say SwitchA fails. How do I set up suppression so that I don't get an alert from each of the 25 server nodes I am monitoring? Do I have to set up an alert for each of the 25 server nodes and add a suppression that says "if SwitchA goes down, don't page"? I hope not! That would be very painful.


I can't seem to find any documentation or tutorials that show the logic behind suppressions. SolarWinds does tell me how to create a suppression, but does a very poor job of explaining how it is put into action in the real world. I have not found any posts on this forum that answer this very well either.


SolarWinds, are you listening...

  • Clint,

    The main thing to focus on is where the dependencies lie. In your scenario:

    - All 25 servers depend on SwitchA, RouterB, and RouterA.
    - SwitchA depends on RouterB and RouterA.
    - RouterB depends on RouterA.

    What you will need to do is build a separate alert for each level of dependency. In your example, this would mean creating 3 alerts, each configured to suppress triggering, depending on the state of the devices on which they are dependent. For example, the suppression for a single alert that covers all 25 servers would look similar to the following. (This is just a quickie explanation, but if you need the detail of how it really looks in the alert config window, I can put that together tomorrow… Maybe…)  :)

    Suppress when:

    NodeName equal to SwitchA OR

    NodeName equal to RouterB OR

    NodeName equal to RouterA

    AND

    Node Status not equal to Up

    The reason I use "Node Status not equal to Up" instead of "Node Status equal to Down" is that, depending on where you are in the polling cycle, the routers or switch may still be in a "Warning" state when the servers are finally marked as "Down".
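    A minimal Python sketch of how this condition evaluates, assuming the suppression applies when any node in the name list also passes the status test. The `Status` values mirror the states named above; the `statuses` snapshot and the function names are hypothetical stand-ins for Orion's node table, not an Orion API. The snapshot captures the moment described: the upstream devices are still "Warning" while the servers have already gone "Down".

```python
from enum import Enum


class Status(Enum):
    UP = "Up"
    WARNING = "Warning"
    DOWN = "Down"


# Hypothetical snapshot: RouterA has just failed, so RouterA, RouterB, and
# SwitchA have missed a poll and sit in Warning, while the servers have
# already been marked Down.
statuses = {
    "RouterA": Status.WARNING,
    "RouterB": Status.WARNING,
    "SwitchA": Status.WARNING,
}

# The devices named in the suppression condition for the server alert.
UPSTREAM = ["SwitchA", "RouterB", "RouterA"]


def suppress_not_up(statuses, names):
    """(NodeName is one of `names`) AND (Node Status not equal to Up)."""
    return any(statuses[n] != Status.UP for n in names)


def suppress_down_only(statuses, names):
    """(NodeName is one of `names`) AND (Node Status equal to Down)."""
    return any(statuses[n] == Status.DOWN for n in names)


print(suppress_not_up(statuses, UPSTREAM))     # True: server page suppressed
print(suppress_down_only(statuses, UPSTREAM))  # False: server page still sent
```

    With "equal to Down", the 25 server pages would slip through during that window; "not equal to Up" catches it.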

    If your primary concern is to suppress the individual server alerts when one of the three network infrastructure devices goes down, then this one alert will suffice. Naturally, if you want to suppress further up the line, then simply follow the same thought process.
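    Following that thought process for every level can be sketched in Python, assuming each device has exactly one upstream parent. The `PARENT` map and the server names are hypothetical; grouping nodes by their upstream chain yields one alert per dependency level, each suppressed on the devices in its chain.

```python
# Hypothetical parent map for the thread's topology: each device lists the
# single upstream device it depends on (None = polled directly by Orion).
PARENT = {
    "RouterA": None,
    "RouterB": "RouterA",
    "SwitchA": "RouterB",
}
# The 25 servers all hang off SwitchA (names are illustrative).
for i in range(1, 26):
    PARENT[f"Server{i:02d}"] = "SwitchA"


def upstream(node):
    """Return every device between Orion and `node`, nearest first."""
    chain = []
    parent = PARENT[node]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def alerts_by_level():
    """Group nodes that share the same upstream chain; each group needs one
    alert whose suppression lists the devices in that chain."""
    groups = {}
    for node in PARENT:
        chain = tuple(upstream(node))
        if chain:  # RouterA has nothing upstream, so it needs no suppression
            groups.setdefault(chain, []).append(node)
    return groups


for chain, nodes in alerts_by_level().items():
    print(f"{len(nodes)} node(s) -> suppress when any of {list(chain)} is not Up")
```

    For this topology it produces three groups (RouterB, SwitchA, and the 25 servers), matching the three alerts described above; RouterA's own alert needs no suppression.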

    I hope this gives you something to go on. If not, too bad... :) No, really, I'll try to come up with something more specific tomorrow.

    Vic

  • Clint, I noticed your question was marked as "answered", but if you still want me to post more detail, just let me know. :)
  • Thank you.


    I have played with the alert suppressions for a "node going down" and think I have this manual process down, although the product seems kind of brain-dead in this area. There must be a better way to set up suppressions. You must agree that creating a new alert for each network device (switch, router) and then adding the dependent upstream network devices to the Suppressions tab is cumbersome. Maybe this could be improved by adding features found in other SW products to the Orion discovery. The SW Toolset has port mapper and traceroute, so why can't "DEPENDENCIES" be added to the discovery process using these tools? (Forward that one to the SW developers.)

    Also, is it necessary to suppress alerts when an INTERFACE goes down? I ask because the status of the interface is determined by an SNMP get. If the poller cannot get a response to an SNMP get, it marks the interface as Unknown and does not alert. If an upstream neighbor is completely down (power failure, catastrophic failure), the SNMP get never reaches the downstream neighbor, so Orion would mark the interface as UNKNOWN and never fire an alert.

    Using our example: we are monitoring a switch port on SwitchA, let's say port 10, and Orion reports the port as UP. RouterB then dies and we lose connectivity to SwitchA. The poller never receives a reply to the SNMP get checking the status of SwitchA port 10, because RouterB is dead. Is the "interface down" alert ever fired for SwitchA, given that the poller never received a response to the SNMP get?

  • I agree that manually configuring alert suppression, especially in larger environments, is cumbersome at best. On the other hand, I also have a basic understanding of the potential complexities involved in building an intelligent dependency discovery and tracking mechanism. Maybe someday this will be included in Orion, but I wouldn't look for it any time soon. Remember, it's only relatively recently that we've begun to see some of what I would call more basic features included in the product, such as syslog/trap handling and custom MIB support, and I feel there is still a lot of work to be done in those areas to make those components more flexible and user-friendly.

    In my opinion, SW has really stepped up recently and begun to add some much-needed functionality, but it has also come at a price. As long as new features can be added as modules, the base product remains accessible to a larger customer base, but the more features that are integrated into the base product, the more expensive it becomes. I'm sure this is something that's very much on the minds of the folks at SW. A truly functional event correlator, along with business impact determination and rule enforcement, is like the Holy Grail in the monitoring world, and would very likely come at a price. Something more basic might be more easily derived, but we will have to wait and see.

    As for the question of suppressing alerts for interface events in the given scenario: if your interface alert is configured to trigger when an interface goes "Down", it should not trigger when the state of the interface is "Unknown". If you're alerting on all interface "Down" events for every interface along this entire path (without interface alert suppression) and suppressing node alerts for devices behind RouterB, then you should see only two alerts: one stating that RouterB is "Down", and one stating that the interface on RouterA that connects to RouterB is also "Down". To take it a step further, you could get even more granular by suppressing node "Down" alerts via a suppression configured on an interface, but that adds an even greater level of complexity to the mix.
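    The RouterB scenario can be modeled in a short Python sketch, assuming (as described in the question) that an unanswered SNMP get leaves an interface "Unknown" and that the interface alert triggers only on "Down". Device and interface names are hypothetical, and node alerts are suppressed on upstream status as in the suppression condition earlier in the thread; this models the logic only and is not how Orion's poller is implemented.

```python
from enum import Enum


class Status(Enum):
    UP = "Up"
    DOWN = "Down"
    UNKNOWN = "Unknown"


# Hypothetical topology: device -> upstream device (None = polled directly).
PARENT = {"RouterA": None, "RouterB": "RouterA", "SwitchA": "RouterB"}
FAILED = {"RouterB"}  # RouterB has died

# Hypothetical interfaces: (owning device, interface name, far-end device).
INTERFACES = [
    ("RouterA", "to-RouterB", "RouterB"),
    ("RouterB", "to-SwitchA", "SwitchA"),
    ("SwitchA", "Port10", None),  # server port; the server itself is fine
]


def upstream(device):
    """Devices between Orion and `device`, nearest first."""
    chain = []
    parent = PARENT[device]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def reachable(device):
    """A device answers polls only if it and everything upstream is alive."""
    return all(d not in FAILED for d in [device, *upstream(device)])


def node_status(device):
    return Status.UP if reachable(device) else Status.DOWN


def interface_status(device, far_end):
    if not reachable(device):
        return Status.UNKNOWN  # the SNMP get is never answered
    return Status.DOWN if far_end in FAILED else Status.UP


def collect_alerts():
    alerts = []
    for device in PARENT:
        # Node alert, suppressed when any upstream device is not Up.
        suppressed = any(node_status(u) != Status.UP for u in upstream(device))
        if node_status(device) == Status.DOWN and not suppressed:
            alerts.append(f"Node {device} is Down")
    for device, name, far_end in INTERFACES:
        # Interface alert triggers on Down only; Unknown never fires it.
        if interface_status(device, far_end) == Status.DOWN:
            alerts.append(f"Interface {device}:{name} is Down")
    return alerts


print(collect_alerts())
```

    Running it yields exactly the two alerts described: RouterB's node alert and the RouterA interface facing RouterB. SwitchA's node alert is suppressed, and SwitchA's Port10 stays "Unknown".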

    I think the trick to getting the most useful alerts from any alerting engine that supports suppression is to work out what makes the most sense for your environment. To reduce the administrative burden, target suppression only where it makes the most sense. Aggregation points such as core, distribution, and server farm switches and routers are likely targets. It all depends on the architecture of your particular environment and how much time and effort you're willing to spend to de-duplicate your alerts.

  • Three years ago SolarWinds support gave me a special suppression DLL that allows for suppressing a node based on the status of a node specified in a custom property. It was extremely helpful in our environment, since we were monitoring 1,000 routers and the devices behind them. I was going to explain how to configure the suppression, but noticed that it was wiped out in 8.1. I have opened a ticket with support, and hopefully it will be simple to re-register the DLL.

  • > Three years ago SolarWinds support gave me a special suppression DLL that allows for suppressing a node based on the status of a node specified in a custom property.

    OK... If you're saying what I think you're saying, I've been trying to figure out how to do this very thing. Are you saying that this mechanism allows you to create one alert (covering all nodes) that triggers (or suppresses) based on the status of the node listed in custom property x? This would mean you only have to create one alert for each level of node dependency (not each location), and simply populate the custom property field with the appropriate parent node.

    This is something that really needs to be included in the base product, and should be quite simple to implement. Ideally, it would also be able to suppress based on the status of more than one device (router x AND router y), thus encompassing situations where redundancy is present.
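    A minimal sketch of that idea in Python, assuming each node stores the name(s) of the node(s) it depends on in a custom property (modeled here as a hypothetical `parents` field) and that a single alert definition covers every node. Listing two parents models the redundant case: the alert is suppressed only when all of them are not Up. Node names and statuses are illustrative.

```python
# Hypothetical node table: "parents" stands in for the custom property that
# names the node(s) this node depends on. Two parents model a redundant pair.
NODES = {
    "CoreRtr1": {"status": "Up", "parents": []},
    "CoreRtr2": {"status": "Down", "parents": []},
    "Store1Rtr": {"status": "Down", "parents": ["CoreRtr1", "CoreRtr2"]},
    "Store1Srv": {"status": "Down", "parents": ["Store1Rtr"]},
}


def should_suppress(node):
    """Suppress only if the node has parents and none of them is Up."""
    parents = NODES[node]["parents"]
    return bool(parents) and all(NODES[p]["status"] != "Up" for p in parents)


def node_down_alerts():
    """One alert definition for every node: fire on Down unless suppressed."""
    return [
        name
        for name, info in NODES.items()
        if info["status"] == "Down" and not should_suppress(name)
    ]


print(node_down_alerts())
```

    Here CoreRtr2 and Store1Rtr still alert (Store1Rtr keeps one working uplink through CoreRtr1), while Store1Srv is suppressed because its only parent is down.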

    Any thoughts from the folks at SW?

  • Yes, please post the results of that ticket.  It has been cumbersome at best for us to configure alert suppressions.  This would be a huge help.

  • Yes, you could add multiple custom properties and then create a suppression for each one. Here is an old screenshot:


    Since the alert suppression is first in the list, none of the actions that follow it will be executed. If my e-mail alert were first, it would be sent out.
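    The ordering behavior can be modeled in Python as an ordered list of actions in which the suppression action halts everything after it. This is a hypothetical sketch of the behavior described, not the DLL's actual interface; `run_actions`, `suppress`, `email`, and the node name are illustrative.

```python
def run_actions(node, actions):
    """Execute actions in list order; stop as soon as one returns False."""
    for action in actions:
        if not action(node):
            break


parent_up = {"Store1Srv": False}  # hypothetical: the store router is not Up
outbox = []


def suppress(node):
    # Returning False stops the remaining actions when the parent is down.
    return parent_up[node]


def email(node):
    outbox.append(f"{node} is down")
    return True


run_actions("Store1Srv", [suppress, email])  # suppression first: no e-mail
run_actions("Store1Srv", [email, suppress])  # e-mail first: it goes out
print(outbox)
```

    Only the second run sends mail, because the e-mail action executes before the suppression gets a chance to stop the list.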


    When the router listed in the custom property "StoreRouter" is not Up, alerts will be suppressed. Hopefully SolarWinds will be able to add this feature to v8.

  • Hi. I need your help. In Orion 8.1, how do I build the following as an "Advanced Alert"?


    Suppress when:


    NodeName equal to SwitchA OR
    NodeName equal to RouterB OR
    NodeName equal to RouterA
    AND
    Node Status not equal to Up 


    Thanks!!
  • Hi, I don't have the option to suppress an alert with an action.