14 Replies Latest reply on Jul 29, 2019 1:57 PM by KaBlue

    Dependencies and Alerts

    grapgat

      Hi,

       

      I have been reading up a lot on dependencies and alerts in order to customize the alerting for our environment. We have around 90 sites globally and would like some level of intelligent alerting. By that I mean that when a site goes down (all nodes within the group), only the site should alert, not each node within the site. If I understand correctly, this should work if I create a group for the site and a group for all the nodes within the site. I have also tried automatic dependency creation to flush out routes, but I am not convinced this is the right way.

       

      I would also like a second-level dependency. Each site has 1 or 2 Internet routers; if they go down, none of the child nodes should raise an alert. However, this does not mean that the site is down (first level), especially when there are 2 Internet routers, as one router may be a failover or may simply supply Internet to another rack on the site.

       

      It sounds incredibly complex and I am not sure whether this is even possible in SolarWinds; I might have to do this with the alerts as a second level instead of with dependencies.

        • Re: Dependencies and Alerts
          zackm

          As I read it, you have 2 scenarios to solve for:

           

          1. Suppress "downstream node" alerts when only the WAN link is dropped
          2. Suppress alerts to a single notification if the entire site goes offline

           

          Based on that, I would look at this:

           

          WAN Link:

          1. Create 'Child Groups' for each site, with membership including everything but the Internet routers
          2. Create manual dependencies where you place each router as the parent of the child group
            1. This means if you have 2 routers, you make 2 dependencies for the site

          This will suppress all node alerts for the site if a router goes into a 'down' status.
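The suppression logic described above can be sketched roughly as follows. This is a toy model in Python for illustration only, not SolarWinds code: `Node`, `effective_status`, `should_alert`, and the `require_all_parents` flag are all hypothetical names. How a child with two parent routers is treated when only one router is down is an assumption here and is worth testing in your own environment, so the sketch exposes both behaviours.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    responding: bool  # hypothetical input: did the last poll succeed?


def effective_status(node, parents, require_all_parents=True):
    """Return 'up', 'down', or 'unreachable' for a child node.

    require_all_parents is an assumption, not a documented setting:
    True  -> child is 'unreachable' only when every parent is down
    False -> child is 'unreachable' as soon as any parent is down
    """
    if node.responding:
        return "up"
    parents_down = [not p.responding for p in parents]
    if parents_down:
        blocked = all(parents_down) if require_all_parents else any(parents_down)
        if blocked:
            return "unreachable"
    return "down"


def should_alert(status):
    # Only a true 'down' raises a node alert; 'unreachable' is suppressed.
    return status == "down"


# Example: a site with two routers, one of which has failed.
router1 = Node("router1", responding=False)
router2 = Node("router2", responding=True)
switch = Node("switch1", responding=False)

status = effective_status(switch, [router1, router2])
print(switch.name, status, "alert" if should_alert(status) else "suppressed")
```

With both routers down, the switch is 'unreachable' under either setting and its alert is suppressed, which is the single-router case as well.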

           

          Site Offline: There are some challenges with this one

          1. If all nodes are down, that means the Internet routers are down, so the rest of the nodes at the site would be marked "unreachable" instead of "down". This makes creating a standard group alert hard, because the "unreachable" status doesn't affect the group status; its very purpose is to suppress alerts.
            1. I would recommend testing this theory; I am 99% sure "unreachable" doesn't impact group status, but I have been wrong before
          2. Given that the site would only show 100% down if all the Internet routers are down, the WAN Link scenario will cover all sites with 1 router; if you have sites with 2 routers, you can create 'Parent Groups' with them using the status rollup "Show Best"; then create a group status alert for those.
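To make the rollup idea concrete, here is a simplified Python model of group status rollup. It is only a sketch under stated assumptions: the `rollup` function, the three-status `SEVERITY` table, and the 'unknown' result for a group with no counted members are all hypothetical, and ignoring "unreachable" members follows the untested theory in point 1 above. The real product has more statuses than this.

```python
# Simplified severity ranking (assumption: only three statuses are modelled).
SEVERITY = {"up": 0, "warning": 1, "down": 2}


def rollup(member_statuses, mode):
    """Roll member statuses up into one group status.

    mode: 'best'  -> the healthiest member decides the group status
          'worst' -> the unhealthiest member decides the group status
          'mixed' -> 'warning' whenever members disagree
    """
    # Assumption from the discussion: 'unreachable' members do not count.
    counted = [s for s in member_statuses if s != "unreachable"]
    if not counted:
        return "unknown"  # hypothetical placeholder for "nothing to roll up"
    if mode == "best":
        return min(counted, key=SEVERITY.__getitem__)
    if mode == "worst":
        return max(counted, key=SEVERITY.__getitem__)
    if mode == "mixed":
        return counted[0] if len(set(counted)) == 1 else "warning"
    raise ValueError(f"unknown rollup mode: {mode}")


# A 'Parent Group' holding the two routers of one site:
print(rollup(["down", "up"], "best"))    # one router still up
print(rollup(["down", "down"], "best"))  # both routers down
```

In this model a 'Show Best' group of the two routers only reports 'down' once both routers are down, which is the condition a group status alert for the site would key on.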

           

          There are multiple other ways, some more complex, some simpler but more manual to implement. If you have more questions, let's get a discussion going here.

            • Re: Dependencies and Alerts
              grapgat

              My question today is kind of twofold.

               

              I have set up dependencies for the different sites across the globe so that we only receive one alert when all the nodes in a group go down. I created a root group for the country, then a sub-group for each site within the country (some countries have more than 5 sites), and then another sub-group containing all the nodes. Thus, if the nodes for site 1 are down, an alert is raised only for that site (not the whole country). Before adding the sub-groups to the main groups, I created the dependencies; I then added the sub-groups to their parents for better display. We had a site down last night, but no alert was triggered: the alert checks for site DOWN, and this site remained in the Warning state for more than 5 hours. My questions are as follows:

               

              1. Why would the nodes and the site remain in Warning state for that long? The theory is that they should change to down after 30 seconds (my polling settings are set to that)
              2. Does the dependency have anything to do with it, or is it that my dependency no longer works because I added them as an object into the parent group?

               

              EXAMPLE:

              Sorry can't display the node names

               

               

              Please help, this is making me brain dead. 

            • Re: Dependencies and Alerts
              borgan

              Are your alerts on the nodes themselves or on the Groups?

              • Re: Dependencies and Alerts
                superfly99

                This is an obvious one, but are you sure you have the right nodes in the correct group? SolarWinds will not show something as up unless it responds to a ping. Double-check the IP addresses of the nodes in the Portsmouth group and confirm that they are in fact located in Portsmouth. Do a traceroute and compare the results to those for the one node that did show as down.
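If the sites use distinct subnets, the group membership check can be scripted. The Python sketch below uses only the standard `ipaddress` module; the subnet, node names, and addresses are made up for illustration, and `misplaced_nodes` is a hypothetical helper, so substitute an export of your own node list.

```python
import ipaddress

# Hypothetical site-to-subnet mapping; replace with your real addressing plan.
SITE_SUBNETS = {
    "Portsmouth": ipaddress.ip_network("10.20.0.0/16"),
}


def misplaced_nodes(group, nodes):
    """Return names of nodes whose IP address is outside the group's subnet.

    nodes maps node name -> IP address string (e.g. exported from the group).
    """
    subnet = SITE_SUBNETS[group]
    return [
        name
        for name, ip in nodes.items()
        if ipaddress.ip_address(ip) not in subnet
    ]


# Made-up group membership for illustration.
portsmouth_group = {
    "sw-01": "10.20.1.5",
    "sw-02": "10.20.1.6",
    "ap-07": "10.30.4.9",  # belongs to some other site's range
}

print(misplaced_nodes("Portsmouth", portsmouth_group))
```

Any node this flags is a candidate for the traceroute comparison, since it may still be answering pings from a different site.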