8 Replies Latest reply on Sep 11, 2017 12:17 PM by RichardLetts

    Alert only if x number of nodes go down

    garystorr

      Hi,

       

      I'm trying to work out whether there is a way to have an alert that triggers only once any 4 nodes in a specific group go down. We monitor a network with 1000+ end sites and have them grouped by location. During working hours we get alerted if a single site goes down, but out of hours we only want to be alerted if 4 or more nodes go down in a single location.

       

      Can anyone think of a way to alert on this?

       

      Regards

        • Re: Alert only if x number of nodes go down
          jeilers

          I haven't been able to find a way to do exactly that. It looks like you can create an alert based on the rollup status of the group, with the group status set from the status of the group members.

          • Re: Alert only if x number of nodes go down
            jkrenzien

            In your alert trigger conditions, enable Complex Conditions (the check box at the bottom) and then enable "Alert can be triggered if ...".

             

             

              • Re: Alert only if x number of nodes go down
                jeilers

                I've been attempting to test the two trigger conditions that I think would work for this. They are as follows:

                One:

                Two:

                 

                 

                After making those trigger conditions I had doubts about whether they would work, so I set up groups to test them. My fear was that the alert would fire on any 4 nodes being down regardless of which group they are in (for example, I thought 2 nodes down in each of 2 separate groups might trigger it). For testing purposes I created two groups: one has 1 node down, the other has 6 nodes down.

                 

                Now when I go back to set up the trigger conditions above, I get the following setting: "Aggregated data must occur within the last", a field that accepts a number of seconds or minutes, with a maximum value of 60 for either.

                 

                Ignoring it causes issues, because the summary page (which you can see below) then doesn't have the prompt that shows whether the trigger condition returns true. I will continue to test and figure out what I can, but I can't find any mention of that setting for 12.01.

                 

                Here are the screenshots I have:

                 

                When I get to the summary of the alert, it doesn't have the box in the lower right showing whether this alert will fire on anything or not. I searched (Ctrl+F) the administrator guide for NPM 12.01 for the setting but could not find anything, and I haven't been able to find anything on Google so far.

                 

                The box in the lower right-hand corner does not appear if I check the box for "Alert can be triggered if ..".

                 

                The questions I'm left with are: what is the "Aggregated data must occur within the last" setting, and why is the prompt in the lower right-hand corner of the alert summary page missing?

                More for me to learn here. Thanks for the post and any input would be greatly appreciated. Also hope any of this helps you out garystorr.

                  • Re: Alert only if x number of nodes go down
                    garystorr

                    Hi Jeilers,

                     

                    My initial thought was to use groups, but the rollup status didn't fit the criteria I needed. I hadn't thought of using the group member status, which, combined with "Alert can be triggered if more than xx objects", would work. It still wouldn't work quite as I hoped, though: there may be 3 sites down during the day, and if a single node then went down OOH it would meet the 4-nodes-down criterion of the alert. However, adding the Aggregated data setting (which I have never seen before) would mean all 4 sites have to go down within xx minutes of each other. This sounds like exactly what I need, so thank you very much.
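The difference between "4 nodes are down" and "4 nodes went down within xx minutes" can be sketched in a few lines of Python. This is only a model of the concern described above, not of how Orion evaluates the Aggregated data setting; the site names, timestamps, and the `down_count` helper are all hypothetical.

```python
from datetime import datetime, timedelta

def down_count(down_since, now, window=None):
    """Count down nodes; with a window, count only those that went down within it."""
    if window is None:
        return len(down_since)
    return sum(1 for t in down_since.values() if now - t <= window)

now = datetime(2017, 9, 7, 22, 0)
# Hypothetical data: three sites down since the working day, one failing out of hours.
down_since = {
    "site-a": now - timedelta(hours=9),
    "site-b": now - timedelta(hours=8),
    "site-c": now - timedelta(hours=7),
    "site-d": now - timedelta(minutes=2),
}

THRESHOLD = 4
# A plain count reaches the threshold as soon as the fourth site drops.
plain_fires = down_count(down_since, now) >= THRESHOLD
# A windowed count sees only the one recent failure.
windowed_fires = down_count(down_since, now, timedelta(minutes=10)) >= THRESHOLD
```

With this data the plain count fires and the windowed count does not, which is the behaviour wanted for out-of-hours alerting.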

                     

                    I think the reason your test isn't showing on the summary page is that the sites that are down did not go down within the last 10 minutes, which is what you have set in the Aggregated data setting.

                     

                    I will set this up and test it, but I think I will have to wait for a genuine outage to see whether it works as intended. I think the theory you describe above is sound, so I will let you know as and when I get to test it live.

                     

                    Thanks again

                • Re: Alert only if x number of nodes go down
                  RichardLetts

                  The way to do this is a custom SWQL query.

                  This will generate an alert when at least 10 nodes are down in a group:

                   

                  WHERE containerid IN (
                      SELECT containerid
                      FROM (
                          SELECT groups.containerid, Count(*) AS down
                          FROM orion.groups AS Groups
                          INNER JOIN orion.containermembersnapshots CMS
                              ON groups.containerid = CMS.containerid
                          INNER JOIN orion.nodes N
                              ON n.nodeid = cms.entityid
                              AND CMS.entitytype = 'Orion.Nodes'
                              AND n.status = 2
                          GROUP BY groups.containerid
                          HAVING Count(*) >= 10
                      )
                  )

                   

                  (The reset condition is also a custom query, with a 'NOT' before the first IN only.)
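For anyone who wants to reason about the query before pasting it into an alert, here is a rough Python model of its logic: count down nodes per group, keep the groups at or above the threshold, and treat the reset condition as the negation. The data and the `triggered_groups` / `should_reset` helpers are hypothetical; `DOWN = 2` mirrors `n.status = 2` in the query, and `UP = 1` is an assumption used only to fill in the test data.

```python
from collections import Counter

DOWN = 2  # same status value the query filters on
UP = 1    # assumed value for a healthy node, for test data only

def triggered_groups(memberships, node_status, threshold):
    """Return ids of groups with at least `threshold` member nodes down."""
    down = Counter(
        group_id
        for group_id, node_id in memberships
        if node_status.get(node_id) == DOWN
    )
    return {group_id for group_id, count in down.items() if count >= threshold}

def should_reset(group_id, memberships, node_status, threshold):
    """Reset is the trigger condition negated (the 'NOT' before the first IN)."""
    return group_id not in triggered_groups(memberships, node_status, threshold)

# Hypothetical groups: group 1 has 1 node down, group 2 has 6 nodes down.
memberships = [(1, 100), (1, 101)] + [(2, n) for n in range(200, 207)]
node_status = {100: DOWN, 101: UP, 206: UP}
node_status.update({n: DOWN for n in range(200, 206)})

firing = triggered_groups(memberships, node_status, threshold=4)

# Two nodes down in each of two groups: four down in total, but no group qualifies.
split_members = [(1, 1), (1, 2), (2, 3), (2, 4)]
split_status = {1: DOWN, 2: DOWN, 3: DOWN, 4: DOWN}
split_firing = triggered_groups(split_members, split_status, threshold=4)
```

Because the count is taken per group, only group 2 qualifies in the first data set, and the split case with 2 down nodes in each of 2 groups does not fire at all.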

                    • Re: Alert only if x number of nodes go down
                      garystorr

                      Hi Richard,

                       

                      Thanks for the reply. It looks like this would cover what I initially mentioned, but I assume it would trigger whenever 10 nodes were down in total? I would need it to trigger only if 10 (or, in my case, 4) nodes went down within a specific time period. As this is a large network, there are always a handful of nodes down, so this alert would trigger and probably remain active all the time. I really only need the alert to trigger if the nodes all go down together or within a few minutes of each other.

                       

                      Thanks again for the suggestion. I'm learning new things all the time.

                       

                       

                      Regards

                        • Re: Alert only if x number of nodes go down
                          RichardLetts

                          I'm not sure putting a time bound on the count of down nodes in a group is helpful, but I don't know your environment. [My environment is many hundred buildings, >400 routers, and >20 thousand switches and wireless access points.] When I get back into the office next week, I'm almost certainly going to put an alert like this into production and suppress alerts on nodes in triggered groups, which should help when we have major events.

                           

                          My groups are buildings (on campus), or regions (for the state), or 'sectors' for national/international groups. I also exclude unmanaged nodes from groups' dynamic queries because the nodes may never go back into service, hence I didn't exclude them from the nodes selection criteria. NOTE: out of the box node alerts automatically exclude unmanaged nodes, but other alerts that include unmanaged nodes MAY include them...

                           

                          All of the alerts in Orion (today) ultimately get converted into a SQL/SWQL query, so my first step for really complex alerts not supported out of the box is to use the Database Manager or the SWQL query application (from the SDK) and try to write the query there.

                           

                          I gave the SWQL version of the query, so install SWQL Studio on your desktop and try queries there.

                           

                          Trigger query:

                          hint: add a WHERE clause before the GROUP BY to select nodes that meet the time and any other criteria (e.g. unmanaged)

                           

                          Reset Query: don't add the time window so the triggered alert doesn't immediately clear
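Why the time window belongs only in the trigger can be shown with a small Python simulation. This is a sketch of the idea under assumed values (threshold of 4, 10-minute window, made-up node names and times), not Orion's alert engine; `trigger`, `reset`, and `step` are hypothetical helpers.

```python
from datetime import datetime, timedelta

THRESHOLD = 4
WINDOW = timedelta(minutes=10)

def trigger(down_since, now):
    """Trigger: at least THRESHOLD nodes went down within WINDOW."""
    recent = [t for t in down_since.values() if now - t <= WINDOW]
    return len(recent) >= THRESHOLD

def reset(down_since, now):
    """Reset: fewer than THRESHOLD nodes down, with no time window applied."""
    return len(down_since) < THRESHOLD

def step(active, down_since, now):
    """Advance the alert state by one evaluation."""
    if not active:
        return trigger(down_since, now)
    return not reset(down_since, now)

t0 = datetime(2017, 9, 11, 2, 0)
outage = {f"node-{i}": t0 for i in range(4)}  # four nodes fail together

fired = step(False, outage, t0 + timedelta(minutes=1))
# Thirty minutes later the nodes are still down but no longer "recent".
still_active = step(fired, outage, t0 + timedelta(minutes=30))
# If the reset reused the windowed condition, the alert would already have cleared.
windowed_reset_would_clear = not trigger(outage, t0 + timedelta(minutes=30))
# Three nodes recover; only then does the unwindowed reset clear the alert.
after_recovery = step(still_active, {"node-0": t0}, t0 + timedelta(minutes=45))
```

In this run the alert fires, stays active while the nodes remain down, and clears only after enough nodes recover, whereas a windowed reset would have cleared it as soon as the window expired.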

                           

                          (I do not know your environment, so you have to work out what your actual conditions are)

                      • Re: Alert only if x number of nodes go down
                        JaroslawLadyga

                        I'd suggest making a dynamic group with at least 4 nodes DOWN, then changing the status to 'Show the best status', and finally triggering the alert if the group is DOWN.