40 Replies Latest reply on Jun 21, 2010 11:39 PM by MagnAxiom

    Grouping of objects for alerting

      Hi there,


      We have just purchased SolarWinds Network Performance Monitor and I have been put in charge of setting it all up.


      I have added all the nodes that I would like to monitor and can see how to group them via the AgentPort, Community, Contact, etc. However, when it comes to alerting I am struggling to work out how to get it to group correctly.


      As an example, we have devices in many different countries. They all have the same community string, contact details, etc. However, we have multiple departments that need to be alerted when their device goes offline. Our naming structure is xxxyyzz999, where xxx is the company code, yy is the site code, zz is the device code, and 999 is the device number. The problem is that they all have the same company code, so I can't sort by name. People in Australia don't want to be notified if a device in Europe goes down, and vice versa. However, they do want to know if their own device is offline.
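To make the naming scheme concrete, here is a minimal Python sketch that splits a node name into its parts so the site code could serve as a grouping key. It assumes the three codes are letters and the number is exactly three digits; the sample name `acmsyrt001` and the names `DeviceName` and `parse_device_name` are made up for illustration and are not part of Orion.

```python
import re
from typing import NamedTuple


class DeviceName(NamedTuple):
    company: str  # xxx
    site: str     # yy
    device: str   # zz
    number: int   # 999


# Assumption: codes are alphabetic and the device number is three digits.
_NAME_RE = re.compile(r"^([A-Za-z]{3})([A-Za-z]{2})([A-Za-z]{2})(\d{3})$")


def parse_device_name(name: str) -> DeviceName:
    """Split an xxxyyzz999 style name into company, site, device, number."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow xxxyyzz999: {name!r}")
    company, site, device, number = match.groups()
    return DeviceName(company.upper(), site.upper(), device.upper(), int(number))
```

The `site` field is the only part that differs between countries here, so it is the natural value to copy into whatever grouping mechanism ends up being used.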


       Is there a way to create a group, add devices to that group, and then set alerting for that group? E.g. AU-Devices, EU-Devices, etc.


       I hope this makes some sort of sense.


       


      Regards,


      Bogor

        • Re: Grouping of objects for alerting
          Network_Guru

           Welcome to the Forums Bogor & the world that is Orion.

          Grouping is one of the main features of Orion.
          This can be accomplished with the Custom Property Editor.

          You can create any custom property you like and use this for alerting, reporting and many others.
          Using a combination of properties, it is also possible to set up alert suppression with custom properties.

          Example

          • create a CP called Tier and another one called site.
          • assign a Tier value of 1 to all backbone devices, 2 to distribution routers, 3 to access switches & 4 to servers
          • assign the same site name to all devices in one site behind a BB router

          Now create an alert for all your access switches in one site such as this:

          • alert on any - node status = down and site = xyz and CP - Tier = 3
          • but not if node status <> up and site = xyz and CP - Tier = 2

          This will suppress down node alerts for your access switches in site xyz if your DRs are not up.
          Note: the "<> up" condition is important, as it covers the polling intervals, so you should get very few alerts through before the suppression kicks in.
          It means the alerts will also be suppressed if the DR status is Warning or Unknown.
          You should poll your DRs & BRs more frequently than your access switches in any case.
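The trigger and suppression pair above can be sketched outside Orion as plain Python, just to check the logic on paper. This is not how the Advanced Alert Manager evaluates conditions internally; the `Node` class, the `should_alert` function, and the node names in the usage note are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Node:
    name: str
    site: str    # value of the "site" custom property
    tier: int    # value of the "Tier" custom property
    status: str  # "Up", "Down", "Warning" or "Unknown"


def should_alert(node: Node, all_nodes: Iterable[Node], site: str = "xyz") -> bool:
    """Return True if a down alert for `node` should go out."""
    # Trigger: node status = down and site = xyz and Tier = 3
    triggered = node.site == site and node.tier == 3 and node.status == "Down"
    # Suppression: any node with status <> up and site = xyz and Tier = 2
    suppressed = any(
        n.site == site and n.tier == 2 and n.status != "Up" for n in all_nodes
    )
    return triggered and not suppressed
```

With a distribution router `dr1` (Tier 2) reporting Up and an access switch `sw1` (Tier 3) reporting Down, `should_alert` returns True for `sw1`. Once `dr1` moves to Warning, Unknown or Down, the same call returns False.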

          HTH
           

            • Re: Grouping of objects for alerting

              Many thanks for your great reply.


              That seems to be exactly what I was after.


               Regards,


              Bogor

              • Re: Grouping of objects for alerting

                Could you please attach a screen shot of how it looks in the advanced alerts?

                Your expert high-impact network management support is gratefully appreciated.

                  • Re: Grouping of objects for alerting

                    I have attached a screen shot of what I believe is the correct way to set this up.


                     Could someone please help answer if this is correct?



                    • Re: Grouping of objects for alerting

                      Please post a screenshot!

                      Thanks,

                       

                      AL     

                        • Re: Grouping of objects for alerting

                          Add a trigger condition!!!

                            • Re: Grouping of objects for alerting

                              The attached screen shot displays what we currently have set up. It appears to be working; however, we have not tested it.  The alert has not triggered yet because nothing has become unavailable.


                              If someone has confirmed that this type of setup is correct, please reply; otherwise, I will continue to test.


                              The logic here is that if something is unavailable, then don't alert on child nodes that Orion does not know the status of; hence the <> up status of a node.



                              • Re: Grouping of objects for alerting
                                Network_Guru

                                You are confusing triggering with alert suppression - these are 2 separate tabs which are configured independently.
                                I have not actually set this up & tested this in production, but here is how I would do it.

                                  • Re: Grouping of objects for alerting
                                    Network_Guru

                                     One Pic per post.... doh!

                                    Here is the trigger condition: 

                                      • Re: Grouping of objects for alerting
                                        Network_Guru

                                         And here is the Suppression condition:

                                          • Re: Grouping of objects for alerting

                                            Thanks for the screen shots. 


                                            If the switch (tier 3) becomes unavailable, will a node attached to this device (tier 4) send an alert, or will this be covered by a different alert?  Will a new alert need to be created for the nodes at the site, one that suppresses the alert when a device with a Tier less than or equal to 3 is not up, to prevent alerts being triggered on the attached nodes (tier 4)?


                                             



                                              • Re: Grouping of objects for alerting
                                                Network_Guru

                                                This was just an example to get you started in the right direction; you will have to configure the logic yourself.
                                                It's not that difficult, especially if you draw a picture of your network (which you should have already in Mapmaker).
                                                 

                                                Example:
                                                Starting from your Orion server - the default gateway of your Orion server could be tier 1.
                                                Your near end WAN router could be tier 2
                                                Your far end WAN router for the site you are alerting on [xyz] could be tier 3.
                                                Your access switch at site xyz could be tier 4
                                                The hosts plugged into your access switch could be tier 5

                                                You may have to get creative with the tier1 & 2 devices, as they won't be in the same site as xyz.
                                                Most likely you will have to nest conditions & suppress the alert if ANY tier 1 or 2 device is down, and then add an "ALL of the following" group for the site-specific suppression.
                                                (alerting103.jpg)
                                                 

                                                Whatever you decide on, if you keep the same numbering scheme across your whole network, then the same alert & suppression will work for all your sites &/or customers.
                                                You can copy the alert & just change the name of the site.
                                                In the case where you have many hops or devices between your Orion server & the monitored devices you could end up with a tier 15 device.
                                                You might want to multiply the numbers in the example above by 10 so that all access switches use the same tier number - 40.
                                                This gives plenty of room for numbering devices in-between.

                                                Initially it will entail some work in CPE, but should be very easy to maintain once it is all set.
                                                All I can suggest is to experiment first with some lab equipment which is monitored by Orion, until you are comfortable with how this works.

                                                DISCLAIMER - I have not set this up myself & will not be responsible if this does not work & you end up losing customers or millions of dollars.
                                                This is merely a suggestion/guideline of what could be done with Custom Properties & Advanced Alert Suppression.

                                                 

                                                Rarely do we find men who willingly engage in hard, solid thinking. There is an almost universal quest for easy answers and half-baked solutions. Nothing pains some people more than having to think. Martin Luther King, Jr.

                                  • Re: Grouping of objects for alerting

                                    Hello all

                                    I don't want to hijack this thread; I just need to make sure I understand it right.

                                    So if I follow this logic, and I have 300 sites with a router and devices behind that router, will I need to create 300 alerts?

                                    in my scenario I have this situation

                                               Router
                                              /          \
                                    Server          PC

                                     

                                    I monitor the Up/Down status of all devices.

                                    So far I have 3 Advanced Alerts that monitor each device type for all my stores, which is working great.

                                    And I want to suppress the server and PC alerts when the router fails.

                                     

                                    I went ahead and created a CP giving each site its own store number (I assume this is what you call 'grouping').

                                    So my question is, do I need an alert for each individual store?

                                    There are over 300 stores to monitor. If I need to create an alert for each of them, it's a PITA. I wouldn't mind doing it, but is this going to affect Orion's performance too?

                                    I hope I made it clear enough

                                     

                                    Thanks for your input.

                                     PS: Using Orion 9.1

                                    JF

                                      • Re: Grouping of objects for alerting
                                        Network_Guru

                                        I went ahead and created a CP giving each site its own store number (I assume this is what you call 'grouping')

                                        You have created 300 groups of one store each.
                                        I suggest you create one group with a common CP for all 300 stores.

                                        Using my example of a CP called Store_tier:

                                        tier1 ----wan----> tier2 ----lan---->tier3

                                        Router1 ----wan----> Router2 ---lan--->PC/Server

                                        Router1 = head-end router = CP "1"
                                        Router2 = WAN router = CP "2"
                                        PC/Server = Store nodes = CP "3"

                                        So any PC or server at any store will be assigned the "3" Store_tier CP.

                                        Create an alert for any store:

                                        Where
                                        Store_tier = '3' AND
                                        status = down

                                        Create a suppression for this store alert:

                                        Suppress when
                                        Store_tier <= '2' AND
                                        status = down

                                        (I see a problem with this - ANY single "router2" down will suppress the alerts for all other stores. A second CP would have to be used to make the alert site specific)

                                        Create a second alert - All Stores down:

                                        Where
                                        Store_tier = '1' AND
                                        status = down

                                        Of course this gets more complex when you have multiple routers with backup/redundant circuits, but the principle is the same.
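The problem noted in parentheses above is easy to see in a small Python sketch. The first function mirrors the suppression as written, and the second adds a second property so that only a store's own router suppresses that store's alerts. The `store` key and both function names are hypothetical and only illustrate the idea; they are not Orion settings.

```python
def globally_suppressed(nodes):
    """Suppression as written above: any Store_tier <= 2 node that is down."""
    return any(n["tier"] <= 2 and n["status"] == "Down" for n in nodes)


def site_suppressed(node, nodes):
    """Site-specific variant using a second property (here called "store")."""
    for other in nodes:
        if other["status"] != "Down":
            continue
        if other["tier"] == 1:
            # Head-end router down: every store is unreachable.
            return True
        if other["tier"] == 2 and other["store"] == node["store"]:
            # This store's own WAN router is down.
            return True
    return False


# Example data: store 001 has lost its router, store 002 has only lost a server.
nodes = [
    {"name": "head-end", "store": None, "tier": 1, "status": "Up"},
    {"name": "router-001", "store": "001", "tier": 2, "status": "Down"},
    {"name": "server-001", "store": "001", "tier": 3, "status": "Down"},
    {"name": "router-002", "store": "002", "tier": 2, "status": "Up"},
    {"name": "server-002", "store": "002", "tier": 3, "status": "Down"},
]
```

In this data the global check is true, so the genuine server failure at store 002 would be hidden along with store 001. The site-specific check suppresses only `server-001`.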

                                          • Re: Grouping of objects for alerting

                                            Thanks Guru

                                            I created the alert with the suppression and it seems to work.  and the other alert for the router.

                                             

                                            (I see a problem with this - ANY single "router2" down will suppress the alerts for all other stores. A second CP would have to be used to make the alert site specific)

                                            But like you said, it does suppress the alert, and I do not get the alert if a server fails in another store.

                                            I've been trying to play around with another CP, but I don't think I'm heading in the right direction.

                                            I have in mind something like "if the tier 2 site# is the same as the tier 3 site#", but can it be done like that?

                                             

                                            Edit: OK, so I don't think this can be done. I assume I will have to make an alert for each individual store.

                                        • Re: Grouping of objects for alerting

                                          What if I have 600 sites?  I would require 600 alerts?

                                          What I would like to do (I don't know if it is possible) is to use variables in the suppression tab.  If the device that went down has Tier X and Site ABC (where X and ABC are the variables), the suppression tab could look up something like "if another device's Tier is higher in the hierarchy and its Site is the same, then suppress".

                                          Does this make sense?

                                            • Re: Grouping of objects for alerting
                                              Network_Guru

                                              I suggest you experiment with alerting using lab equipment or setup some devices at your desk.

                                              Add them to monitoring and your Custom Properties.
                                              Then unplug them from the network, in order, to test your suppression logic and/or variables.

                                                • Re: Grouping of objects for alerting

                                                  I'm asking if I have 600 sites do I have to configure 600 alerts?

                                                    • Re: Grouping of objects for alerting

                                                      I'm asking if I have 600 sites do I have to configure 600 alerts?

                                                      That is what I did on my side.  I could not find any other way to make it work as we needed.

                                                      It is time consuming at first, but it's working great.

                                                        • Re: Grouping of objects for alerting
                                                          Network_Guru

                                                          Not sure why you need 600 alerts?
                                                          I use one alert for all 600 remote sites, but at each site I'm only monitoring 3 devices, in which case I don't need suppression turned on for the far end sites.
                                                          Suppression is only enabled if the head-end routers are both down.

                                                            • Re: Grouping of objects for alerting

                                                              In my case I have a router, a server, and a couple of registers being monitored at each site.

                                                              If the router goes down, I don't want an alert for the server being down.

                                                              This scenario cannot be done with the tier_x method, since you need to put in a custom property for each site.

                                                                • Re: Grouping of objects for alerting

                                                                  Same scenario for me here.  Network Guru, I understand how you handled your alert suppression: when your "main routers" (where I suspect all your remote sites connect) go down, you do not receive alerts for the devices behind them.

                                                                  In my case, as MaverucK5L also describes, I have 600 "main routers", and behind each one I have 4 or 5 switches/UPSes that I manage.  When I lose the router that connects a site to the WAN, I don't want all the switches and UPSes at that site to generate an alarm.

                                                                   

                                                                  Site A, Switch A1, Switch A2, Switch A3, UPSA1

                                                                  Site B, Switch B1, Switch B2, Switch B3, Switch B4, UPSB1, UPSB2

                                                                  [...]

                                                                  Site X, Switch X1, Switch X2, SwitchX3, UPSX1....

                                                                  ...

                                                                   

                                                                  What would be great is something like a conditional alert suppression based on a custom property of the first device that went down.

                                                                  If I define Switch X1 as my Tier 1 (it's the router/switch that connects to the WAN) and all other devices at Site X as Tier 2, I would need to define a separate alarm for each site.

                                                                  But if I could dynamically define the X of Site X in the alert suppression tab (based on the custom property of the first device to go down), I would be able to do this with a single alarm.  I don't think we have this function in the alert suppression tab.

                                                                  It would go something like this:

                                                                  Site B, Switch B3 (Tier 2) goes down. I then wait 5 minutes to see whether any other devices go down. If I also catch Site B Switch B1 (Tier 1), B2 (Tier 2) and UPSB1 (Tier 2), the Tier 1 device becomes the one I generate an alarm on, and I suppress the rest with a rule like this:

                                                                  If the Site of a new device that goes down is equal to the Site of the first device that went down and its Tier is greater than 1, suppress...
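To show what that rule would do, here is a rough Python sketch of the grouping, written as if the list of devices that went down during the waiting window were already available. It is a thought experiment, not something the alert suppression tab can express, and the function name `root_cause_alerts` and the dictionary keys are assumptions.

```python
from collections import defaultdict


def root_cause_alerts(down_nodes):
    """Given the devices seen down within the window, keep one level per site.

    For each site, only the devices with the lowest Tier number among the
    down devices raise an alarm; everything behind them is suppressed.
    """
    by_site = defaultdict(list)
    for node in down_nodes:
        by_site[node["site"]].append(node)

    alerts = []
    for nodes in by_site.values():
        lowest_tier = min(n["tier"] for n in nodes)
        alerts.extend(n["name"] for n in nodes if n["tier"] == lowest_tier)
    return sorted(alerts)
```

For the Site B example, passing B3, B1, B2 and UPSB1 yields a single alarm on B1. If only B3 had gone down, B3 itself would be the alarm, since nothing with a lower Tier number at Site B is down.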

                                                                  I don't know if my logic is clear to you, or whether this can be done in the alert suppression tab, but this is what I would need to be able to do to accomplish what I want.

                                                                  What's missing from the alert suppression tab is value of device that went down.

                                                                  Does all that make sense?  Do you get what I'm trying to say?

                                                                    • Re: Grouping of objects for alerting
                                                                      Network_Guru

                                                                      I understand your dilemma.
                                                                      I could be wrong, but I think conditional alerting is not possible.
                                                                      You need a specific device or custom property to monitor in the DB to create an alert.
                                                                      You are stuck with creating a separate alert for each site, since you need suppression set up for devices within each site.

                                                                        • Re: Grouping of objects for alerting
                                                                          Anthony_

                                                                          I must admit I don't see how even the suppression that is described here can work. I hope I am missing something simple.

                                                                          For an event to trigger an alert:

                                                                          • the event meets the criteria for the trigger
                                                                          • that same event does not meet the suppression criteria.

                                                                          The event of node x has no knowledge of any other event. It can't test whether node y is up or down. I am not sure what people are referring to when they set suppression criteria that would apply to node y. These would only apply if the node were node y, not if it is node x.

                                                                          What would work is for a SQL query to populate a custom property called Parent Node Status. Then if node x has a custom property Parent Node Status = Not Up, then the alert can be suppressed.

                                                                          I hope I am missing something and that there is a simpler way. It would be easier just to pass the whole alert to a piece of custom scripting and let that decide what to do with it.
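A minimal Python sketch of that Parent Node Status idea follows, leaving out the SQL and the custom property update themselves. The `parents` mapping, the function names, and the node names in the usage note are hypothetical; in practice the statuses would come from the Orion database and the result would be written back to a custom property on a schedule.

```python
def parent_node_status(statuses, parents):
    """Compute the value to store in each node's Parent Node Status property.

    statuses: node name -> "Up", "Down", "Warning" or "Unknown"
    parents:  node name -> name of its parent node
    """
    return {
        node: statuses.get(parent, "Unknown") for node, parent in parents.items()
    }


def should_alert(node, statuses, parents):
    """Alert on a down node unless its Parent Node Status is not Up."""
    if statuses.get(node) != "Down":
        return False
    if node not in parents:
        # Top-level node with no parent: nothing can suppress it.
        return True
    return parent_node_status(statuses, parents)[node] == "Up"
```

With a router `y` as the parent of server `x`, a down `x` alerts while `y` is Up and is suppressed as soon as `y` is anything else. The router itself has no parent in the mapping, so it still alerts.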

                                                                            • Re: Grouping of objects for alerting
                                                                              Network_Guru


                                                                              What would work is for a SQL query to populate a custom property called Parent Node Status. Then if node x has a custom property Parent Node Status = Not Up, then the alert can be suppressed.

                                                                              I hope I am missing something and that there is a simpler way. It would be easier just to pass the whole alert to a piece of custom scripting and let that decide what to do with it.

                                                                               



                                                                              Hi Anthony, you are correct, this is exactly how the existing alert suppression works.
                                                                              Trigger an alert based on the status/value of property 'A' and/or 'B' and/or 'C' etc.

                                                                              Then in the "Alert Suppression" tab, suppress the alert based on the status/value of property 'X' and/or 'Y' and/or 'Z' etc.

                                                                              As you have noted, the suppression is based on the Parent node status for a site.
                                                                              Defining the parent node for each site is the labour intensive step that we are trying to simplify by using some common custom properties.

                                                                              I hope this makes sense?

                                                                                • Re: Grouping of objects for alerting
                                                                                  Anthony_

                                                                                  I can set a static value in a custom property telling me that node y is the parent of node x. But then if I said "fire an alert unless parent is node y" it will never fire.

                                                                                  I can't set a dynamic value telling me the status of node y - unless I write a SQL query and have it fire every minute or so. Is that what is meant?

                                                                                    • Re: Grouping of objects for alerting
                                                                                      Network_Guru


                                                                                      I can't set a dynamic value telling me the status of node y - unless I write a SQL query and have it fire every minute or so. Is that what is meant?

                                                                                       



                                                                                      Not true as per the NPM administrators guide:

                                                                                      Setting a Suppression for an Advanced Alert
                                                                                      You can set the specific conditions for suppressing an advanced alert using the following procedure.
                                                                                      Note: Alert Suppression is only available if you have checked Show Advanced Features in the lower left of the Edit Advanced Alert window.
                                                                                      To set conditions for advanced alert suppression:
                                                                                      1. Click Start > All Programs > SolarWinds Orion > Alerting, Reporting, and Mapping > Advanced Alert Manager.
                                                                                      2. Click View > Configure Alerts.
                                                                                      3. Click New or select an alert from the list.
                                                                                      4. Click Copy or Edit, as appropriate.
                                                                                      5. Click Alert Suppression.
                                                                                      Note: Generate suppression conditions in the text field by selecting appropriate descriptors from the linked context menus and by clicking Browse (...) on the left of the text field.
                                                                                      6. If you want to copy the condition used on the Trigger Condition tab, click Copy From Trigger.
                                                                                      7. Click the linked text to select the number of conditions that you want to apply (all, any, none, not all). For more information about linked text conditions, see "Understanding Condition Groups" on page 128.
                                                                                      8. Click Browse (...) to view the following condition options:

                                                                                      • To generate a condition based on a comparison of device states, click Add a Simple Condition.
                                                                                      • To generate a condition based on a comparison of device fields and values, click Add a Complex Condition.
                                                                                      • To further define the application of your conditions, click Add a Condition Group.
                                                                                      • To remove a selected condition, click Delete Current Condition.
                                                                                      • To change the order of your conditions, click Move Down or Move Up.

                                                                                      9. If you need an additional condition, click Add and then select the type of condition you want to add.

                                                                                      10. If you need to delete a condition, select the condition from the condition list, and then click Delete.
                                                                                      Note: Conditions may be exported for use with other alerts by clicking Export Conditions and saving as appropriate. Conditions from other alerts may be imported to the current alert by clicking Import Conditions.
                                                                                      Warning: Imported conditions automatically overwrite existing conditions.

                                                                          • Re: Grouping of objects for alerting
                                                                            MagnAxiom

                                                                            The issue I run into, since SolarWinds doesn't support parent/child relationships, is that I have multiple "report to" groups.

                                                                             

                                                                            If this type of device goes down at any of my sites, they want that alert to only be sent to that group, and I have around 8 different groups.

                                                                             

                                                                            If I were to deploy the tier method described here, I would have to multiply the last few tiers x 8 so that the alert is sent to the appropriate group.

                                                                             

                                                                            If only you could have custom properties that included whom you wanted alerted when the node goes down...
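What I have in mind would look something like this Python sketch, where each node carries a property naming its "report to" group and one lookup decides who hears about it. The `ReportTo` property name, the group names, and the example.com addresses are all made up for illustration; this is a wish, not a description of anything Orion does today.

```python
# Hypothetical mapping from a "ReportTo" custom property value to recipients.
GROUP_CONTACTS = {
    "AU-Devices": ["au-noc@example.com"],
    "EU-Devices": ["eu-noc@example.com"],
}

FALLBACK = ["noc@example.com"]


def recipients_for(node, contacts=GROUP_CONTACTS, fallback=FALLBACK):
    """Return the addresses to notify when `node` goes down."""
    group = node.get("ReportTo")
    return contacts.get(group, fallback)
```

A single down alert could then cover every site and tier, with the recipient list read from the node that triggered it instead of being fixed in the alert.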