Before we jump into the blog, let me introduce myself. My name is Francois Caron, and I am a new Product Manager here at SolarWinds, responsible for Orion NPM, NCM and IPSLA Manager going forward. I come from a network management background, but one of the first things I did when I got here was to immerse myself in NPM and thwack, watching and learning from the posts exchanged by everyone in the community.

 

One of the new features in NPM 10.1 that has generated a lot of questions and discussion is Dependencies and Grouping, and in particular how to apply them in Orion to your network.

 

With this blog, I am trying to recap the different use cases that I have seen in these discussions, i.e. which “objects” (left column) we are trying to manage, and to discuss some management “functions” (top row) that NPM users are trying to apply to them.

 

Each cell of the matrix presents a summary of how to approach implementing a given function for a given object, and provides pointers to more detailed information related to that use case (as opposed to re-explaining what is already well explained either on thwack or in NPM’s documentation).

 

Feel free to comment and tell me about other use cases (different “objects” or “functions”); this could be a good basis for improving this blog moving forward. Also tell me about your successes (or failures!) with NPM in this area; we will use those as a basis for enhancement requests.

 

 

                                                                       
        

Physical Topology

      
        

Alert suppression*

      
        

Status propagation

      
        

Remote site reachable via an access router (single point of failure)

         


Example of a posting related to this case: “Nested Dependency Clarification”.

Objective:

  • No alert is fired on the site’s devices if the access router goes down.

Steps:

  • Put the remote site devices in a group.
  • Create a dependency between the access router and the group.

More:

  • Some of you may be thinking of a topology where the access is actually a pair of redundant routers (rather than a single point of failure). In this case, create a dependency between the remote site group and another group made up of the pair of redundant access routers. This case has also been discussed on thwack.
  • Remember that dependencies are 1:1 relationships only. If you need a 1:N relationship, create a group containing the N objects. This is discussed in “Re: Dependencies not behaving as expected”.
  • More on dependencies in general: “Meet the Features – Orion NPM 10.1 - Dependencies 2.0 / Basic Root Cause Analysis”.
  • For this feature, the status “down” relates to a lack of response to pings from the NPM poller.
            

    I do not see a use case for propagating to the access router a status calculated from the status of the remote site’s devices. The access router usually has its own status.

             

    If you feel differently and have a good use case, feel free to comment.


    Port-Channel

          
            

    Alert suppression*

          
            

    Status propagation

          
            

    Logical link made up of multiple identical interfaces (throughput aggregation and fault tolerance).

             


    Example of a posting related to this case: “Re: What is the most efficient way to monitor etherchannels on NPM ?”.

          
            

    I cannot see a good use case for suppressing alerts here, unless the logical Port-Channel had a state of its own reported by the device, which would make the alerts on each physical interface pointless.

    For example, turning the Port-Channel down (admin status down) would bring the interfaces down as well, and the network admin would want to suppress those alarms, which are just noise.

    I am open to your input and comments on what happens in the context of your network and network devices/vendors.

          
            

    Objective:

    • Monitor the status of the Port-Channel object, based on the status of its physical interfaces.
    • The Port-Channel is in a warning state if 50% or less of the interfaces are down, and critical if more.

    Steps:

    • Model the Port-Channel as a group that contains its 2 physical interfaces.
    • The group can be created manually, or by a dynamic query if both physical interfaces have a property with a common value.
    • Dependencies are not required for this use case.
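To make the threshold rule in the objective concrete, here is a minimal Python sketch. It only illustrates the rule as stated in this post and is not how NPM computes group status internally. The function name `port_channel_status`, the status strings, and the choice to report "Up" when no member is down are assumptions made for the example.

```python
def port_channel_status(interface_states):
    """Roll up member interface states into one Port-Channel status.

    interface_states: list of "Up"/"Down" strings, one per physical member.
    Rule from the objective: 50% or less of the members down -> "Warning",
    more than 50% down -> "Critical".
    Assumption for this sketch: no member down -> "Up".
    """
    if not interface_states:
        raise ValueError("a Port-Channel needs at least one member interface")
    down = sum(1 for state in interface_states if state == "Down")
    if down == 0:
        return "Up"
    if down * 2 <= len(interface_states):
        return "Warning"
    return "Critical"


print(port_channel_status(["Up", "Up"]))      # Up
print(port_channel_status(["Up", "Down"]))    # Warning
print(port_channel_status(["Down", "Down"]))  # Critical
```

With a 2-interface Port-Channel, losing one member gives exactly 50% down, so the group shows a warning; losing both makes it critical.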
             

    More:

                    
     

     

                                                                           
            

    Business services

          
            

    Alert suppression*

          
            

    Status propagation

          
            

    A banking service is made up of 3 applications, each connected to a database, all running in a data center accessed via a router.

             


    This case is different from the previous cases because it involves objects that are not only network objects and it combines the need for both functions.

          
            

    Objective:

    • No alert is fired on each of the 3 applications if the database is down.
    • The applications being down is basically noise; the administrator should focus on bringing the database back up.

    Steps:

    • Put your 3 APM applications in a group.
    • Create a dependency between the database (also modeled as an APM application) and the group.

    More:

    • This is, in concept, identical to the physical topology case (above), i.e. the database plays the role of the access router and the 3 apps play the role of the remote site devices…
    • …but this is a very good opportunity to highlight the fact that APM also leverages dependencies in order to suppress alerts between APM objects that have a status = Down.
    • You need to be running APM v4.0 to leverage this.
    • This is well described in the Explicit Dependency section of Brandon’s blog “Meet the Features – Orion NPM 10.1 - Dependencies 2.0 / Basic Root Cause Analysis”.
          
            

    Objective:

    • Monitor the status of the Service, based on the status of each individual object.
    • Each object going down impacts the service.
    • The number of objects going down does not make a difference.
    • But their nature does, i.e. some objects being down will turn the Service critical and others will turn it to warning.

    Steps:

    • Model the Service as a group that contains its constituents (router, database, application 1, 2 and 3).
    • The group can be created manually or by a dynamic query.

    More:

    • See more about group creation in “Meet the Features – Dynamic Service Groups”, especially the end of the posting, which describes an Exchange Service example.
    • Note that status propagation cares about the component state being “down”, whatever this means for the component.
    • For a Node this means not responding to pings, while for an Application it is based on its WMI status or the status of its components (in this regard, Applications act a little bit like “groups” themselves).
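The objective above (the nature of the object that is down matters, not the number) can be illustrated with a short Python sketch. This is only a model of the idea, not NPM's actual rollup logic. The names `SEVERITY_WHEN_DOWN` and `service_status`, and the choice of which objects count as critical versus warning, are assumptions picked for the example.

```python
# Hypothetical mapping: the Service status each member causes when it is Down.
# Which objects are "Critical" vs "Warning" is an assumption for this sketch.
SEVERITY_WHEN_DOWN = {
    "router": "Critical",
    "database": "Critical",
    "application 1": "Warning",
    "application 2": "Warning",
    "application 3": "Warning",
}

# Ordering used to pick the worst status among the members that are Down.
RANK = {"Up": 0, "Warning": 1, "Critical": 2}


def service_status(member_states):
    """Return the Service status from a dict of member name -> "Up"/"Down".

    The number of Down members is ignored; only the most severe impact
    among the Down members decides the result.
    """
    worst = "Up"
    for name, state in member_states.items():
        if state == "Down":
            impact = SEVERITY_WHEN_DOWN[name]
            if RANK[impact] > RANK[worst]:
                worst = impact
    return worst


all_up = {name: "Up" for name in SEVERITY_WHEN_DOWN}
print(service_status(all_up))                                # Up
print(service_status({**all_up, "application 2": "Down"}))   # Warning
print(service_status({**all_up, "application 1": "Down",
                      "application 2": "Down"}))             # Warning
print(service_status({**all_up, "database": "Down"}))        # Critical
```

Note that two applications being down still yields a warning, while a single database failure is enough to turn the Service critical.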
          
     

    *More on the so-called “Alert Suppression”

     

    Remember (you will see this also explained in other blogs, e.g. “Meet the Features – Orion NPM 10.1 - Dependencies 2.0 / Basic Root Cause Analysis”) that “alert suppression” is actually NOT about suppressing alerts, but rather about making the system more intelligent by adding one more state: Unreachable, in addition to the previously existing Up and Down states.

     

    It’s important to understand that dependencies put dependent objects in the Unreachable state (vs. Down) when the object they depend on is Down (e.g. the router that is the only access to a remote site). In other words, the Unreachable state really means: NPM or APM does not know their actual state because the only access to them is Down.

     

    It just happens that alerts are sent on state=Down. So turning objects Unreachable (vs. Down) actually prevents alerts from being created in the first place, hence this common way to describe the feature.
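The Up / Down / Unreachable logic described here can be sketched in Python as follows. This is a simplified model of the behavior explained in this post, not SolarWinds code. The function names `effective_status` and `should_alert`, the node names, and the plain dictionaries used to hold poll results and 1:1 dependencies are all hypothetical, and the sketch assumes the dependencies contain no cycles.

```python
def effective_status(polled, parent_of, obj):
    """Return "Up", "Down" or "Unreachable" for one object.

    polled:    dict of object name -> raw poll result, "Up" or "Down".
    parent_of: dict of child name -> parent name (1:1 dependencies).
    An object that does not respond is Unreachable (not Down) when the
    object it depends on is itself not Up.
    """
    if polled[obj] == "Up":
        return "Up"
    parent = parent_of.get(obj)
    if parent is not None and effective_status(polled, parent_of, parent) != "Up":
        return "Unreachable"
    return "Down"


def should_alert(status):
    # Alerts are sent on state=Down only, so Unreachable objects stay quiet.
    return status == "Down"


# Remote site behind a single access router that has just gone down.
polled = {
    "access-router": "Down",
    "site-switch": "Down",
    "site-server": "Down",
    "hq-switch": "Up",
}
parent_of = {"site-switch": "access-router", "site-server": "access-router"}

for name in polled:
    status = effective_status(polled, parent_of, name)
    print(name, status, "ALERT" if should_alert(status) else "")
```

In this example only the access router ends up Down and raises an alert; the two site devices are reported as Unreachable, which is exactly why the feature is commonly described as alert suppression.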