15 Replies Latest reply on Oct 5, 2011 12:12 PM by nsantos

    Dependencies Feature

    nsantos

      Hello


      As discussed with a SolarWinds representative, I was instructed to open a discussion here about the way dependencies work.

      It seems that dependencies are only really useful when the parent resource is marked specifically as Down; if the parent resource is Unknown, the dependencies feature doesn't work properly.

      What I was trying to accomplish with the SolarWinds rep was the following. We monitor processes via APM, and we have had instances where, if snmpd is down, all alerts fire for that particular host, which is understandable. I was hoping to use dependencies to suppress APM child process monitoring for a host if the parent SNMPD is down on that host.

      However, it seems that when SNMPD is not running on a monitored host, the NMS does not mark it as Down; it marks it as Unknown. The reason for this has been explained to me; however, from a monitoring perspective, my opinion is that the dependency should work on this condition as well. Basically, if the parent process is not up, meaning Down or Unknown, then fire an alert for that and suppress alerting for the configured child dependencies.

        • Re: Dependencies Feature
          darryld

          re: when SNMPD is not running on a monitored host the NMS does not mark it as Down

          I think this is only true if you are using SNMP to monitor the process. Could you use WMI or a Perl script (depending on the OS) to monitor the SNMP process, and SNMP to do the rest of the work?

            • Re: Dependencies Feature
              nsantos

              Hello Darryl

              I took your advice: instead of monitoring SNMPD via the SNMP process monitor component, I set up a Linux/Unix Script Monitor component that executes a Perl script to check for the process.
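              For readers who want to try the same approach, here is a minimal sketch of such a check. It is written in Python rather than the Perl actually used in this thread, and it rests on assumptions that should be verified against your APM version's documentation: that the script monitor treats exit code 0 as Up and 1 as Down, and that it reads "Statistic:" and "Message:" lines from standard output. It also assumes a Linux host whose pgrep supports the -c and -x options. The names check_process and main are illustrative only.

```python
import subprocess
import sys

# Assumed exit-code convention for the script monitor (verify for your version).
EXIT_UP = 0
EXIT_DOWN = 1


def check_process(name, runner=subprocess.run):
    """Return (exit_code, statistic, message) for the named process."""
    # pgrep -x matches the exact process name; -c prints the number of matches.
    result = runner(["pgrep", "-c", "-x", name], capture_output=True, text=True)
    count = int(result.stdout.strip() or 0)
    if count > 0:
        return EXIT_UP, count, f"{name} is running ({count} process(es))"
    return EXIT_DOWN, 0, f"{name} is not running"


def main(argv):
    """Print the assumed Statistic/Message lines and return the exit code."""
    name = argv[1] if len(argv) > 1 else "snmpd"
    code, statistic, message = check_process(name)
    print(f"Statistic: {statistic}")
    print(f"Message: {message}")
    return code
```

              When deployed as a standalone script, the last line would be sys.exit(main(sys.argv)), so that a stopped snmpd produces the exit code assumed here to mean Down rather than an Unknown result.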

              Using this method, when I take down SNMPD on the node, the NMS marks it as down instead of unknown. I then created a group named

              SNMPD_PARENT_NODENAME

              with one component added which is

              SNMPD2 ON Process Monitor SNMPD on NODENAME

              Then I created a second group named

              NODENAME_CHILD_PROCESSES

              with 4 components added which are the rest of the processes on that node.

              Then I created a dependency with the SNMPD_PARENT_NODENAME group as parent and

              NODENAME_CHILD_PROCESSES as child. 

              When I take down snmpd on the node, we get advanced alerts for component down for all processes on that server, which is the same result as when monitoring SNMPD via SNMP, except that in that scenario SNMPD is marked as unknown.

              Have I done anything wrong in the configuration described above, or is there something additional I need to configure?

                • Re: Dependencies Feature
                  darryld

                  I would have followed the same sequence as you have.

                  Are the alerts being raised even though the child processes are marked unavailable, or are these processes being marked unknown?

                  Is the alert trigger of the APM Component Down type?

                    • Re: Dependencies Feature
                      nsantos

                      When I stop the SNMPD service (which is monitored via an SSH script in APM), the NMS marks it as down. The other processes (which are monitored via SNMP in APM) are marked as unknown.

                      We get alerts for all the processes as we have an advanced alert configured to trigger when all the following apply...

                      • Node Status is not equal to Down
                      • Component Status is not equal to Up
                      • Maintenance is equal to No (this is a custom field we set up in the custom properties editor)
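                      Read literally, the three conditions above combine into a single predicate. The sketch below is only a model of that trigger, not the alert engine itself; the function name alert_fires and the status strings are illustrative assumptions, and any separate handling the alert engine may apply to Unreachable objects is not modeled.

```python
def alert_fires(node_status, component_status, maintenance):
    """Literal reading of the three trigger conditions; all must hold."""
    return (
        node_status != "Down"
        and component_status != "Up"
        and maintenance == "No"
    )


# With the node itself up and not in maintenance, see which component
# statuses satisfy the trigger.
statuses = ["Up", "Down", "Unknown", "Unreachable"]
firing = [s for s in statuses if alert_fires("Up", s, "No")]
print(firing)  # ['Down', 'Unknown', 'Unreachable']
```

                      Because the component condition is "not equal to Up", every non-Up status satisfies it, which is why the SNMP-monitored processes alert as soon as they go unknown.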

                      Is there something incorrect in this setup?

                        • Re: Dependencies Feature
                          darryld

                          re: component status is not equal to up

                          Normally "unknown" is not "up", so the alert will fire, but I would expect the other processes to be "unreachable" based on the dependencies you described.

                            • Re: Dependencies Feature
                              nsantos

                              The SNMPD process is considered down in the NMS, as it is monitored via script. The other processes get marked as unknown.

                              There was another response that I received from mcbridea, but I can't see it here in the forum. He referred me to the Tech Ref

                              http://www.solarwinds.com/documentation/Orion/docs/Groupsanddependencies.pdf

                              I've read this and noted the section on dependencies. Regarding explicit dependencies, it states that they operate by checking the status of a defined parent object: if the parent object's status is unreachable or down, the child is set to unreachable. However, this doesn't seem to be happening. I am wondering whether I have to create advanced alerts for groups/dependencies, or whether the NMS is aware of Advanced Alerts and takes those into consideration.

                              During my troubleshooting with the SolarWinds rep, I came to understand that the dependency parent must be down and not unknown, which is why I changed the APM monitoring component to the SSH script method instead of SNMP. It seems that even now that the parent process is alarming in the correct state, it still doesn't work properly.

                                • Re: Dependencies Feature
                                  Gavin55

                                  I've had this same problem. I think I must be configuring the dependencies incorrectly.

                                    • Re: Dependencies Feature
                                      Karlo.Zatylny

                                      Hi all,

                                      In dependencies, all the parents for a given child must be Down or Unreachable (or Shutdown for an interface) in order for the child to be considered Unreachable.  A parent with status Unknown will not cause the child to go unreachable with the current implementation.
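                                      The parent-side rule above can be written as a small predicate. This is a simplified sketch, not the product's code: the name parents_allow_unreachable is made up for illustration, Shutdown is accepted for every parent type here even though the rule applies it only to interfaces, and a child with no parents is assumed never to become Unreachable.

```python
# Parent statuses that allow a child to be considered Unreachable.
BLOCKING_PARENT_STATUSES = {"Down", "Unreachable", "Shutdown"}


def parents_allow_unreachable(parent_statuses):
    """True only if there is at least one parent and every parent is blocking."""
    return bool(parent_statuses) and all(
        status in BLOCKING_PARENT_STATUSES for status in parent_statuses
    )
```

                                      For example, parents_allow_unreachable(["Down", "Unknown"]) is False, because a single Unknown parent is enough to keep the child from going Unreachable.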

                                      Let me know if you have any other questions about dependency behavior.

                                        • Re: Dependencies Feature
                                          nsantos

                                          Hey Karlo,

                                          The current scenario is that the parent (SNMPD) is marked down when I stop SNMPD on the test server in question. When I do this, the child processes I've defined (which are monitored via SNMP) are marked as unknown, and I get alerts for all processes on that server instead of just for SNMPD, as I had thought it would function.

                                          Originally I was using SNMP to monitor the SNMPD process, but this resulted in SNMPD being marked as unknown. I then used an SSH Perl script to log in and check the status of SNMPD, which now marks it as down when I am testing.

                                            • Re: Dependencies Feature
                                              Karlo.Zatylny

                                              I think I understand now.

                                              You have set up a dependency where the parent goes down but the children go unknown instead of unreachable. You have an alert that lets you know when your processes go unknown, but it is your assumption that these processes should go unreachable instead of unknown.

                                              If this is your assumption I can explain why this is not working.  The current dependencies code works such that if we detect that an entity (node, application, etc.) is Down, we will check the dependencies of that entity to see if the parents are down.  If all the parents are Down then this child is Unreachable.  Your case is failing in this operation because your entity is going into the Unknown status.  The dependency code looks at the Unknown status and concludes that "this entity is not in a state where I need to check its parents".  Therefore, the entity does not get marked as Unreachable.

                                              In order to make your case function, the child processes would have to mark themselves as Down when they are polled.  This is likely a change in APM (if you are using APM to do this polling).  If so, we can open up a feature request to possibly change this behavior.

                                              Let me know if I am off here.

                                                • Re: Dependencies Feature
                                                  nsantos

                                                  Hey Karlo

                                                  The only way I could make the children be marked as down is if I use SSH script monitoring for them (which marks the component "down" when the process is not running). In that case, I guess I wouldn't need this dependency setup, as the monitoring of the child processes wouldn't be done via SNMPD.

                                                  What I was trying to find out is this: if the parent process, in this case SNMPD, goes into a down state on the server, why does it matter whether the children are in an unreachable or unknown state? I believe the dependency should trigger if the parent is down and the children directly associated with it are not in an Up state.

                                                    • Re: Dependencies Feature
                                                      Karlo.Zatylny

                                                      Good point. The dependencies code was implemented in such a way that we treat all items generically and equally, meaning a node = interface = application = group. However, a node in an unknown state is not equivalent to an application or component in an unknown state. Thus the base implementation looks specifically for Down before marking the child as unreachable, with logic like this:

                                                      if object status is down then
                                                          check if object is in any groups
                                                          check if object or any of its groups are a child in dependencies
                                                          if all parents of object are down or unreachable then
                                                              set object status to unreachable

                                                      We did provide the ability for individual products to override this behavior, such that those steps could be changed to:

                                                      if object status is down OR unknown then
                                                          check if object is in any groups
                                                          check if object or any of its groups are a child in dependencies
                                                          if all parents of object are down or unreachable then
                                                              set object status to unreachable 

                                                      This would be a specific override in APM. I'll put in the feature request, pointing at this thread as an explanation for the request. Note that the APM team may choose to make this configurable according to your specific needs, as it is possible that there is another use case where an unknown status should not lead to the object becoming unreachable.
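                                                      To make the difference between the base logic and the proposed override concrete, here is a rough Python model of the steps described in this post. It is an illustration, not the actual dependencies code: the Model class, resolve_status, the treat_unknown_as_down flag, and the component name proc_a are all hypothetical, and group status rollup is not modeled (the parent group's status is simply set by hand).

```python
from dataclasses import dataclass


@dataclass
class Model:
    status: dict        # object or group name -> polled status string
    groups: dict        # group name -> set of member object names
    dependencies: list  # (parent name, child name) pairs


def parents_of(model, obj):
    """Parents of the object itself or of any group it belongs to."""
    children = {obj} | {g for g, members in model.groups.items() if obj in members}
    return [parent for parent, child in model.dependencies if child in children]


def resolve_status(model, obj, treat_unknown_as_down=False):
    """Return the status after applying the dependency check."""
    polled = model.status[obj]
    # Base behavior checks parents only for Down; the override adds Unknown.
    trigger = {"Down", "Unknown"} if treat_unknown_as_down else {"Down"}
    if polled not in trigger:
        return polled
    parents = parents_of(model, obj)
    if parents and all(model.status[p] in {"Down", "Unreachable"} for p in parents):
        return "Unreachable"
    return polled


model = Model(
    status={"SNMPD_PARENT_NODENAME": "Down", "proc_a": "Unknown"},
    groups={"NODENAME_CHILD_PROCESSES": {"proc_a"}},
    dependencies=[("SNMPD_PARENT_NODENAME", "NODENAME_CHILD_PROCESSES")],
)
print(resolve_status(model, "proc_a"))                              # Unknown
print(resolve_status(model, "proc_a", treat_unknown_as_down=True))  # Unreachable
```

                                                      With the base behavior the SNMP-monitored child stays Unknown because its parents are never checked; with the override enabled, the same child resolves to Unreachable once its parent group is Down.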

                                                      Thanks