9 Replies. Latest reply: Dec 5, 2011 9:38 AM by rscreed92

Dependencies of unmanaged parent still alerting

cmgurley

Perhaps I'm missing a setting, but I'm running NPM 10.1.1 and have set up dependencies for several hosts in the following format. However, when I unmanage the host (parent), the child group (dependency) is still alerting.

HostA Dependency
Parent: HostA
Child: HostA Children (group)
HostA Children (group): Interfaces of HostA (i.e. 4 HBAs and 8 NICs)

Am I missing something? If the host goes down, dependencies are supposed to prevent alerts on child nodes/interfaces. It seems the same should apply to unmanaging nodes, since I have the same intent--to take down a host and prevent alerting.

Assistance appreciated. Thanks!
~Chris

  • Re: Dependencies of unmanaged parent still alerting
    bshopp

    Even though you unmanage the parent, I assume we can still poll the child elements, so technically they are still up, right? Once you actually take that node down and we determine that it is down, the children should go unreachable unless there is another path from the Orion server by which we can monitor them.

    • Re: Dependencies of unmanaged parent still alerting
      cmgurley

      I'm guessing this will be a case of differing methodologies. My thinking is that if the parent is down (state: down) or expected to be down (state: unmanaged), the same child monitoring action should apply.

      In this case, I have hosts (ESX servers) which have interfaces. I monitor the hosts as well as the switch interfaces to which each host connects. To prevent an inundation of alerts, I create a dependency for each host, with the host as the parent and its switch interfaces (in a group) as the children. When I want to patch a host, I place it in maintenance mode (unmanaged) because I expect it to be down. Since I expect the host to be down, I rely on the dependencies to suppress alerts on the host's interfaces while the host is unmanaged. Otherwise, I have to go "unmanage" quite a few more objects (granted, not impossible, but just annoying).

      Based on the behavior and your reply, am I to understand that dependencies are only intended to address "down" scenarios and not "unmanaged" ones?

      Thanks,
      Chris
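To make the two behaviors in this exchange concrete, here is a minimal Python sketch of dependency-based alert suppression. It is not how NPM/Orion implements dependencies; the `Element` class, the status strings, and the `suppress_on_unmanaged` switch are hypothetical. Following the point above about alternate paths, a down child is only treated as "explained" when every one of its parents is down (or, with the switch on, unmanaged).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical status values; these are not NPM's internal status codes.
UP, DOWN, UNMANAGED = "up", "down", "unmanaged"


@dataclass
class Element:
    """A monitored node or interface with zero or more dependency parents."""
    name: str
    status: str = UP
    parents: list[Element] = field(default_factory=list)


def should_alert(elem: Element, suppress_on_unmanaged: bool) -> bool:
    """Return True if a down element should raise its own alert."""
    if elem.status != DOWN:
        return False
    # Parent states that "explain" a child outage.
    explaining = {DOWN, UNMANAGED} if suppress_on_unmanaged else {DOWN}
    # Suppress only when every parent explains the outage; one healthy parent
    # means there is another path, so the child really is down.
    explained = bool(elem.parents) and all(p.status in explaining for p in elem.parents)
    return not explained


host = Element("HostA", UNMANAGED)
nic = Element("HostA NIC 1", DOWN, [host])
print(should_alert(nic, suppress_on_unmanaged=False))  # True: behavior reported in this thread
print(should_alert(nic, suppress_on_unmanaged=True))   # False: behavior being asked for
```

Using `all` rather than `any` over the parents is a design choice that mirrors the "another path" caveat: a child reachable through a healthy parent still alerts.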

      • Re: Dependencies of unmanaged parent still alerting
        smittyman

        I've not seen a clear answer to this question, and I have the same expectation. I have multiple dependencies built to suppress alerting on child nodes. For example, I have an MPLS network with about 70 remote sites whose routers are children of the distribution router(s) that provide access from the MPLS network back to the primary data center. If the distribution routers go down unexpectedly, I don't want to be alerted on all the remote sites; dependencies work as expected in that situation and suppress the alerts on the remote sites. However, if I'm doing planned maintenance that takes the distribution routers out of service and I unmanage them, I get alerted on all the remote routers. It's somewhat impractical to go into each of the remote routers and unmanage them as well. Also, those routers, while in a child group for the distribution routers, are also parents to the local switches at the remote sites. If I unmanage them, I then have to unmanage all the remote switches as well, which turns into literally hundreds of nodes that must be unmanaged just because two distribution routers went down for a scheduled maintenance window. So, back to the original question: are dependencies only intended to address "down" scenarios and not "unmanaged" ones? And if so, is there a workaround for this behavior?

        Thanks.

        Mike
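The multi-level case above (distribution routers, then remote routers, then remote switches) amounts to transitive suppression over the dependency graph. Below is a rough Python sketch, assuming a simple parent-to-children map; `cascade_suppression` and the node names are hypothetical and this is not an Orion feature. A node is suppressed only once all of its parents are unmanaged or already suppressed.

```python
from __future__ import annotations

from collections import defaultdict, deque


def cascade_suppression(children: dict[str, list[str]], unmanaged: set[str]) -> set[str]:
    """Return every node whose alerts would be suppressed, transitively,
    because all of its parents are unmanaged or themselves suppressed."""
    # Invert the parent -> children map so each node knows all its parents.
    parents: dict[str, set[str]] = defaultdict(set)
    for parent, kids in children.items():
        for kid in kids:
            parents[kid].add(parent)

    covered = set(unmanaged)  # unmanaged nodes plus everything suppressed so far
    suppressed: set[str] = set()
    queue = deque(unmanaged)
    while queue:
        node = queue.popleft()
        for kid in children.get(node, []):
            if kid in covered:
                continue
            # A child with a parent that is still live keeps alerting.
            if parents[kid] <= covered:
                covered.add(kid)
                suppressed.add(kid)
                queue.append(kid)  # its own children may now be covered too
    return suppressed


children = {
    "dist1": ["siteA-rtr", "siteB-rtr"],
    "dist2": ["siteA-rtr", "siteB-rtr"],
    "siteA-rtr": ["siteA-sw1", "siteA-sw2"],
    "siteB-rtr": ["siteB-sw1"],
}
print(sorted(cascade_suppression(children, {"dist1", "dist2"})))
# ['siteA-rtr', 'siteA-sw1', 'siteA-sw2', 'siteB-rtr', 'siteB-sw1']
print(cascade_suppression(children, {"dist1"}))  # set(): dist2 is still a path
```

Each child is re-checked whenever one of its parents becomes covered, so the last parent to be covered triggers the suppression; unmanaging only the two distribution routers covers every remote site.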

        • Re: Dependencies of unmanaged parent still alerting
          cmgurley

          Agreed, Mike. Unmanaging a node/interface implies that we are about to impact its service. Thus it is planned downtime. Dependencies are apparently only designed to address unplanned downtime, but in my opinion/use/experience (and apparently yours as well), it should equally apply to planned downtime.

          NPM already makes it hard enough to have regular/scheduled maintenance without using the kludgy Unmanage Utility, so it would be nice if we could make non-routine (but still planned) maintenance easier.

          If SW development needs to add a check box to make dependency unmanaging optional, fine. But it definitely needs to be there.

          From the field,

          Chris | www.bctechnet.com

          • Re: Dependencies of unmanaged parent still alerting
            bshopp

            Dependencies are meant to handle alert suppression as it's main primary use case. So if you lose a site, you don't get 100 down-node alerts from every item at that site.

            Unmanaged is for maintenance windows or scheduled downtime. You can set a node to unmanaged manually in Web Node Management, or, if you want to do it on a recurring basis, you can currently use the "schedule unmanage" utility we ship.

            So walk me through how you would expect the unmanage flow to be better. Is it just a matter of moving the little scheduler utility we currently have to the web?

            • Re: Dependencies of unmanaged parent still alerting
              cmgurley

              Brandon,

              In your reply, you explained the "what"/"why" behind Dependencies, and then you switched to explaining the "when" of the Unmanage action.

              What/Why: "Dependencies are meant to handle alert suppression as it's main primary use case."

              When: "Unmanaged is for maintenance windows or scheduled downtime."

              Thus, you didn't actually contrast the two. You simply described different aspects of them, which are not in contradiction.

              I believe that the "what"/"why" of Dependencies (alert suppression) is the same as that of Unmanaging. When we unmanage a device, we do so to suppress alerts. As an aside, frankly, I'd actually like it if it could keep polling during Unmanage windows just so I know when it is up/down, but somehow not have that count in uptime stats (but that's beside the point and contrary to most tools out there). The point is, we unmanage to suppress alerts and we create dependencies to suppress alerts.

              What I'm likely missing on your end is that those programmatic functions are probably completely separate and thus do not share common code/components, as end users like myself might assume.

              Let's set aside integrating the unmanage utility for the sake of this discussion, since it is really a separate topic (which I'm happy/eager to cover elsewhere--I would love an integrated function, like ipMonitor had).

              --Chris

              • Re: Dependencies of unmanaged parent still alerting
                bshopp

                Gotcha, sorry if I crossed the streams :)

                Feel free to PM me via thwack and we can chat more offline.

                • Re: Dependencies of unmanaged parent still alerting
                  cmgurley

                  Will do. And sorry if that came across as condescending. Explaining thought processes and logic in forums is challenging, to say the least, let alone hoping that the tone carries through :). PM on the way...

                  ~Chris

                  • Re: Dependencies of unmanaged parent still alerting
                    rscreed92

                    This is important to me as well. Has this been revisited? We have a scenario similar to the aforementioned ones. Our facilities team performs emergency lighting tests annually in multiple buildings. In this particular case, the router is in the same building and will be going down as well. We would like the ability to unmanage it and have all access gear configured as dependent on it not alert, but still keep stats. We would also like the ability to alter outages, as well as schedule multiple outages for the same or different nodes. Could a table be created in which we define outages via the web interface, with all alerts checking that table before alerting? Having the ability to add groups or single nodes would be great, especially multiple outages for the same node. Having a check box to keep stats and/or have them count in availability would be a Christmas miracle... no matter when it became available.
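A rough Python sketch of the outage table described above, assuming the alert logic consults it before firing. Everything here (`OutageWindow`, `active_windows`, `should_alert`, `counts_toward_availability`, the group mapping, and the `exclude_from_availability` flag) is hypothetical and is not an existing NPM feature or API; it only shows how windows for single nodes or groups, including several windows per node, could be checked.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OutageWindow:
    """One planned outage; the target may be a single node or a group name."""
    target: str
    start: datetime
    end: datetime
    exclude_from_availability: bool = True  # hypothetical "don't count it" check box


def active_windows(node: str, groups: dict[str, set[str]],
                   windows: list[OutageWindow], now: datetime) -> list[OutageWindow]:
    """Return every window that covers `node` (directly or via a group) at `now`."""
    return [
        w for w in windows
        if (w.target == node or node in groups.get(w.target, set()))
        and w.start <= now < w.end
    ]


def should_alert(node: str, groups: dict[str, set[str]],
                 windows: list[OutageWindow], now: datetime) -> bool:
    # Alerts are suppressed while any covering window is active; polling
    # (and therefore stats collection) is assumed to continue regardless.
    return not active_windows(node, groups, windows, now)


def counts_toward_availability(node: str, groups: dict[str, set[str]],
                               windows: list[OutageWindow], now: datetime) -> bool:
    # Downtime is ignored for availability if any active window asks for that.
    return not any(w.exclude_from_availability
                   for w in active_windows(node, groups, windows, now))


groups = {"Building 1": {"bldg1-rtr", "bldg1-sw1", "bldg1-sw2"}}
windows = [
    OutageWindow("Building 1", datetime(2012, 1, 10, 8), datetime(2012, 1, 10, 10)),
    OutageWindow("bldg1-rtr", datetime(2012, 1, 17, 8), datetime(2012, 1, 17, 9), False),
]
print(should_alert("bldg1-sw1", groups, windows, datetime(2012, 1, 10, 9)))        # False
print(counts_toward_availability("bldg1-rtr", groups, windows,
                                 datetime(2012, 1, 17, 8, 30)))                    # True
```

Keeping windows as rows rather than a per-node flag is what allows several outages for the same node and lets a single row cover a whole group.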