11 Replies Latest reply on Aug 8, 2014 4:27 PM by akhasheni

    Unmanaged behaviour

    Malcolm Stewart

      FYI: I have noticed some unexplained behaviour when a node is Unmanaged. Its IP address still seems to be pinged, and there is some SNMP activity.


      A device that is both down and unmanaged fails those pings, which puts the node into the Top 10 Nodes by Packet Loss resource.
      A node that was shut down last week (and unmanaged) still shows its last interface traffic stats under Maximum Traffic Today.


      If the unmanaged device is still running, it quietly replies to those pings, though it now appears in the Down Nodes resource.
      It still shows its Availability Statistics, i.e. 100.00% availability for the last 30 days.
      It still appears to be rediscovered on schedule and "Database Updated" (according to Polling Details).


      I know unmanaged devices can be filtered out of the various displays, but it seems the underlying behaviour could be tidied up.


      As a further test I unmanaged an unimportant but reliable switch. 
      The Average Response Time & Packet Loss gauges remain on the page and are active (but the chart details look suspended), while the CPU Load & Memory Utilisation gauges have disappeared from the page. After a while I changed the SNMP string for this device (made it incorrect) and immediately saw 3 authenticationFailure traps appear for the device. Further authenticationFailure traps followed at times that correspond to the scheduled (but supposedly unmanaged) rediscovery process. I corrected the SNMP string (no traps this time) and remanaged the node.


      Lastly, in the Overview view, the icon indicating an unmanaged node (blue with a cross) does not appear in the legend with the other LEDs.
      Interestingly, interfaces are shown here as unmanaged, though technically that is only inherited from the node being unmanaged.


      If an interface COULD be unmanaged (as so many have requested), or alerts and events could be suppressed while stats are silently collected as above, then you really would have a winner.



      cheers,
      stewie

      evaluating ver 9.1

        • Re: Unmanaged behaviour
          tdanner

          Some of this sounds buggy and some of it is as designed. Let me run down the list and classify.

          Buggy:

          • Pinging an unmanaged node.
          • Sending SNMP to an unmanaged node.
          • Not showing the unmanaged blue circle on the Overview page.
          • Authentication failure traps on the rediscovery schedule during unmanage.

          As designed:

          • Listing the node in places like availability statistics. Down time during an unmanaged period should not count against the node's availability (in fact, Orion shouldn't even know whether the node was up or down during unmanage), but it will still be listed there.
          • Showing response time and packet loss gauges. Gauges in Orion always show the last polled value. These values shouldn't be updated during unmanage, but we'll still show you what they were last time we checked.
          • Authentication failure traps at the time you change the community string to be wrong, even if the node is currently unmanaged. We try to validate the community string regardless of the node's current manage/unmanage state.
          We're definitely aware of the desire to unmanage individual interfaces.

          You should probably pursue the stuff I flagged as buggy-sounding with support. Has anyone else reading this observed similar traffic with unmanaged nodes?
            • Re: Unmanaged behaviour
              Kryptic

              So let me make sure I understand you correctly:

              If I have a node that is reporting 100% loss and I unmanage it for whatever reason (maybe I knew the device was going to be down but didn't get a chance to unmanage it before it dropped), the device still shows in the "Top 10 Nodes by Percent Packet Loss", and this is by design? If I unmanage a device, I don't see why it should show up there. If I unmanage it, I'm obviously not worried about packet loss, errors, etc.

              • Re: Unmanaged behaviour
                mhh351

                Yes. And I would stand on the side of "if it is unmanaged, don't do anything to it". If I unmanage some device, Orion should NOT go out and create activity.

                • Re: Unmanaged behaviour
                  akhasheni

                  Has anyone else reading this observed similar traffic with unmanaged nodes?

                  Yes, I have, and its randomness coupled with its persistence is driving me nu... is very exciting. We have about 15 unmanaged and offline nodes, yet some are showing up and some aren't. In the example below, Encoder-VM05 is one of eight identical offline unmanaged nodes, yet only Encoder-VM05 shows up in "high packet loss" alerts; the others don't.


                  Screen Shot 2014-08-06 at 6.24.30 PM.png


                  All of them also show "average CPU load" and other stats. They've been offline (shut down) for over a month, and their total lifespan was less than a week. So this is the "average" of what? Is SolarWinds monitoring the Land of the Dead?

                  Screen Shot 2014-08-06 at 7.36.29 PM.png


                  The original post was in 2008. That was six years ago, and yet the dead and unmanaged nodes are still very much alive. This is sad.

                • Re: Unmanaged behaviour

                  We are having a similar issue. A server of ours failed, and we marked it unmanaged while we awaited an onsite replacement. In Orion NPM it still shows the packet loss gauge at 100%, and it has left an alert in the "Active Alerts" section of my home page. I would think that once a node is unmanaged, these alerts would clear. Was a solution ever found for this?

                  • Re: Unmanaged behaviour
                    terrys1026

                    I would like functionality similar to stewie's suggestions. When I'm doing maintenance on a device, I don't want alerts. But I do want regular polling to continue (or a second switch that turns regular polling on/off). That way, while I'm doing maintenance, I can see how any changes I make affect the device.