0 Replies Latest reply on Nov 18, 2015 6:18 PM by asonberg

    Nodes flipping from warning to down

    asonberg

      ORIGINAL ENVIRONMENT: NPM 11.5.2

       

      NEW ENVIRONMENT: NPM 11.5.2, NTA 4.1.1, NCM 7.4

       

      SDK 2.0.5

       

      ==========================================

       

      Using the attached PowerShell script, modeled after the CopyNodes.ps1 example that comes with the SDK, I copied all nodes/interfaces/volumes from the ORIGINAL environment to the NEW environment. This script also added all nodes into NCM in the NEW environment. All nodes were SNMP or ICMP; no WMI nodes exist in these environments.

       

      When I ran this script, it completed without errors and all nodes were added to the NEW environment without any immediately obvious issues. I also compared the dbo.Pollers tables in both databases to ensure that all pollers were applied as well (they were).
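
      For anyone who wants to repeat the poller comparison, here is a minimal Python sketch of the idea. It is not the attached script and it does not query Orion. It assumes you have already exported (node key, poller type) pairs from dbo.Pollers in each database, keyed by something stable across environments such as IP address, since node IDs will differ after a copy. The function names and the sample poller type strings are illustrative assumptions.

```python
from collections import defaultdict


def pollers_by_node(rows):
    """Group poller types by node key (for example, the node's IP address)."""
    grouped = defaultdict(set)
    for node_key, poller_type in rows:
        grouped[node_key].add(poller_type)
    return dict(grouped)


def diff_pollers(original_rows, new_rows):
    """Return, per node key, pollers missing from or extra in the NEW export."""
    original = pollers_by_node(original_rows)
    new = pollers_by_node(new_rows)
    report = {}
    for node_key in sorted(set(original) | set(new)):
        missing = original.get(node_key, set()) - new.get(node_key, set())
        extra = new.get(node_key, set()) - original.get(node_key, set())
        if missing or extra:
            report[node_key] = {"missing": sorted(missing), "extra": sorted(extra)}
    return report


# Hypothetical sample rows; real rows would come from each environment's export.
original_rows = [
    ("10.0.0.1", "N.Status.ICMP.Native"),
    ("10.0.0.1", "N.Details.SNMP.Generic"),
    ("10.0.0.2", "N.Status.ICMP.Native"),
]
new_rows = [
    ("10.0.0.1", "N.Status.ICMP.Native"),
    ("10.0.0.2", "N.Status.ICMP.Native"),
]

print(diff_pollers(original_rows, new_rows))
```

      An empty report means every node key carries the same set of poller types in both exports, which is the result I got.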

       

      I noticed that, in the NEW environment, there were about 250 nodes in a Warning or Down status. As I was investigating, I observed that the combined count of Warning + Down (~250) stayed the same, but the individual counts of nodes in Warning and nodes in Down were changing rapidly (i.e., we had flapping nodes).
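
      The pattern described above (a steady combined count with individual nodes trading places) can be checked with a small Python sketch like the one below. It assumes you have captured a series of snapshots, each a mapping of node name to status string. How the snapshots are captured is left out, and the node names are made up.

```python
def summarize_snapshots(snapshots):
    """For each snapshot, count Warning and Down nodes and list nodes whose
    status changed since the previous snapshot."""
    summaries = []
    previous = None
    for snap in snapshots:
        warning = sum(1 for status in snap.values() if status == "Warning")
        down = sum(1 for status in snap.values() if status == "Down")
        flipped = []
        if previous is not None:
            flipped = sorted(
                name
                for name, status in snap.items()
                if name in previous and previous[name] != status
            )
        summaries.append(
            {"warning": warning, "down": down, "total": warning + down, "flipped": flipped}
        )
        previous = snap
    return summaries


# Hypothetical snapshots taken one polling cycle apart.
snapshots = [
    {"rtr-01": "Warning", "asa-01": "Down", "sw-01": "Up"},
    {"rtr-01": "Down", "asa-01": "Warning", "sw-01": "Up"},
]

for summary in summarize_snapshots(snapshots):
    print(summary)
```

      A constant "total" alongside a non-empty "flipped" list in each cycle is the flapping behavior I was seeing.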

       

      I ran a good number of latency and ICMP response tests from the NEW polling engine, and all of these nodes were responding to every single ICMP packet I sent them manually. I honestly do not remember seeing a single dropped packet in all of the testing.

                      *NOTE* The ORIGINAL environment showed all nodes UP this entire time

                    

      As we were troubleshooting, we observed that the majority of these nodes were all Cisco Routers and ASA Firewalls. We then went to the actual devices and saw that SNMP was not configured to work from the NEW polling engine yet.

       

      As we updated the configuration of all the devices, they all began to poll UP on their own without any intervention from us. (No 'Poll Now' or 'Rediscover' needed)

       

      ==========================================

       

      The question going forward is: Is this a bug in the software? To my knowledge, node status comes from ICMP response (unless you explicitly mark a node to gather its status via SNMP or the Agent, which none of these are). If that is accurate, then these nodes should have all been Up, with all of their interfaces, volumes, and hardware in an Unknown status.
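
      To make that expectation concrete, here is a minimal Python sketch of the model I have in mind. It encodes only my assumption about how status should be derived, not confirmed product behavior. The function name and the "Polled" placeholder (meaning the child's status would come from SNMP data) are made up for illustration.

```python
def expected_statuses(icmp_replied, snmp_replied, child_names):
    """Assumed model: node status depends only on ICMP; child objects
    (interfaces, volumes, hardware) depend on SNMP being reachable."""
    node_status = "Up" if icmp_replied else "Down"
    if snmp_replied:
        # Placeholder: real child status would be read from SNMP results.
        child_statuses = {name: "Polled" for name in child_names}
    else:
        child_statuses = {name: "Unknown" for name in child_names}
    return node_status, child_statuses


# The situation in this post: ICMP replies arrive, SNMP is not yet allowed.
node, children = expected_statuses(True, False, ["Gi0/0", "Flash", "Fan 1"])
print(node, children)
```

      Under this model, the affected routers and firewalls would have shown Up with Unknown children, rather than flipping between Warning and Down.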