3 Replies. Latest reply on Jan 28, 2017, 3:40 PM by foonly

    NPM Hardware Health on "Node" is Undefined

    colin.wilmer

      Hello,


      I've seen this question asked with regard to SAM, but not as much with NPM. A few times in the last week, the Hardware Health Status on my core switch has become "Undefined." About ten minutes later, all Hardware Health is back up. Looking at the switch logs and hardware status, I don't see any issues, and ICMP and SNMP continue to work perfectly. Has anyone else run into this? I'm starting to think it has more to do with an NPM service or application. Any input is greatly appreciated.


      Thanks,

      Colin

        • Re: NPM Hardware Health on "Node" is Undefined
          noriryani

          I have the same error too. I already reported it to our support team (for example, asking them to replace the battery), but they said there is no fault on the server. Yet NPM still shows Critical hardware health.

            • Re: NPM Hardware Health on "Node" is Undefined
              colin.wilmer

              I wanted to share the results of my investigation into this issue. I ended up opening two SolarWinds support cases and a Cisco TAC case, but we got it figured out. We identified a particular OID whose polling was driving higher-than-normal CPU utilization on the switch and then causing an SNMP timeout. We then created an SNMP view that excluded that OID (the OID in question was not relevant to environmental monitoring). Here are the Cisco IOS commands to help troubleshoot and resolve the problem on the device:


              !

              ! - Use to show the processes with the highest CPU utilization

              !

              show processes cpu sorted

              !

              ! - Use to display the stack (hex output) for that process, which TAC can analyze

              !

              show stacks <Process ID>

              !

              ! - Increase the message queue length for each trap host (the number of trap events that can be held before the queue must be emptied)

              !

              snmp-server queue-length 1000

              !

              ! - Create a custom SNMP view that includes all OIDs under the iso subtree

              !

              snmp-server view <View Name> iso included

              !

              ! - Exclude the OID that causes high CPU utilization from the new custom view

              !

              snmp-server view <View Name> <OID> excluded

              !

              ! - Configure the read-only community to reference the new SNMP view

              !

              snmp-server community <Community> view <View Name> RO
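              After applying the view, a few read-only show commands can help confirm the change took effect. This is a sketch assuming a reasonably recent IOS release; "show snmp stats oid" (which lists recently requested OIDs and could help answer which OID a poller is hitting) is not available on every platform or version, so treat it as an assumption for your device. <View Name> is the same placeholder used above.

              !

              ! - Verify the view contains the iso include and the OID exclude

              !

              show snmp view

              !

              ! - Confirm the community is bound to the new view

              !

              show snmp community

              !

              ! - List recently polled OIDs (availability varies by IOS version)

              !

              show snmp stats oid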


              We also raised the SNMP timeout in NPM from 2500 ms to 5000 ms, but that was before I started working with Cisco on the problem.


              Cheers,

              Colin

            • Re: NPM Hardware Health on "Node" is Undefined
              foonly

              We've noticed this problem on just one switch.


              What was the bad OID that was being polled, and who was polling it - Orion?


              =Foon=