9 Replies Latest reply on Feb 19, 2015 5:47 PM by cmatrask

    NPM 10.7 upgrade appears to have broken Hardware Health Monitoring

    jeffnorton

      Before I get into the meat of this, I'll start off by saying I have had a High-priority support ticket open with SolarWinds since 6 AM this morning.  My issue is that prior to the NPM 10.7 upgrade I had only 3 devices with hardware alarms; after the upgrade I have 200.  Several releases back I had this same problem, and SolarWinds had me run a script that changed which MIB NPM used.  None of the upgrades between then and this one broke it or changed it back.  Has anyone out there in Thwack land had this issue, and if so, do you remember what the fix is?  I really, really, really wish that SolarWinds would put this hardware MIB selection on the Admin page so it's not some esoteric function buried deep in the knowledge base.  Oh, and before anyone asks, I have searched the knowledge base, Thwack, and my old support tickets on the Customer Portal with no success.

       

      And in the midst of this I'm getting form letters from the sales staff asking me if I am interested in buying NPM.  The sales staff just can't seem to match up its email campaigns with its own customer database.

        • Re: NPM 10.7 upgrade appears to have broken Hardware Health Monitoring
          Goliath

          Hi Jeff -

           

          1) Open "c:\Program Files (x86)\SolarWinds\Orion\HardwareHealth\SolarWinds.HardwareHealth.Pollers.dll.config" file in notepad

          2) Change <add key="IgnoreCiscoNewEntityMib" value="false"/> to
          <add key="IgnoreCiscoNewEntityMib" value="true"/>

          Is this what you are after?
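If you would rather script the two steps above than hand-edit the file, here is a minimal sketch in Python. It assumes the file is a standard .NET-style XML config with no XML namespace; the helper name `set_app_setting` is made up for illustration and is not a SolarWinds tool. It writes a `.bak` copy before touching anything, and the rewritten file may differ slightly in whitespace from the original.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def set_app_setting(config_path, key, value):
    """Set <add key=KEY value=VALUE/> in a .NET-style config file.

    Returns True if the file was changed, False if nothing needed changing.
    Hypothetical helper for illustration only.
    """
    path = Path(config_path)
    # Keep XML comments inside the document when the file is rewritten
    # (TreeBuilder's insert_comments option needs Python 3.8 or later).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser=parser)

    changed = False
    for node in tree.getroot().iter("add"):
        if node.get("key") == key and node.get("value") != value:
            node.set("value", value)
            changed = True

    if changed:
        # Back up the original next to it before overwriting.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        tree.write(path, encoding="utf-8", xml_declaration=True)
    return changed


# Example call (path taken from step 1 above; run with admin rights):
# set_app_setting(
#     r"c:\Program Files (x86)\SolarWinds\Orion\HardwareHealth"
#     r"\SolarWinds.HardwareHealth.Pollers.dll.config",
#     "IgnoreCiscoNewEntityMib",
#     "true",
# )
```

Running it a second time returns False and leaves the file alone, so it is safe to re-run after an upgrade to check whether the setting was reset.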


          Regards,

          John

          • Re: NPM 10.7 upgrade appears to have broken Hardware Health Monitoring
            nickzourdos

            Thanks for the heads up on this. I just did a bunch of research on the Hardware Health bugs because NPM 10.6 is squawking about our new core switch, and I remember seeing some threads from when this was an issue before 10.4. The best response I saw from the team was that a fix wasn't high on their priority list, so they just hadn't gotten around to it. Three versions later and it's still an issue! Hopefully there's a resolution soon.

            • Re: NPM 10.7 upgrade appears to have broken Hardware Health Monitoring
              jeffnorton

              OK, so here is how Orion works, the Band-Aid solution, and what I have been told is being worked on.  Orion uses two MIBs to do its polling: an old Cisco Entity MIB and a newer Cisco Environmental MIB.  Orion polls for a response to anything in the newer Environmental MIB, and if it gets a response it uses only that.  If it doesn't get a response, it then polls the older Entity MIB, and if it gets a response it uses that.  This is done on a device-by-device basis if the Hardware Health Sensor resource is checked under List Resources.  Topology has nothing to do with this.  However, there is a bug in several newer devices and operating software that falsely reports problems under the newer Environmental MIB, so what you get is things like Bias errors being reported on multiple interfaces.  The Band-Aid is one of two solutions:

               

              1- Use hardware reporting as is and disable the Hardware Health Sensor resource on all those nodes that are spitting out false errors.  In our network that was about 40 devices out of 2000.

              2- Edit c:\Program Files (x86)\Solarwinds\Orion\hardwarehealth\Solarwinds.hardwarehealth.pollers.dll.config so that <add key="IgnoreCiscoNewEntityMib" value="false"/> is changed to true.  This forces Orion to use only the older Entity MIB.  The downside is that hardware health can no longer be monitored on newer devices, such as Nexus, that only answer on the newer MIB.
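To make the poll order concrete, here is a rough model of the per-node selection as I understand it from the description above. This is only a sketch in Python, not SolarWinds code: `choose_mib`, `responds`, and the two labels are hypothetical names, and the thread does not give the exact MIB module names, so the labels are just "newer" and "older".

```python
from typing import Callable, Optional

# Labels only; stand-ins for the two MIBs described above.
NEWER_MIB = "newer"
OLDER_MIB = "older"


def choose_mib(responds: Callable[[str], bool],
               ignore_newer: bool = False) -> Optional[str]:
    """Return the MIB that would be used for one node, or None if neither answers.

    responds(mib) is a hypothetical probe: True if the device answers a poll
    against that MIB. ignore_newer models IgnoreCiscoNewEntityMib=true.
    """
    # The newer MIB wins whenever the device answers it, unless it is ignored.
    if not ignore_newer and responds(NEWER_MIB):
        return NEWER_MIB
    # Otherwise fall back to the older MIB.
    if responds(OLDER_MIB):
        return OLDER_MIB
    return None


# A device that answers both MIBs, and one that answers only the newer MIB.
answers_both = lambda mib: True
answers_newer_only = lambda mib: mib == NEWER_MIB

print(choose_mib(answers_both))                           # newer
print(choose_mib(answers_both, ignore_newer=True))        # older
print(choose_mib(answers_newer_only, ignore_newer=True))  # None
```

The last case is the trade-off with option 2: a device that only answers the newer MIB ends up with no hardware health data at all once the newer MIB is ignored.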

               

              Now, what I have been told is that future releases of NPM will have per-node and sub-resource filtering, meaning that users will be able to filter out specific sub-resource components that are giving false alerts while leaving everything else working.

               

              Overall, this is not really SolarWinds' problem, but they have to deal with it.  There are several non-SolarWinds/non-Thwack references to the issue out on Google.

              • Re: NPM 10.7 upgrade appears to have broken Hardware Health Monitoring
                dpatzold1979

                I upgraded to 10.7 and I have the same or a similar problem. We are a Cisco shop with 4500s, 6500s, etc., and they no longer show power supplies or other hardware health. I still use my custom poller, though, and it works for reporting issues.