NPM 10.7 upgrade appears to have broken Hardware Health Monitoring

Before I get into the meat of this, I'll start off by saying I have had a High support ticket open with SolarWinds since 6 AM this morning. My issue is that prior to the NPM 10.7 upgrade I had only 3 devices with hardware alarms. After the upgrade I have 200 devices with hardware alarms. Several releases back I had this same problem, and SolarWinds had me run a script that changed the MIB that NPM used; none of the upgrades since then, up until this one, broke it or changed it back. So has anyone out there in Thwack land had this issue, and if so, do you remember what the fix is? I really, really, really wish that SolarWinds would put this hardware MIB selection on the Admin page so it's not some esoteric function buried deep in the knowledgebase. Oh, and before anyone asks, I have searched the knowledgebase, Thwack, and my old support tickets on the Customer Portal with no success.

And in the midst of this I'm getting form letters from the sales staff asking me if I am interested in buying NPM. The sales staff just can't seem to match up its email campaigns with its own customer database.

  • Hi Jeff -

    1) Open "c:\Program Files (x86)\SolarWinds\Orion\HardwareHealth\SolarWinds.HardwareHealth.Pollers.dll.config" file in notepad

    2) Change <add key="IgnoreCiscoNewEntityMib" value="false"/> to
    <add key="IgnoreCiscoNewEntityMib" value="true"/>

    Is this what you are after?


    Regards,

    John
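
For anyone who would rather script step 2 than edit the file by hand, here is a minimal sketch. It assumes the setting appears in the file exactly in the `<add key="..." value="..."/>` form quoted above and that the file is UTF-8 text. The helper names `set_app_setting` and `patch_file` are made up for illustration and are not part of any SolarWinds tooling. Note the follow-up replies in this thread before applying it: this change has side effects.

```python
import re
import shutil
from pathlib import Path


def set_app_setting(config_text, key, value):
    """Return config_text with <add key="KEY" value="..."/> set to value.

    Plain text substitution is used (rather than an XML parser) so that
    comments and formatting in the file are left untouched.
    """
    pattern = re.compile(r'(<add\s+key="%s"\s+value=")[^"]*(")' % re.escape(key))
    new_text, count = pattern.subn(
        lambda m: m.group(1) + value + m.group(2), config_text
    )
    if count == 0:
        raise KeyError("no <add key=%r .../> entry found" % key)
    return new_text


def patch_file(path, key="IgnoreCiscoNewEntityMib", value="true"):
    """Back up the config file next to itself, then rewrite the one setting."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # keep a rollback copy
    text = path.read_text(encoding="utf-8")  # assumption: file is UTF-8
    path.write_text(set_app_setting(text, key, value), encoding="utf-8")


# Usage (path as given in step 1):
# patch_file(r"c:\Program Files (x86)\SolarWinds\Orion\HardwareHealth"
#            r"\SolarWinds.HardwareHealth.Pollers.dll.config")
```

The `.bak` copy makes it easy to put the original value back if the change makes things worse.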

  • Did that and it broke even worse. I now have over 250 devices that are showing Overall Hardware Status Unknown.

  • Bugger, sorry about that - it was the original fix they gave me back in ver 10.2 when I had a load of spurious hardware alerts on things I didn't even know about.

  • Apparently Solarwinds has changed the way they poll for hardware info.  I was just told as follows:

    We use small "poller" files which contain specific OIDs that are written directly in to the poller and do not reference the MIB in any way. The MIB database is only used for translating traps and providing easier set up and recognition of OIDs when using the Universal Device Poller.


    So now I'm still waiting on them to get back to me directly for a GoToMeeting session.

  • Hi Jeff,

    I have encountered several cases where there were lots of events about sensors going to Unknown status and later back to Up. These issues were also related to the upgrade to NPM 10.7. I can't tell whether this is also your case, but in NPM 10.7 Topology is turned on for all nodes that support it. Unfortunately, in some cases this causes the Hardware Health pollers to time out. I would recommend running List resources on one of your nodes with Hardware Health issues to see whether Topology is turned ON there. If it is, try turning it OFF and see if this helps. You can also disable Topology in bulk on the Manage pollers page (Settings -> Manage pollers).

  • Thanks for the heads up on this. I just did a bunch of research on the Hardware Health bugs because NPM 10.6 is squawking about our new core switch, and I remember seeing some threads from when this was an issue before 10.4. The best response I saw from the team was that a fix wasn't high on their priority list, so they just hadn't gotten around to it. Three versions later and it's still an issue! Hopefully there's a resolution soon.

  • Ok, so here is how Orion works, the Band-Aid solution, and what I have been told is being worked on. Orion uses two MIBs to do its polling: an old Cisco Entity MIB and a newer Cisco Environmental MIB. Orion first polls for a response to anything in the newer Environmental MIB, and if it gets a response it uses only that. If it doesn't get a response, it then polls the older Entity MIB, and if it gets a response it uses that. This is done on a device-by-device basis if the Hardware Health Sensor resource is checked under List resources. Topology has nothing to do with this. However, there is a bug in several newer devices and operating software that falsely reports problems under the newer Environmental MIB, so you get things like Bias errors being reported on multiple interfaces. The Band-Aid is one of two solutions:

    1- Use hardware reporting as is and disable the Hardware Health Sensor resource on all the nodes that are spitting out false errors. In our network that was about 40 devices out of 2000.

    2- Edit c:\Program Files (x86)\Solarwinds\Orion\hardwarehealth\Solarwinds.hardwarehealth.pollers.dll.config so that <add key="IgnoreCiscoNewEntityMib" value="false"/> is changed to true. This forces Orion to use only the older Entity MIB, which means Orion will not be able to monitor hardware health on newer devices such as Nexus devices.

    What I have been told is that future releases of NPM will have per-node and sub-resource filtering, meaning that users will be able to filter out specific sub-resource components that are giving false alerts while leaving everything else working.

    Overall this is not really SolarWinds' problem, but they have to deal with it. There are several non-SolarWinds/non-Thwack references to the issue out on Google.
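
The selection order described in this reply can be sketched as below. This is only an illustration of the behaviour as it was explained in the thread, not Orion's actual code. The function name, the `"environmental"`/`"entity"` labels, and the `responds` callback (standing in for a real SNMP poll) are all hypothetical.

```python
from typing import Callable, Optional


def choose_hardware_mib(
    responds: Callable[[str], bool],
    ignore_new_mib: bool = False,
) -> Optional[str]:
    """Pick which MIB a device's hardware health would be polled from.

    responds(mib) stands in for "the device answered a poll against this MIB".
    ignore_new_mib mirrors IgnoreCiscoNewEntityMib="true" from option 2.
    """
    # Newer Environmental MIB is tried first unless it is being ignored.
    candidates = ["entity"] if ignore_new_mib else ["environmental", "entity"]
    for mib in candidates:
        if responds(mib):
            return mib  # first MIB that answers is used exclusively
    return None  # no hardware health data for this device


# A device that answers both MIBs is read from the newer one by default...
both = lambda mib: True
print(choose_hardware_mib(both))                       # environmental
# ...and from the older one once the newer MIB is ignored.
print(choose_hardware_mib(both, ignore_new_mib=True))  # entity

# A device that only answers the newer MIB (the Nexus case described above)
# ends up with no hardware health at all under option 2.
new_only = lambda mib: mib == "environmental"
print(choose_hardware_mib(new_only, ignore_new_mib=True))  # None
```

The last case is why option 2 is a trade-off: it silences the false alarms but also removes hardware health from devices that only support the newer MIB.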

  • I upgraded to 10.7 and I have the same or a similar problem. We are a Cisco shop with 4500s, 6500s, etc., and they do not show power supplies or other hardware health any more. I still use my custom poller, though, and it works for reporting issues.

  • Jeff,

    You mentioned:

    We use small "poller" files which contain specific OIDs that are written directly in to the poller and do not reference the MIB in any way.


    How do you do that? What small poller files?

    I am very interested.

    Thanks much!

    Cheryl