Hi Jeff -
1) Open "c:\Program Files (x86)\SolarWinds\Orion\HardwareHealth\SolarWinds.HardwareHealth.Pollers.dll.config" file in notepad
2) Change <add key="IgnoreCiscoNewEntityMib" value="false"/> to
<add key="IgnoreCiscoNewEntityMib" value="true"/>
Is this what you are after?
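For anyone who would rather script the edit than do it by hand in Notepad, here is a minimal Python sketch. It assumes the config file is well-formed XML containing `<add key="..." value="..."/>` elements like the one quoted above. The function name `set_config_flag` is made up for illustration and is not part of any SolarWinds tooling. It writes a `.bak` copy first so the change can be reverted.

```python
import shutil
import xml.etree.ElementTree as ET

def set_config_flag(path, key, value):
    """Set value= on every <add key=...> element matching key; return how many changed."""
    # Keep XML comments so the rewritten file stays close to the original
    # (insert_comments needs Python 3.8 or later).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser=parser)
    changed = 0
    for elem in tree.getroot().iter("add"):
        if elem.get("key") == key and elem.get("value") != value:
            elem.set("value", value)
            changed += 1
    if changed:
        shutil.copy2(path, path + ".bak")  # backup before overwriting
        tree.write(path, encoding="utf-8", xml_declaration=True)
    return changed
```

Usage would look something like `set_config_flag(r"c:\Program Files (x86)\SolarWinds\Orion\HardwareHealth\SolarWinds.HardwareHealth.Pollers.dll.config", "IgnoreCiscoNewEntityMib", "true")`. Whitespace and attribute quoting in the rewritten file may differ slightly from the original, which is another reason to keep the backup.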
I did that and it made things even worse. I now have over 250 devices that are showing Overall Hardware Status Unknown.
Bugger, sorry about that - it was the original fix they gave me back in ver 10.2 when I had a load of spurious hardware alerts on things I didn't even know about.
Apparently SolarWinds has changed the way they poll for hardware info. I was just told the following:
We use small "poller" files which contain specific OIDs that are written directly into the poller and do not reference the MIB in any way. The MIB database is only used for translating traps and providing easier set up and recognition of OIDs when using the Universal Device Poller.
So now I'm still waiting on them to get back to me directly for a GoToMeeting.
I have encountered several cases where there were lots of events about sensors going to Unknown status and later Up again. These issues were also related to the upgrade to NPM 10.7. I can't tell if that is also your case, but in NPM 10.7 Topology is turned on for all nodes that support it. Unfortunately, in some cases this causes Hardware Health pollers to time out. I would recommend running List resources on one of your nodes with Hardware Health issues and seeing if Topology is turned ON there. If it is, try turning it OFF and see if this helps. You can also disable Topology in bulk on the Manage pollers page (Settings -> Manage pollers).
We use small "poller" files which contain specific OIDs that are written directly into the poller and do not reference the MIB in any way.
How do you do that? What small poller files?
I am very interested.
Thanks for the heads up on this. I just did a bunch of research on the Hardware Health bugs because NPM 10.6 is squawking about our new core switch, and remember seeing some threads from when this was an issue before 10.4. The best response I saw from the team was that a fix wasn't high on their priority list, so they just haven't gotten around to it. Three versions later and it's still an issue! Hopefully there's a resolution soon.
Ok, so here is how Orion works, the Band-Aid solution, and what I have been told is being worked on. Orion uses two MIBs to do its polling: an older Cisco Entity MIB and a newer Cisco Environmental MIB. Orion polls for responses to anything in the newer Environmental MIB, and if it gets a response it uses only that. If it doesn't get a response, it then polls the older Entity MIB, and if it gets a response it uses that. This is done on a device-by-device basis if the Hardware Health Sensor resource is checked under List resources. Topology has nothing to do with this. However, there is a bug in several newer devices and operating software that falsely reports problems under the newer Environmental MIB. So what you get is things like Bias errors being reported on multiple interfaces. The Band-Aid is one of two solutions:
1- Use hardware reporting as is and disable the Hardware Health Sensor resource on all those nodes that are spitting out false errors. In our network that was about 40 devices out of 2000.
2- Edit c:\Program Files (x86)\Solarwinds\Orion\hardwarehealth\Solarwinds.hardwarehealth.pollers.dll.config so that <add key="IgnoreCiscoNewEntityMib" value="false"/> is changed to true. This forces Orion to use only the older Entity MIB. This means that hardware health can no longer be monitored on any newer device, such as Nexus devices.
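To make the fallback order concrete, here is a rough Python sketch of the selection logic as I have described it. This is only my reading of the behavior, not SolarWinds code. The function name, the parameters, and the "environmental"/"entity" labels are all made up for illustration.

```python
from typing import Optional

def choose_hardware_mib(new_mib_responds: bool,
                        old_mib_responds: bool,
                        ignore_new_mib: bool = False) -> Optional[str]:
    """Return which MIB would be used for one device, or None if neither answers."""
    # ignore_new_mib models IgnoreCiscoNewEntityMib=true: the newer MIB is skipped.
    if not ignore_new_mib and new_mib_responds:
        return "environmental"   # newer MIB wins whenever it answers
    if old_mib_responds:
        return "entity"          # otherwise fall back to the older MIB
    return None                  # no hardware health data for this node

# A device that only answers the newer MIB (the Nexus case described above)
# loses hardware health once the flag is set:
print(choose_hardware_mib(True, False))                       # environmental
print(choose_hardware_mib(True, False, ignore_new_mib=True))  # None
```

This also shows why option 2 is a blunt instrument: it changes the outcome for every device, not just the ones reporting false errors.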
What I have been told is that future releases of NPM will have per-node and sub-resource filtering, meaning that users will be able to filter out specific sub-resource components that are giving false alerts while leaving everything else working.
Overall this is not really SolarWinds' problem, but they have to deal with it. There are several non-SolarWinds/non-Thwack references to the issue out on Google.
I upgraded to 10.7 and I have the same or a similar problem. We are a Cisco shop with 4500s, 6500s, etc., and they no longer show power supplies or other hardware health. I still use my custom poller, though, and it works for reporting issues.