Level 8

Hardware Health errors

I have noticed that the hardware health functionality is radically inaccurate in our setup. NPM currently reports eight Cisco switch power supplies as not functioning, yet logging in to each switch and requesting a power report directly confirms only three of them. The other five show the power supply as fine, putting out 1100 watts as expected. In one case NPM completely missed a failed power supply, which we discovered only when we logged in to the switch to check on the others. As a test, I removed one of the switches from Orion and re-added it, and NPM again incorrectly reported a non-functioning power supply. Has anyone else encountered these errors?
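
For anyone repeating this comparison across many switches, the manual check above can be sketched as a small set comparison. This is only an illustration: the function name `reconcile` and the switch and power supply names are hypothetical, and it assumes you have already collected both lists yourself (for example, from the NPM view and from each switch's own power report).

```python
def reconcile(npm_failed, switch_failed):
    """Compare power supplies NPM flags as failed with those the switches report as failed.

    Both arguments are iterables of (switch_name, power_supply_name) tuples.
    """
    npm = set(npm_failed)
    switch = set(switch_failed)
    return {
        "confirmed": sorted(npm & switch),      # both sources agree
        "false_alarms": sorted(npm - switch),   # NPM says failed, switch says fine
        "missed": sorted(switch - npm),         # switch says failed, NPM is silent
    }


# Hypothetical data, not taken from any real device.
npm_view = [("sw-core-1", "PS1"), ("sw-core-2", "PS1"), ("sw-edge-7", "PS2")]
switch_view = [("sw-core-1", "PS1"), ("sw-edge-7", "PS1")]

result = reconcile(npm_view, switch_view)
for category, items in result.items():
    print(category, items)
```

The "missed" bucket matters as much as the false alarms, since it is the case where a real failure goes unreported.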

4 Replies
Level 13

This has been going on for quite some time. The problem is mainly with Cisco devices and stems from the fact that two different MIBs are used depending on the hardware vintage. Orion checks both of them. What it really boils down to, though, is that these MIBs contain a host of sub-criteria that Orion alarms on. SolarWinds keeps promising a filter for these in an upcoming release. For now, the way to deal with the offending health alarms is to uncheck the health resource in NPM for the offending device. 8clover is correct about using environmental traps as an alternative.
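
To illustrate what filtering on those sub-criteria could look like, here is a rough sketch. It assumes that one of the two MIBs is CISCO-ENVMON-MIB (the thread does not name them) and uses the state values of its `ciscoEnvMonSupplyState` object. The choice of which states count as alarms, and the name `should_alarm`, are my own assumptions and not how Orion works internally.

```python
from enum import IntEnum


class SupplyState(IntEnum):
    """State values of ciscoEnvMonSupplyState (assumed MIB: CISCO-ENVMON-MIB)."""
    NORMAL = 1
    WARNING = 2
    CRITICAL = 3
    SHUTDOWN = 4
    NOT_PRESENT = 5
    NOT_FUNCTIONING = 6


# Assumption: only these states are worth alerting on. An empty power supply
# bay (NOT_PRESENT) is treated as expected rather than as a failure.
ALARM_STATES = {
    SupplyState.CRITICAL,
    SupplyState.SHUTDOWN,
    SupplyState.NOT_FUNCTIONING,
}


def should_alarm(raw_state: int) -> bool:
    """Return True if a polled supply state should raise an alert."""
    try:
        state = SupplyState(raw_state)
    except ValueError:
        # Unknown value: surface it rather than silently hiding it.
        return True
    return state in ALARM_STATES


for value in (1, 5, 6):
    print(value, should_alarm(value))
```

Whether an empty bay should alarm depends on the site, so treat `ALARM_STATES` as a starting point to adjust.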

Level 9

I have seen similar issues. I have seen NPM say that both power supplies in a device are not working. Given that the device has only two power supplies and was still running, something is not reporting properly. I have also seen cases where NPM finds only certain hardware items on a device. In general, we don't depend on NPM for hardware health, so the bad data has gone pretty much unnoticed in our system.


dragonfyre14

8clover

We have been noticing this too. Were you able to correct it or is it something that is set up incorrectly in the Current Hardware Health module?


The issue still persists. We decided that traps from the device were the preferred alerting mechanism for hardware and environmental issues, so no effort has been put into correcting NPM. We are running 10.4.2, so an upgrade might benefit us.