Hardware sensors that are for an interface should relate to that interface, not the node.

Currently, hardware sensors are only related to the node, which is fine for some things like temperature. However, sensors that are directly related to an interface, such as those monitoring SFPs that support DOM for things like Tx/Rx power, temperature and bias, need to be associated with the interface and not the node.

Why, you might ask? The problem is that right now, if you have an SFP inserted in a port that is shut down, or just plain down because nothing is plugged in, the hardware sensors will ALWAYS be triggered, resulting in bogus warnings. That's because the sensor is associated with the node and the node is up. It ignores the status of the interface!

This is probably not an issue if you make sure you remove SFPs from all ports that aren't in use, but in practice that is rarely the case. Let's say you install a new SFP-based switch and populate all the ports you know that you're going to use over the next month or so. The minute you start monitoring the switch, you will get critical alerts for all of those ports!

It just makes sense that, for those hardware components that are associated with an interface, the alert be associated with it also. That way, when the port is shut down, the software will disable the alert for you until the port is active.
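A minimal Python sketch of the requested behavior, for illustration only. It does not reflect how NPM is implemented; `Interface`, `Sensor` and `should_alert` are hypothetical names, and the rule "alert only when the port is both admin up and operationally up" is an assumption about what "active" should mean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interface:
    name: str
    admin_up: bool   # port is not administratively shut down
    oper_up: bool    # port currently has link


@dataclass
class Sensor:
    name: str
    critical: bool                          # reading is outside its thresholds
    interface: Optional[Interface] = None   # None means a node-level sensor


def should_alert(sensor: Sensor, node_up: bool) -> bool:
    """Decide whether a sensor in a critical state should raise an alert."""
    if not node_up or not sensor.critical:
        return False
    if sensor.interface is None:
        # Node-level sensors (chassis temperature, fans) follow the node.
        return True
    # Interface-level sensors (SFP DOM readings) follow their port.
    return sensor.interface.admin_up and sensor.interface.oper_up


# Hypothetical example data.
unused_port = Interface("Ethernet1", admin_up=False, oper_up=False)
dom_rx = Sensor("DOM Rx power Ethernet1", critical=True, interface=unused_port)
chassis_temp = Sensor("Chassis temperature", critical=True)

print(should_alert(dom_rx, node_up=True))        # suppressed: port is shut down
print(should_alert(chassis_temp, node_up=True))  # still alerts: tied to the node
```

With sensors tied only to the node, the `interface` field is effectively always `None`, which is why the DOM sensor on a shut-down port alerts today.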

There have been lots of questions on this already. Here are a few:

Problem with hardware sensor alert...

Re: Health Sensor for a Ten Gig interface on a Cisco 4503E

Re: 10.4 hardware health monitoring - anybody know what a bias current sensor is?

Hardware health sensor is up on shutdown interface

Harware Power Sensor on Admin downed interfaces

10 Comments
Level 12

Personally, I only see this issue on Arista switches. Is that your finding as well?

Another major issue here is that the values being read from the OIDs fall within a range that is considered critical. We need more granularity for adjusting the hardware monitoring per interface. When the SFP is inserted into the device, it activates a set of OIDs specific to that interface; then, if that interface sits unused, this issue occurs.

The complexity of network devices and their associated hardware monitoring in SolarWinds really needs some refinement. I have a ticket open for this currently, as one of the other issues I discovered is that when I try to disable all of the DOM Rx/Tx hardware monitors, some of them are re-enabled without intervention.

Another, even odder issue relates to hardware monitoring in general. Say you want to select 1000 hardware monitors with the same caption value, DOM or what have you. If you use the link that comes up when there are multiple pages of hardware components, the software has a "bug" that enables or disables every hardware component in the system.

I am currently on 12.2, but am hoping this got addressed in 12.3.
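As a rough illustration of why an idle port reads as critical, here is a small Python sketch of threshold classification. The threshold numbers and the -40 dBm "no light" reading are assumptions chosen for the example, not values taken from any particular SFP or from NPM, and `Thresholds` and `classify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    low_critical: float
    low_warning: float
    high_warning: float
    high_critical: float


def classify(value: float, t: Thresholds) -> str:
    """Map a sensor reading onto ok / warning / critical."""
    if value <= t.low_critical or value >= t.high_critical:
        return "critical"
    if value <= t.low_warning or value >= t.high_warning:
        return "warning"
    return "ok"


# Illustrative Rx power thresholds in dBm (assumed values).
RX_POWER_DBM = Thresholds(low_critical=-14.0, low_warning=-10.0,
                          high_warning=0.0, high_critical=2.0)

print(classify(-5.0, RX_POWER_DBM))   # a lit, healthy link
print(classify(-40.0, RX_POWER_DBM))  # an unused port receiving no light
```

The reading on the unused port is accurate; it is only the lack of interface context that turns it into a false alarm.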

I'm seeing it both on Arista and Cisco Nexus.

I don't think I've quite figured out how to see what the ranges are on the Arista, although it's easy to see on the Cisco devices.

I've seen similar issues, but it seems that when I administratively disable the unused ports containing GBICs, and stop monitoring those ports with NPM, the critical alerts stop.

Is that an acceptable workaround?

I'm not interested in receiving interface-based alerts any more than I am node-based alerts. NPM isn't quite smart enough to know intuitively what I want, which means I get better performance and fewer false alerts by shutting down and not monitoring unused ports containing GBICs.

If that worked, it would be an acceptable workaround. However, it doesn't help.

The problem I'm talking about is that these alerts are happening for ports that are administratively shut down AND not monitored. If either of those conditions stopped the alert from firing, I'd be perfectly happy.

However, the SolarWinds tech explained that the reason it's doing this is that the alert is tied to the node, not the interface. So regardless of whether the interface is up or down, or whether it's monitored or not, the alert fires because the node is up and the alert is tied to it.

I opened this ticket because a SolarWinds tech said that this change would be the solution, and that I should enter it as a feature request to get it on their radar.

The only other workaround, I believe, is to remove the GBIC/SFP from the port, but that's not always practical.

Level 9

I have this same issue on all of my Cisco ASR routers. An SFP is plugged into a port that is admin down, and I get all these alerts. It's very annoying, to the point of making the hardware health monitor almost useless because of all the false alerts.

Agreed. It should be acceptable to have a GBIC present but not have alerts coming from it or its node. I don't know whether that's on Cisco's back to fix or SW's, but I can certainly see the benefit of tying hardware to an interface instead of a node.

Level 8

Same issue on all of our Nexus 9k switches. Super irritating. Seems like it should be easy to ignore this if the port is configured "admin down".

Level 10

Please fix this!

We are being plagued by DOM Rx and Tx alerts on Arista device interfaces that we are not even monitoring.

Level 7

We have this issue as well. It needs to be addressed.

Level 10

So, if you have version 11.5 or later, you can disable individual power sensors on individual interfaces.

See:   https://thwack.solarwinds.com/t5/NPM-Discussions/How-to-Disable-temp-power-sensors-on-Network-Device...

Once we implemented this, our alert page became useful again!