Has anyone seen this happen? It's almost as if the device gets stuck in time. I have a range of devices, from switches to servers, that fail SNMP polling but still display stale data and show as up (green).
Many times a manual ping shows the device is actually down, which basically creates a false positive in SolarWinds.
I ran a query and discovered over 500 devices in my infrastructure that are not polling over SNMP, and all of them show green. For servers, switching many of the Windows servers to WMI fixed the issue and they started monitoring normally. But Unix or Linux devices, and network devices that only accept SNMP, remained broken. I had to manually change the community string and/or restart the SNMP process on those devices to fix the problem.
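For anyone who wants to run a similar check, here is a rough sketch of the filtering logic in Python. It is not the actual query or script; the `Node` record, its field names, and the one-hour threshold are all made up for illustration and do not reflect the real SolarWinds schema. You would fill the list from whatever query or export you have available.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class Node:
    # Hypothetical record; field names are assumptions, not the SolarWinds schema.
    caption: str
    polling_method: str  # e.g. "SNMP", "WMI", "ICMP"
    status: str          # e.g. "Up", "Down"
    last_poll: datetime  # time of the last successful statistics poll


def find_stale_snmp_nodes(
    nodes: Iterable[Node],
    now: datetime,
    max_age: timedelta = timedelta(hours=1),  # assumed threshold, tune to your polling interval
) -> List[Node]:
    """Return SNMP nodes that are reported as Up but have not polled within max_age."""
    return [
        n
        for n in nodes
        if n.polling_method == "SNMP"
        and n.status == "Up"
        and now - n.last_poll > max_age
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    sample = [
        Node("core-sw-01", "SNMP", "Up", now - timedelta(days=3)),     # green but stale
        Node("web-01", "WMI", "Up", now - timedelta(minutes=5)),       # not SNMP, ignored
        Node("edge-rtr-02", "SNMP", "Up", now - timedelta(minutes=2)), # polling normally
    ]
    for node in find_stale_snmp_nodes(sample, now):
        print(node.caption)
```

The point is just to compare the reported status against the age of the last successful poll, so a device that is green but has not returned SNMP data for a long time stands out.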
But it still seems like a bug to me if SNMP suddenly drops and the device still shows stale data, alerts on stale data, and shows as green (up). I've tried the basics: clearing the SDF files, and uninstalling and reinstalling the collector and job engines. I even went as far as basically redeploying my whole environment, but the issue remains. So I'm not convinced it's a collector problem.
The workaround for us has been finding the affected devices with the script. I run that script once every 2 weeks, find the devices not responding to SNMP, and fix them. But I wanted to bring up this discussion for visibility and hopefully catch the eye of a developer or someone from SolarWinds who can be notified about this.
Thanks!