Dear SolarWinds wizards,
Have you ever seen specific hosts (mostly Windows, occasionally Linux) get marked "down" after restarting the SolarWinds server, where nothing helps except restarting those "down" hosts?
We restart the SolarWinds server about once a month after patching. For roughly the last 6 months, 2-3 hosts in our environment of about 200 devices (mixed Linux, Windows, Mac, network gear and other SNMP devices) have routinely been marked offline ("down") after each restart. Testing the WMI credentials on Windows hosts (or the SNMP community string on Linux hosts) fails with "Test Failed. Test job timeout". Restarting the WMI/SNMP services on the target hosts doesn't help. An SNMPWalk from the SolarWinds server to the target hosts works. (I haven't tried deleting and re-adding the node yet and would prefer to avoid it.)
The only thing that has worked so far is restarting the target host.
Given that this happens only after a SolarWinds server restart and only to specific hosts, it's probably safe to say the culprit is either:
- loss of network connectivity to the target host, or
- a strange SolarWinds configuration issue or a bug
... yet even the loss of network connectivity feels unlikely: why would it always coincide with restarting the SolarWinds server and always go away after restarting the host?
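To test the connectivity theory for the Windows hosts, here is a minimal sketch of a check that could be run from the SolarWinds server. It assumes Python is available there, and the host names are placeholders. WMI polling relies on the RPC endpoint mapper on TCP port 135 (plus dynamic RPC ports, which this does not cover), so a successful connection only shows that the port is reachable, not that WMI itself is healthy.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_hosts(hosts, port: int = 135, timeout: float = 3.0) -> dict:
    """Map each host to whether the given TCP port is reachable."""
    return {host: tcp_port_open(host, port, timeout) for host in hosts}


if __name__ == "__main__":
    # Placeholder names; replace with the hosts currently marked "down".
    down_hosts = ["host-a.example.local", "host-b.example.local"]
    for host, reachable in check_hosts(down_hosts).items():
        state = "reachable" if reachable else "NOT reachable"
        print(f"{host}: TCP 135 {state}")
```

If port 135 is reachable while the credential test still times out, plain network loss looks even less likely.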
Thanks!