We're having a weird issue with some nodes and I wanted to see if anyone else has seen it. I have one master server and two additional polling engines running NPM 10.4. At a few remote sites, a number of nodes will go into a down state: they cannot be pinged by their assigned polling engine, but they can be pinged by a different polling engine. We see the same behavior if we open a command prompt on the assigned polling engine and run the ping command. Usually this happens after a power outage or a router outage. It's not always the same polling engine either. A few times a handful of nodes assigned to two different polling engines did the same thing, and all we did was swap them between the engines.
This sounds like you have some type of active blocking, like an intrusion prevention system, which may be identifying traffic from the poller as malicious and blocking it.
If you have the same problem pinging from the command line, the issue isn't with the poller. I would try your normal troubleshooting steps (traceroutes from the pollers, etc.) to see where the traffic is getting blocked, then check any ACLs on that router, especially ACLs that may get updated dynamically from an IPS.
When I've experienced this issue, it's been because a firewall between the pollers and the remote sites was only configured to allow traffic from, say, Poller A and Poller B, and after that rule was configured I stood up Poller C and assigned the remote site/node to it.
Fix: have the network team apply the same firewall rule(s) that exist for the older pollers to the new poller. It could also be that the remote router has access controls in place; the fix from the network folks would be the same.
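If you can get the allowed source list for the rule from the network team, a quick check like the one below can confirm whether every poller is actually covered. This is only a rough sketch: the poller names, the addresses (taken from documentation ranges), and the function name `pollers_missing_from_rule` are all made up for illustration, and it only compares source addresses, not ports or protocols.

```python
import ipaddress

def pollers_missing_from_rule(poller_ips, allowed_sources):
    """Return the names of pollers whose IP is not inside any allowed source network."""
    # strict=False lets entries such as "192.0.2.10/24" be read as the enclosing network
    networks = [ipaddress.ip_network(src, strict=False) for src in allowed_sources]
    missing = []
    for name, ip in poller_ips.items():
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in networks):
            missing.append(name)
    return missing

# Hypothetical data: Poller C was stood up after the rule was written
pollers = {
    "Poller A": "192.0.2.10",
    "Poller B": "192.0.2.11",
    "Poller C": "198.51.100.20",
}
rule_sources = ["192.0.2.10/32", "192.0.2.11/32"]

print(pollers_missing_from_rule(pollers, rule_sources))  # ['Poller C']
```

The same check works against the source entries of an ACL on the remote router, as long as you translate them into CIDR notation first.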
It's not a firewall issue. We generally don't have firewalls between the remote sites and the SolarWinds servers, so in these cases they shouldn't be interfering. The firewall rule for SolarWinds also includes all three polling engines.
If you can't ping the end node from the command line of the polling server, it's not the SolarWinds software. The traffic is being either blocked or incorrectly routed in at least one direction. You will need to trace the traffic to see where it is dying, and check the ACLs and routing tables of the intermediate devices to determine why.
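To narrow down which nodes are affected, it can help to run the same ping test on each polling server and compare the results side by side. The following is a minimal sketch of that idea, not anything built into NPM: the function names and sample data are hypothetical, the non-Windows branch assumes Linux-style ping flags, and on Windows ping can exit with code 0 on a "Destination host unreachable" reply, so treat a "reachable" result with some caution.

```python
import platform
import subprocess

def ping_command(host, timeout_s=1):
    """Build a one-packet ping command for the local OS."""
    if platform.system() == "Windows":
        # -n count, -w timeout in milliseconds
        return ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    # Assumed Linux-style flags: -c count, -W timeout in seconds
    return ["ping", "-c", "1", "-W", str(timeout_s), host]

def is_reachable(host):
    """Return True if a single ping to host exits with code 0."""
    result = subprocess.run(
        ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def classify(results, assigned):
    """Label each node from per-poller ping results.

    results:  {node: {poller: True/False}} collected by running is_reachable on each poller
    assigned: {node: poller} giving the polling engine each node is assigned to
    """
    report = {}
    for node, by_poller in results.items():
        own = assigned[node]
        if by_poller.get(own, False):
            report[node] = "reachable from assigned poller"
        elif any(ok for poller, ok in by_poller.items() if poller != own):
            # The node answers someone, so the node itself is up
            report[node] = "path problem from assigned poller"
        else:
            report[node] = "down from every poller"
    return report

# Hypothetical results gathered from two polling servers
sample_results = {
    "198.51.100.7": {"Poller A": False, "Poller B": True},
    "198.51.100.8": {"Poller A": True, "Poller B": True},
    "198.51.100.9": {"Poller A": False, "Poller B": False},
}
sample_assigned = {
    "198.51.100.7": "Poller A",
    "198.51.100.8": "Poller A",
    "198.51.100.9": "Poller B",
}
report = classify(sample_results, sample_assigned)
```

Nodes labelled "path problem from assigned poller" are the ones worth tracing hop by hop from that poller, since the node itself is clearly answering someone.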