We're seeing some very odd issues with NPM here. First, a bit about our environment:
Over 7600 nodes
We use 5 additional pollers in addition to the main poller
Server 2016 main poller:
VMware virtual server with 24 CPUs, 64GB RAM.
Database server: Server 2016 with SQL 2016
We upgraded all the applications on 6/15, and on 6/22 the server hung. After we reset it, many nodes began showing as down. We found that they were up when pinged from another machine. Then we realized that ping itself is timing out or not working at all on the main poller. If we stop the SolarWinds services, ping starts working again. We tried to isolate an individual service as the cause but didn't find the culprit. It seems like it might be the job engine service, but stopping it by itself doesn't get ping working again, and we need it running regardless.
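
For anyone who wants to repeat the one-service-at-a-time test, here is a rough Python sketch of how it could be scripted on the poller. It is only an illustration, not what we actually ran: the function names (`ping_ok`, `set_service`, `find_blocking_service`) are made up for this post, the service names have to be supplied by you, and it assumes Windows `ping` and `sc` are on the PATH and the script runs elevated.

```python
import subprocess
from typing import Callable, Iterable, Optional


def ping_ok(host: str) -> bool:
    """Send one Windows ping (2 s timeout); exit code 0 is treated as success.

    The exit code is only a rough signal, so check the output by hand
    if a result looks suspicious.
    """
    result = subprocess.run(
        ["ping", "-n", "1", "-w", "2000", host],
        capture_output=True,
    )
    return result.returncode == 0


def set_service(name: str, action: str) -> None:
    """Run 'sc stop <name>' or 'sc start <name>'.

    'sc' returns before the service has fully changed state, so a real
    run would probably need a short wait after each call.
    """
    subprocess.run(["sc", action, name], capture_output=True)


def find_blocking_service(
    services: Iterable[str],
    host: str,
    stop: Callable[[str], None] = lambda n: set_service(n, "stop"),
    start: Callable[[str], None] = lambda n: set_service(n, "start"),
    check: Callable[[str], bool] = ping_ok,
) -> Optional[str]:
    """Stop each service alone, test ping, then restart the service.

    Returns the first service whose stop makes ping succeed, or None
    if no single service makes a difference.
    """
    for name in services:
        stop(name)
        try:
            if check(host):
                return name
        finally:
            # Always bring the service back, even if the check raised.
            start(name)
    return None


# Hypothetical usage; replace the placeholders with real values:
# culprit = find_blocking_service(["<service 1>", "<service 2>"], "<host>")
# print(culprit)
```

A `None` result would match what we saw: no single service, stopped on its own, brought ping back.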