We are on version 2023.2.0.
Since upgrading to this version, we have noticed that interfaces suddenly start to drop out of polling. Then devices randomly bounce in and out of polling. Then agents start to disconnect, even though the monitored servers were not touched. As a result, we now have unreliable polling.
When I check the Job Engine logs, I see this:
2023-05-25 09:38:45,014 [29] WARN SolarWinds.JobEngine.AgentSupport.Routing.JobRouterAgentProxy - (null) Error during 'Clear scm' call. SolarWinds.JobEngine.JobEngineCommunicationException: Unable to communicate with agent JobEngine on endpoint 6fb9d5fd-4531-4c6a-96a4-55e919fc12cc. Response to message 'Clear' was not received in '00:02:00'.
This warning repeats from top to bottom of the logs, with a few other random warnings in between. This is not typical: usually when I look at the Job Engine logs, it's just normal operations. Seeing several logs on all pollers that all say the exact same thing indicates that the Job Engine is the problem.
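One way to confirm that this same warning dominates every poller's log is to tally it per agent endpoint. The Python sketch below assumes the log line format quoted above and that copies of the Job Engine log files are available locally; the function names are hypothetical, and lines in any other format are simply ignored.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Pattern based on the WARN line quoted above (an assumption about the format).
TIMEOUT_RE = re.compile(
    r"Unable to communicate with agent JobEngine on endpoint "
    r"(?P<endpoint>[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})"
)


def count_agent_timeouts(lines: Iterable[str]) -> Counter:
    """Count JobEngine communication failures per agent endpoint GUID."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("endpoint").lower()] += 1
    return counts


def main(paths: list) -> None:
    total: Counter = Counter()
    for path in paths:
        # errors="replace" keeps the scan going if a log contains odd bytes
        with open(path, encoding="utf-8", errors="replace") as handle:
            total.update(count_agent_timeouts(handle))
    # Most frequently failing endpoints first
    for endpoint, count in total.most_common():
        print(f"{count:6d}  {endpoint}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it with one or more copied log files as arguments; if a handful of endpoint GUIDs account for nearly all of the failures, that points at specific agents rather than the Job Engine as a whole.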
I've done a repair on the engine I've been working on. It didn't work.
I cleared the SDF files and had them regenerate. It didn't work.
I removed a whole bunch of agents that are no longer in use, cleared up credential issues, and so on. I made a good effort at housecleaning. This didn't help either.
I'm at the point where I'm considering a complete clean uninstall and reinstall on that engine, but I'm skeptical that it will help if the steps above did not. The Agent Messaging service was also having issues, and its log has a bunch of warnings and errors too.
The same goes for the Collector. It almost seems as if none of the core polling components are working properly. Has anyone had similar issues?