
AGENT MANAGEMENT SERVICE ERRORS


Agents are reporting as down on 3 of my APEs; they're working fine on the other APEs.

Business layer log shows these errors:

2020-06-11 09:55:17,443 [Scheduler] ERROR SolarWinds.BusinessLayerHost.PluginManager - Child process for D:\Program Files (x86)\SolarWinds\Orion\AgentManagement\SolarWinds.AgentManagement.ServiceCore.dll not found and will be restarted.
2020-06-11 09:55:17,443 [Scheduler] INFO SolarWinds.BusinessLayerHost.PluginInstanceSeparateProcess - Initialize plugin D:\Program Files (x86)\SolarWinds\Orion\AgentManagement\SolarWinds.AgentManagement.ServiceCore.dll.config 

 

 

Most Agent Management logs are blank, with the exception of the Agent Management service and Agent Management Watchdog logs. The service log doesn't show any errors; the watchdog log shows this error:

] WARN SolarWinds.AgentManagement.Messaging.Contract.MessagingServiceProxy - Connection to messaging service at 'net.pipe://localhost/SolarWinds/AgentManagement/Messaging' faulted. Running reconnection.
2020-06-10 09:10:08,367 [11] ERROR SolarWinds.AgentManagement.Messaging.Contract.MessagingServiceProxy - Messaging Service endpoint 'net.pipe://localhost/SolarWinds/AgentManagement/Messaging' was not found. There was no endpoint listening at net.pipe://localhost/SolarWinds/AgentManagement/Messaging that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.
2020-06-10 09:10:08,368 [9] ERROR SolarWinds.AgentManagement.Messaging.Contract.MessagingServiceProxy - Error closing Messaging Service channel factory.
System.ServiceModel.CommunicationObjectFaultedException: The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state.

 

Trying to install an agent manually on an end client and point it to any of these 3 APEs gives the error:

Https connection to ((APE server name)) on port 1778 succeeded but the agent management service did not respond

 

Troubleshooting steps I've tried:

- Confirmed all plugins are there, and confirmed the SolarWinds.AgentManagement.ServiceCore.dll file is there.

- Confirmed I have all C++ redistributable packages

- Repaired the core installer twice, repaired the Job Engine and Collector services, and re-ran the Configuration Wizard (3 times now)

- Followed every KA/THWACK post I could find on the matter:

     https://support.solarwinds.com/SuccessCenter/s/article/Forcing-the-Business-Layer-to-load-plugins-in...
     https://support.solarwinds.com/SuccessCenter/s/article/The-NPM-Summary-homepage-intermittently-loses...
     https://support.solarwinds.com/SuccessCenter/s/article/DPI-and-AgentManagement-BusinessLayer-issues
     https://thwack.solarwinds.com/t5/NPM-Discussions/Polling-Engine-Down-Database-Sync-not-occurring/m-p...
     https://thwack.solarwinds.com/t5/NPM-Discussions/SolarWinds-BusinessLayerHost-exe-terminated-du...

- Confirmed AV exclusions are in place

 

 

I have had a ticket open with SolarWinds and I feel like I'm running around in circles with support. Does anyone have any ideas?

 
 

 

 


Hi, thank you so much for responding. I did not get the email notifications about the responses, so I completely missed them, but I was able to resolve my issue after FINALLY getting my case escalated (it took a lot of people getting involved before that happened). Here is a recap of the resolution. Several of the KAs I listed above are somewhat similar in that they modify how plugins are loaded, but they didn't indicate that I should make these modifications to the Agent Management Watchdog config file.

SolarWinds support said this is a problem in large environments with lots of agents. Since the only affected polling engines were those in regions far away from the DB and main polling engine, the estimated root cause is either latency to the main polling engine and database server, or the large increase in agents we deployed, which raised the job weight considerably. What was happening was that the Agent Management plugins were restarting every 6 minutes because they could not complete the handshake with the DB and main polling engine.
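To check whether a polling engine is stuck in this kind of restart loop, the "will be restarted" lines in the Business Layer log can be timed. The following is a rough Python sketch, not something support provided: `restart_intervals` is a hypothetical helper, and it assumes the log lines keep the timestamp and message format shown in the original post.

```python
import re
from datetime import datetime

# Matches Business Layer log lines like the one quoted in the original post:
# "2020-06-11 09:55:17,443 [Scheduler] ERROR ... Child process for <plugin> not found and will be restarted."
RESTART_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"Child process for (.+?) not found and will be restarted"
)


def restart_intervals(log_lines):
    """Return {plugin_path: [minutes between consecutive restart messages]}."""
    times_by_plugin = {}
    for line in log_lines:
        match = RESTART_LINE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        times_by_plugin.setdefault(match.group(2), []).append(stamp)
    return {
        plugin: [
            (later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])
        ]
        for plugin, times in times_by_plugin.items()
    }
```

Feed it the lines of the Business Layer log from an affected APE; intervals that cluster around 6 minutes for the AgentManagement plugin would be consistent with what support described.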

Here is the resolution (do this on ALL polling engines; they said we could *try* it on just the APEs with issues, but it would be safer to do it on all):

- Stop Orion services

- Navigate to SolarWinds\Orion\AgentManagement

- Locate the SolarWinds.AgentManagement.Watchdog.dll file and copy it to your desktop or some other location as a backup

- Open the SolarWinds.AgentManagement.Watchdog.dll.config file

- Replace the existing LoadPlugins settings line with this line:

<SolarWinds.BusinessLayerHost LoadPlugins="false" SeparateProcess="true" Support64="true" />

- Save the file

- Restart Orion services
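Since this has to be repeated on every polling engine, the file edit in the steps above could be scripted. This is only a sketch under assumptions, not a SolarWinds-provided tool: `replace_host_element` and `patch_config` are hypothetical helpers, the existing setting is assumed to be a single self-closing SolarWinds.BusinessLayerHost element, and the file is assumed to be UTF-8. Unlike the steps above, it backs up the .dll.config file, since that is the file it changes.

```python
import re
import shutil
from pathlib import Path

# Replacement element exactly as quoted in the resolution steps.
NEW_ELEMENT = (
    '<SolarWinds.BusinessLayerHost LoadPlugins="false" '
    'SeparateProcess="true" Support64="true" />'
)

# Assumption: the current setting is one self-closing element in the file.
ELEMENT_RE = re.compile(r"<SolarWinds\.BusinessLayerHost\b[^>]*/>")


def replace_host_element(config_text):
    """Return config_text with the BusinessLayerHost element swapped out."""
    new_text, count = ELEMENT_RE.subn(NEW_ELEMENT, config_text)
    if count != 1:
        # Refuse to guess if the file does not look the way we assumed.
        raise ValueError(
            f"expected exactly one SolarWinds.BusinessLayerHost element, found {count}"
        )
    return new_text


def patch_config(config_path, backup_dir):
    """Back up the config file, then rewrite it in place. Returns the backup path."""
    config_path = Path(config_path)
    backup_path = Path(backup_dir) / (config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)
    text = config_path.read_text(encoding="utf-8-sig")  # assumption: UTF-8 file
    config_path.write_text(replace_host_element(text), encoding="utf-8")
    return backup_path
```

With Orion services stopped, `patch_config` would be called with the full path to SolarWinds.AgentManagement.Watchdog.dll.config under your install folder (the logs above show D:\Program Files (x86)\SolarWinds\Orion\AgentManagement) and a backup folder of your choice. If the element is not found exactly once, the script raises an error instead of touching the file.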

 

Agents can be a PITA... Unless there are good reasons to use them, I advise my customers to use agentless polling. If that is not an option because you need QoE or NetPath from those devices, or the firewall team doesn't want to open ports for WMI polling, you should be very careful with each Orion update. Manage your agents regularly, and consider running a scheduled task to restart the agent service once a week or every other week. I will look through the information you provided and see if I can come up with something.

Thanks for the response. We have thousands of agents deployed due to security, and yeah, they seem to require constant maintenance.

Agent polling is working fine on other polling engines but stopped working on these 3 APEs, though agents did work at one point on these APEs.

I've tried restarting the agent service on all devices via a PowerShell script and through SolarWinds, and I've also attempted to uninstall/reinstall agents on a few. They work if you put them on a working polling engine, but not if you put them on any of the 3 polling engines with issues, so I know it's the polling engine and not the agent.

Have you pulled the local agent logs from any of the monitored nodes?

Any chance that someone turned on a firewall or routing rule that is jamming up those pollers?
- Marc Netterfield, Github
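
One quick way to rule out a firewall or routing rule is to test from a monitored node whether the APE's agent port accepts TCP connections at all. A minimal Python sketch, assuming Python is available on the node: `can_connect` is a hypothetical helper, and the host name and port you pass in are placeholders for your own APE and agent port (the installer error quoted in the original post mentions port 1778).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A True result only shows that the port is reachable, not that the Agent Management service behind it is responding, which is the same distinction the installer error in the original post makes.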


Check the DBO.Engines and DBO.OrionServers tables in your database for any old or incorrect records left over from your rebuilds. The agent also stores its connection info in the registry; you could clean that out after an uninstall and then reinstall with a "fresh" registry. If all agent troubleshooting fails, building a new APE is the next option.