Hi Thwack,
First-time poster but long-time lurker. We have an environment with a primary polling agent and two additional agents. On the polling engine details page, the primary polling engine is not syncing. Interestingly, nodes on the primary polling agent, whether newly added or already existing, cannot complete discovery. Once the web console gets to "List Resources", it never completes. When I look at the Discovery logs, I can see the discovery kick off and complete, yet the web console never updates.
Also, no actions are being fired from any alerts. An example of the errors we are seeing in the logs is below:
2018-05-01 14:31:32,827 [ActionsExecutionProcessingThread] ERROR SolarWinds.Orion.Core.Alerting.Service.ActionsResolverInternal.PendingExecutionActions - Action ID: 45, ActionType: WriteToNPMEventLog, Title: NetPerfMon Event Log : NetPerMon Event Log: Group ${N=SwisEntity;M=Name} is ${N=SwisEntity;M=Status;F=Status}, Description: Log the Alert in the Network Performance Monitor Event Log, Enabled: True, Order: 1 failed. alertActiveId: 4239427 alertObjectId: 1373320. Error: ProvideFault failed, check fault information.
2018-05-01 14:31:32,858 [ActionsExecutionProcessingThread] ERROR SolarWinds.Orion.Core.Alerting.Service.ActionsResolverInternal.PendingExecutionActions - System.ServiceModel.FaultException`1[SolarWinds.Orion.Core.Common.CoreFaultContract]: ProvideFault failed, check fault information. (Fault Detail is equal to SolarWinds.Orion.Core.Common.CoreFaultContract(Unknown): System.Collections.Generic.KeyNotFoundException: Action WriteToNPMEventLog doesn't exist
at SolarWinds.Orion.Core.Actions.Runners.ActionRunner.Execute(ActionDefinition actionDefinition, ActionContextBase context)
at SolarWinds.Orion.Core.BusinessLayer.CoreBusinessLayerService.ExecuteAction(ActionDefinition actionDefinition, ActionContextBase context)
at SyncInvokeExecuteAction(Object , Object[] , Object[] )
at System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke(Object instance, Object[] inputs, Object[]& outputs)
at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage41(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage4(MessageRpc& rpc)
at System.ServiceModel.Dispatcher.Immuta...).
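For anyone who wants to check whether the failures are limited to a few action types or hit all of them, here is a minimal sketch (my own, not something from Support) that tallies the `ActionType` field from ERROR lines in the format shown above. The function name and the log path passed on the command line are assumptions; point it at whichever alerting log contains these entries.

```python
import re
import sys
from collections import Counter

# Matches the "ActionType: <name>" field as it appears in the ERROR lines above.
ACTION_TYPE = re.compile(r"ActionType:\s*(\w+)")


def count_failed_actions(lines):
    """Count ERROR log lines per ActionType (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Only consider ERROR entries; stack-trace lines have no ActionType field.
        if " ERROR " not in line:
            continue
        match = ACTION_TYPE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_actions.py <path-to-log-file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for action, total in count_failed_actions(handle).most_common():
            print(f"{action}: {total}")
```

If every action type shows up in the tally, that would suggest the problem is with action registration in general rather than with one specific action.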
Our network folks complained that their daily scheduled jobs were not running either, so when we attempted to run them manually, we got "An unexpected error has occurred." When I dig through the logs, I find an error very similar to the one above, except that it is "Action: Save to Disk" (the job is supposed to save the results to disk). Another symptom is that the daily database maintenance is not starting automatically. There are no log entries indicating that it started and failed; there is simply nothing. I have to manually start the database maintenance wizard every morning to keep the database size under control.
Support has had me rebuild the Core (twice), reinstall the Job Engine, and run some queries to clear subscriptions and recreate them. I am losing confidence in SolarWinds Technical Support, considering we are nearly 2 months into this case with all of the features described above still inoperable. I would have hoped that at some point I would be working directly with someone from the "advanced team" or "engineering" that the technicians keep referencing, so we could establish some continuity of knowledge on this case.
Has anyone in the Thwack community had any of these issues, and do you have any suggestions as to what the root cause could be? We are on NPM 12.1 (Windows 2008 R2, so we are unable to upgrade to NPM 12.2).