We're experiencing a problem with an installation of NPM 11.5.3, NTA 4.1.2 and SAM 6.2.3. Following an upgrade, we've found that the primary (and only) poller is down. Database updates are occurring as normal, but the Last Database Sync value keeps increasing and the KeepAlive value in the AllEngines table has not updated since the services started.
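To make the KeepAlive symptom concrete, here is a rough sketch of the staleness check we have in mind. It assumes the engine rows have already been read from AllEngines into (server name, KeepAlive) tuples; the function name, the tuple layout, the 5-minute threshold and the sample values are all our own illustration, not anything taken from SolarWinds.

```python
from datetime import datetime, timedelta


def stale_engines(rows, now, max_age=timedelta(minutes=5)):
    """Return (server_name, age) for every engine whose KeepAlive is older than max_age.

    rows: iterable of (server_name, keepalive) tuples, assumed to have been
          fetched from the AllEngines table beforehand (hypothetical layout).
    now:  the current time, in the same time zone as the KeepAlive values.
    """
    stale = []
    for server_name, keepalive in rows:
        age = now - keepalive
        if age > max_age:
            stale.append((server_name, age))
    return stale


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    now = datetime(2024, 1, 1, 12, 0, 0)
    rows = [("POLLER01", datetime(2024, 1, 1, 9, 30, 0))]
    for name, age in stale_engines(rows, now):
        print(f"{name}: KeepAlive is {age} old")
```

In our case the single poller would show up in this list straight away, since its KeepAlive has not moved since the services started.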
We have looked in the Windows Event Logs and found an event which occurs each time the services start, ID 1024: "Unhandled Exception caught in Core Service Engine startup. Specified cast is not valid."
We have tried:
- Checking that the clocks on the database and application servers are in sync
- Restarting services
- Restarting the server
- Running the config wizard (repairing all components)
- Running the permission checker
- Granting "Everyone" access to the SolarWinds directories to rule out permissions issues
- Uninstalling and reinstalling everything
- Recreating application certificates
- Forcing business layer plugins to start in different processes
We've even built a new VM and tested it with a newly created database (on the same SQL instance as the live database), but we still got the same error.
We've checked the collation, CLR and compatibility level of the database. There are no SQL errors in the event logs.
Has anyone seen something like this before?