My APM instance seems to be very unreliable. After a reboot of the Orion server it runs perfectly for around two weeks, and then ends up in a state where, on any typical day, nearly half of my monitored applications are sporadically down (HTTP monitors, SQL Server process monitors via SNMP, etc.) for 5 minutes at a time (i.e. one scan fails and the next works perfectly). This has been the case for nearly a year now, and APM alerts have long been ignored as completely unreliable. NPM itself shows no particular issues, and I've never seen a false positive from it.
I have tried adjusting the polls-per-second tuning to both the recommended setting and half the recommended setting, but neither seems to solve the issue permanently. The only thing that solves the problem once it occurs is a complete restart of the Orion server.
Are there any APM-specific logs I can look at to find the cause of the issue? Or any APM tuning parameters or config files I can adjust?
I am currently running around 136 APM component monitors (licensed for APM250), but this issue is really hampering my plans to extend APM to other applications.
NPM version is 9.1 SP4
APM version is 2.0 SP2