I'm not sure where to start, but I am seeing several strange (interrelated?) issues with my installation of Orion 9.1 SP5 on four different servers.
I have a primary poller, a hot standby, an additional web server and a DB server. Each server is chock-full of memory, has fast processors, and far exceeds the minimum specs listed by SW in their documentation. The current monitored network is 4179 elements.
Here are the issues I am seeing:
(1) The primary poller frequently loses communication with the DB server. Right now, it is showing the last DB sync was 19 hours ago. Most of the nodes on the web site show as down, which I know is not correct. The SW services are all up, but when I try to launch the System Manager and connect to the primary poller, I get an error message (see attached screenshot).
(2) I was able to effect a failover to my standby poller, but the standby goes up and down every few minutes (engine status shows stopped). I have attached a couple of screenshots of the Monitor Polling Engines app. On both the primary and standby pollers, the engine status alternates between stopped and running. Is this normal?
The System Manager on the standby poller shows no elements at all, even though the website shows the same as the primary poller website (many nodes being down).