Strange JMX monitor behavior. Flapping between up and unknown.

In our environment, which was recently upgraded from 2020 to 2023.4.2, we are experiencing strange JMX monitoring behavior. The monitor goes from Up to Unknown after 5 minutes, stays Unknown for some time, goes back to Up, and then returns to Unknown. Monitoring was working fine before the upgrade.

We currently have a case open with SW support (Case # 01583316), which is taking a long time to resolve, so I thought I would reach out to the community. Does anyone know how to fix this issue?

We have deployed an additional poller, which was advised to us by support, and moved nodes over to the new poller, but this has not resolved our issue. The same behavior occurs. We are using agentless monitoring with WMI.

We see various error messages, such as "SolarWinds JMX Bridge service is not responding", even though we have confirmed that the service is running on both the poller and the main server. Some JMX services return an Up status, but not all of them, and this is also intermittent: they transition to Unknown after a while.

Another error is "Unexpected error occurred. Connection doesn't exist, it may have been closed", yet once again other JMX services are showing as Up. We have used jconsole to confirm that connectivity is working from the additional poller.

If anyone has some advice it would be greatly appreciated!

  • Update:

    We found the cause of this: our firewall rule was blocking ICMP but allowing WMI for our APE.

    This leaves me with a question: does JMX monitoring rely on a successful ping before it queries WMI?

    I commend SolarWinds support for being patient and helping us optimize our environment.

  • So the JMX monitors don't use ICMP or WMI. The Java application gets configured to allow JMX polling on a particular port and sets up its own listener, and the Java app owners configure how they want it to handle authentication: it could be a local user, LDAP, or an SSL cert. SW then uses the JMX Bridge service you mentioned to handle the requests (except on Linux agent-based nodes, where they poll JMX locally via the agent). It's a completely separate thing from the usual WMI and ICMP polling for the rest of the server info.

    I would wager that the bug you are encountering lives inside that service, and something is causing it to be inconsistent about how it polls or processes this Java data. Is there a reason you haven't upgraded to a 2024 release yet? It's possible that whatever issue you are seeing may be fixed by now.
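
To test the JMX listener independently of the JMX Bridge service (and of ICMP/WMI), something like the following can be run from the poller. This is only a rough sketch: the class name `JmxProbe`, the five-attempt loop, and the 60-second interval are made up for illustration, and it assumes the Java app exposes the standard RMI connector without authentication or SSL. If the app requires credentials or a cert, the connection will need an environment map that is not shown here.

```java
import java.io.IOException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxProbe {
    // Builds the conventional RMI-based JMX service URL (same form jconsole uses for host:port).
    static String buildServiceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Returns true if a JMX connection opens and the MBean count can be read.
    // Assumption: no authentication or SSL is required on the listener.
    static boolean probe(String host, int port) {
        try {
            JMXServiceURL url = new JMXServiceURL(buildServiceUrl(host, port));
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbeans = connector.getMBeanServerConnection();
                return mbeans.getMBeanCount() > 0;
            }
        } catch (IOException | SecurityException e) {
            System.out.println("JMX probe failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        if (args.length < 2) {
            System.out.println("usage: java JmxProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // Repeat the probe a few times to catch intermittent failures (hypothetical interval).
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println("attempt " + attempt + ": " + (probe(host, port) ? "up" : "failed"));
            if (attempt < 5) {
                Thread.sleep(60_000);
            }
        }
    }
}
```

If this stays up from the poller while the monitor still flips to Unknown, that would point at the bridge service or at something else in the polling path rather than at the Java app's JMX listener.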

  • We did attempt upgrading to a 2024 release, however the JMX Bridge service failed to start on the MPE, so we were forced to roll back. We haven't had the chance to clone the system, attempt the upgrade again, and get the logs showing why the service didn't want to start, but we will in due course if SW support engineers don't get back to us with some info on how to fix this issue.
