VMware ESXi 5 Host Monitoring Status Bouncing

FormerMember

I recently enabled SNMP monitoring for all of our ESXi hosts.  Overall, it has been really nice to see the extra detail this provides on top of the Cisco UCS monitoring and vCenter monitoring.

However, I am not sure what happened, but over the last few weeks the monitoring status of the VMware servers has been continuously bouncing between the regular operational state and an unknown state.

Overall, all the statistics still appear to be there.  I am not sure if anything is actually failing.  But the logs of my Orion server are now filled non-stop with entries for each of the individual application monitors on each host bouncing between the unknown and up states.

Is anyone else seeing this?  Is there something I can do to figure out what the issue is?  Am I polling the servers too often?

I am running the latest versions of all SolarWinds and VMware products involved, and the latest Cisco firmware.

  • I can't say I've ever seen this behavior with default polling. If you've adjusted the polling interval, I'd recommend setting it back to the default to see if that resolves the issue. Another possibility is latency, if you're polling across a high-latency link where bandwidth is contended. If this doesn't describe your environment and you're currently polling with default intervals, I'd recommend opening a case with support so we can take a look at your diagnostics.

  • We also see this happening with our VM Host templates. We are on the latest version of everything and are confident that we don't have any network bottlenecks.

  • Without a diagnostic to look through, it's difficult to determine the cause, and I'm not aware of any systemic issue that could be causing this. I'd recommend opening a support case so we can troubleshoot this issue properly.

  • Logged as Case #329400. Will post any findings if fixed. Thanks

  • I have observed the same behavior on a few of our ESX 4.1 hosts.  None of our ESXi 5 hosts have done this.

  • Seems like a job timeout was detected. This seems to have fixed it now.


    Can you do the following:

    To replace one or both job engine databases:

    1.     Log on to your Orion server using an account with administrative rights.

    2.     Click Start > All Programs > SolarWinds Orion > Advanced Features > Orion Service Manager.

    3.     Click Shutdown Everything.
    Note: It may take a few minutes to stop all services.

    4.     If you are replacing the job engine version 2 database, complete the following steps:

        a.     Make a backup copy of JobEngine35.sdf as JobEngine35.old.
        Notes:

        o    The default location of this file on Windows Server 2008 is C:\ProgramData\SolarWinds\JobEngine.v2\Data\.

        o    The default location of this file on Windows Server 2003 is C:\Documents and Settings\All Users\Application Data\SolarWinds\JobEngine.v2\Data\.

        b.     Make a copy of JobEngine35 - Blank.sdf and rename it as JobEngine35.sdf.
        Notes:

        o    The default location of this file on Windows Server 2008 is C:\ProgramData\SolarWinds\JobEngine.v2\Data\.

        o    The default location of this file on Windows Server 2003 is C:\Documents and Settings\All Users\Application Data\SolarWinds\JobEngine.v2\Data\.

        c.     Right-click JobEngine35.sdf, as renamed in the previous step.

        d.     Click Properties.

        e.     Clear the Read-only option.

    5.     On your Orion server, click Start > All Programs > SolarWinds Orion > Advanced Features > Orion Service Manager.

    6.     Click Start Everything.
    Note: It may take a few minutes to start all services.
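
    If you have to repeat the file steps on several pollers, they could be scripted roughly as below. This is only a sketch, not a SolarWinds tool: the function name replace_job_engine_db is made up for illustration, the default path is the Windows Server 2008 location quoted above, and it assumes you have already stopped all services in Orion Service Manager before running it.

```python
import os
import shutil
import stat
from pathlib import Path

# Default Windows Server 2008 location quoted in the steps above.
DEFAULT_DATA_DIR = Path(r"C:\ProgramData\SolarWinds\JobEngine.v2\Data")


def replace_job_engine_db(data_dir: Path = DEFAULT_DATA_DIR) -> Path:
    """Replace JobEngine35.sdf with a fresh copy of the blank database.

    Hypothetical helper: assumes every Orion service is already stopped.
    """
    live = data_dir / "JobEngine35.sdf"
    backup = data_dir / "JobEngine35.old"
    blank = data_dir / "JobEngine35 - Blank.sdf"

    if not blank.is_file():
        raise FileNotFoundError(f"blank database not found: {blank}")

    if live.exists():
        # Make sure the current file is writable, then keep a backup copy.
        os.chmod(live, stat.S_IREAD | stat.S_IWRITE)
        shutil.copy2(live, backup)

    # Copy only the contents of the blank database over the live file,
    # so a read-only flag on the blank file is not carried across.
    shutil.copyfile(blank, live)

    # Equivalent of clearing the Read-only option in Properties.
    os.chmod(live, stat.S_IREAD | stat.S_IWRITE)
    return live


if __name__ == "__main__":
    print(f"Replaced {replace_job_engine_db()}")
```

    The script does not stop or start any services, so Shutdown Everything and Start Everything are still done by hand in Orion Service Manager. On Windows Server 2003 you would pass the Documents and Settings path instead of relying on the default.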

  • That didn't help for me (it seems to have gotten worse, actually).  Guess I should open a ticket with support...