7 Replies Latest reply: Apr 16, 2012 1:28 PM by Tom Bateman

VMWare ESXi 5 Host Monitoring Status Bouncing

timsilverline

I recently enabled SNMP monitoring for all of our ESXi hosts.  Overall it has been really nice to see the extra detail this provides on top of the Cisco UCS monitoring and VCENTER monitoring.

However, I am not sure what happened, but over the last few weeks the monitoring status of the VMWare servers has been continuously bouncing between the normal operational state and an unknown state.

All the statistics still appear to be there, and I am not sure whether anything is actually failing.  But the logs on my Orion server are now filled non-stop with the individual application monitors for each host bouncing between the unknown and up states.

Is anyone else seeing this?  Is there something I can do to figure out what the issue is?  Am I polling the servers too often?

I am running the latest versions of all the Solarwinds and VMWare products involved, and the latest Cisco firmware.

 
  • Re: VMWare ESXi 5 Host Monitoring Status Bouncing
    aLTeReGo

    I can't say I've ever seen this behavior with default polling. If you've adjusted the polling interval, I'd recommend setting it back to the default to see if that resolves the issue. Another possibility is latency, if you're polling across a link with bandwidth contention and high latency. If this doesn't describe your environment and you're currently polling at the default intervals, I'd recommend opening a case with support so we can take a look at your diagnostics.

  • Re: VMWare ESXi 5 Host Monitoring Status Bouncing
    phil.braithwaite

    We also see this happening with our VM Host templates. We are on the latest version of everything and are confident that we don't have any network bottlenecks.

  • Re: VMWare ESXi 5 Host Monitoring Status Bouncing
    Tom Bateman

    I have observed the same behavior on a few of our ESX 4.1 hosts.  None of our ESXi 5 hosts have done this.

  • Re: VMWare ESXi 5 Host Monitoring Status Bouncing
    phil.braithwaite

    It seems a job timeout was detected; the procedure below seems to have fixed it now.


    Can you do the following:

    To replace one or both job engine databases:

    1.     Log on to your Orion server using an account with administrative rights.

    2.     Click Start > All Programs > SolarWinds Orion > Advanced Features > Orion Service Manager.

    3.     Click Shutdown Everything.
    Note: It may take a few minutes to stop all services.

    4.     If you are replacing the job engine version 2 database, complete the following steps:

    5.     Make a backup copy of JobEngine35.sdf as JobEngine35.old.
    Notes:

    o    The default location of this file on Windows Server 2008 is C:\ProgramData\SolarWinds\JobEngine.v2\Data\.

    o    The default location of this file on Windows Server 2003 is C:\Documents and Settings\All Users\Application Data\SolarWinds\JobEngine.v2\Data\.

    6.     Make a copy of JobEngine35 - Blank.sdf and rename it as JobEngine35.sdf.
    Notes:

    o    The default location of this file on Windows Server 2008 is C:\ProgramData\SolarWinds\JobEngine.v2\Data\.

    o    The default location of this file on Windows Server 2003 is C:\Documents and Settings\All Users\Application Data\SolarWinds\JobEngine.v2\Data\.

    7.     Right-click JobEngine35.sdf, as renamed in the previous step.

    8.     Click Properties.

    9.     Clear the Read-only option.

    10.    On your Orion server, click Start > All Programs > SolarWinds Orion > Advanced Features > Orion Service Manager.

    11.    Click Start Everything.
    Note: It may take a few minutes to start all services.
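
For anyone who would rather script the file steps than click through them, here is a rough Python sketch of the backup, copy, and clear-read-only part of the procedure above. It is not from SolarWinds: the function name `replace_job_engine_db` is made up for illustration, and the default path is simply the Windows Server 2008 location quoted in the notes. It does not stop or start any services, so Shutdown Everything and Start Everything in Orion Service Manager still have to be done by hand.

```python
import os
import shutil
import stat
from pathlib import Path

# Default Windows Server 2008 location quoted in the steps above.
DEFAULT_DATA_DIR = Path(r"C:\ProgramData\SolarWinds\JobEngine.v2\Data")


def replace_job_engine_db(data_dir=DEFAULT_DATA_DIR):
    """Hypothetical helper: swap JobEngine35.sdf for a copy of the blank database.

    Assumes all Orion services are already stopped.
    """
    data_dir = Path(data_dir)
    live = data_dir / "JobEngine35.sdf"
    blank = data_dir / "JobEngine35 - Blank.sdf"
    backup = data_dir / "JobEngine35.old"

    if not blank.is_file():
        raise FileNotFoundError(f"Blank database not found: {blank}")

    if live.exists():
        # Back up the current database (this overwrites any earlier JobEngine35.old).
        shutil.copy2(live, backup)
        # Make sure the current file can be overwritten.
        os.chmod(live, live.stat().st_mode | stat.S_IWRITE)

    # Copy the blank database into place under the live name.
    shutil.copy2(blank, live)

    # Clear the Read-only flag, which copy2 carries over from the blank file.
    os.chmod(live, live.stat().st_mode | stat.S_IWRITE)
    return live
```

On Windows Server 2003 you would pass the other data directory from the notes as `data_dir`. Run it only after all services have stopped, since the database file may otherwise be locked.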