
Everything went into unknown state

It all started this morning when we had to reboot the Orion server because we were unable to RDP into it. I searched the forums and tried the suggestions I found there. As one post suggested, I selected the templates and clicked Submit again, with no luck. I also restarted the Orion job engine servers, as another post suggested, and that didn't help either.

 

Any suggestions?

Thanks. 

  • Are all the services running?  Specifically, check the SolarWinds Module Engine, SolarWinds Job Scheduler, and the SolarWinds Job Engine services. 

  •  Yes they are.  That was the first thing I checked.

  •  It happened again last night.  I stopped all SNMP monitoring until a patch comes out for it.

  • I have the same problem. Since installing APM (I also have the legacy App monitor), the service keeps dying, all apps go into an unknown state, and the alerting module craps out, citing a stack dump.


    Does anyone know a fix for this?

  • I have the same exact issue. I guess the app is not ready for prime time.
  •  I've also got the same problem with all monitors showing as unknown.

     Restarting either the services or the box makes no difference.

     Monitors are Exchange (via WMI), SQL (via WMI) and a few custom TCP port monitors.

     Bit of a poor show really...

     

  • APM definitely has a few issues. I just had the entire set of apps I was monitoring go into UNKNOWN as well, and I couldn't see any obvious reason why; all services were running.  Rebooted the box and it all came back.  My monitors are a mix of WMI, SNMP and Service Monitors.

    My guess is the SNMP memory leak killed it since I saw it using around 800MB of RAM - I killed it - but it didn't help anything recover...

    I've had intermittent problems with some monitors showing things down because they time out across slow WAN links - I bumped up the polling timeout which really helps (WMI and Service monitoring across the WAN is really slow compared to SNMP it seems).  A few monitors still show down from time to time but come back eventually...


    Hopefully things will improve once the service pack that fixes the SNMP memory leak comes out. That leak is a pretty big deal, given how slow WMI seems to be at querying for processes, services, etc.

  • FormerMember in reply to smcdonald

    "My guess is the SNMP memory leak killed it since I saw it using around 800MB of Ram  - I killed it - but it didn't help anything recover..."
     

    We're fairly certain the memory leak is unrelated to the monitors going unknown.  We see several of you discussing monitors going unknown on Thwack, but we need you to open Support tickets so that we can get data from your systems to help us diagnose the problem.  It's not something we have been able to reproduce in-house.
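For anyone landing here later: the service check suggested in the first reply can be scripted, so you can run it quickly each time the monitors go unknown. Below is a minimal sketch in Python, not an official SolarWinds tool. It assumes a Windows host with PowerShell on the PATH and that the service display names match the ones quoted in this thread (they may differ between versions). The helper names `parse_services`, `not_running` and `query_services`, and the "NotFound" label, are made up for illustration. As several posters found, all services showing as running does not rule the problem out, so treat this as a first check only.

```python
import json
import subprocess

# Display names as quoted in the first reply of this thread (assumed to
# match your installation; adjust if your version names them differently).
REQUIRED = [
    "SolarWinds Module Engine",
    "SolarWinds Job Scheduler",
    "SolarWinds Job Engine",
]

# Ask PowerShell for every service, with Status rendered as a string
# (e.g. "Running", "Stopped") rather than a numeric enum value.
PS_COMMAND = (
    "Get-Service | "
    "Select-Object DisplayName, @{n='Status';e={$_.Status.ToString()}} | "
    "ConvertTo-Json"
)


def parse_services(ps_json):
    """Turn the JSON produced by PS_COMMAND into {display_name: status}."""
    data = json.loads(ps_json)
    if isinstance(data, dict):
        # ConvertTo-Json emits a bare object when there is only one item.
        data = [data]
    return {item["DisplayName"]: item["Status"] for item in data}


def not_running(services, required=REQUIRED):
    """Return (name, status) for each required service that is not Running.

    Services missing from the listing are reported with the made-up
    status "NotFound".
    """
    return [
        (name, services.get(name, "NotFound"))
        for name in required
        if services.get(name) != "Running"
    ]


def query_services():
    """Windows only: run PS_COMMAND and parse its output."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_services(result.stdout)
```

On the Orion server itself, `not_running(query_services())` returns an empty list when all three services report Running; anything else in the list is worth including in the Support ticket.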