Everything went into unknown state

rdrash over 16 years ago

It all started this morning when we had to reboot the Orion server. We were unable to RDP into the server, so we rebooted it. I searched the forums and tried the suggestions I found there. As one post suggested, I selected the templates and clicked Submit again, with no luck. As another post suggested, I also restarted the Orion Job Engine services, and that didn't help either.

Any suggestions?

Thanks.

josh.clark over 16 years ago

Are all the services running? Specifically, check the SolarWinds Module Engine, SolarWinds Job Scheduler, and the SolarWinds Job Engine services.
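For anyone who wants to script the check above instead of opening the Services console, here is a minimal Python sketch. It assumes a Windows host where `sc query <name>` prints a line like `STATE : 4 RUNNING`. The service key names in `SERVICES` are hypothetical placeholders: the display names mentioned above are not necessarily the key names `sc` expects, so look them up on your own server first (for example with `sc query state= all`).

```python
import subprocess
import sys

# HYPOTHETICAL key names: replace with the real key names of the
# SolarWinds Module Engine, Job Scheduler and Job Engine services.
SERVICES = ["ExampleModuleEngine", "ExampleJobScheduler", "ExampleJobEngine"]


def parse_state(sc_output: str) -> str:
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    for line in sc_output.splitlines():
        # A typical line looks like: "        STATE              : 4  RUNNING"
        if line.strip().startswith("STATE"):
            _, _, rest = line.partition(":")
            parts = rest.split()  # e.g. ["4", "RUNNING"]
            if len(parts) >= 2:
                return parts[1]
    # No STATE line, e.g. "OpenService FAILED 1060" when the name is wrong
    return "NOT_FOUND"


def query_service(name: str) -> str:
    """Run `sc query <name>` (Windows only) and return the parsed state."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_state(result.stdout)


if __name__ == "__main__" and sys.platform == "win32":
    for svc in SERVICES:
        print(f"{svc}: {query_service(svc)}")
```

A service reported as anything other than `RUNNING` (or `NOT_FOUND`, which usually means the key name is wrong) is the one to look at first.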

rdrash over 16 years ago in reply to josh.clark

Yes, they are. That was the first thing I checked.

rdrash over 16 years ago in reply to rdrash

It happened again last night. I stopped all SNMP monitoring until a patch comes out for it.

jimrobinson over 16 years ago in reply to rdrash

I have the same problem. Since installing APM (I also have the legacy App Monitor), the service keeps dying, all apps go into an unknown state, and the alerting module craps out citing a stack dump.

Does anyone know a fix for this?

somewhere over 16 years ago in reply to jimrobinson

I have the same exact issue. I guess the app is not ready for prime time.

a1ex over 16 years ago in reply to somewhere

I've also got the same problem with all monitors showing as unknown.
Restarting either the services or the box makes no difference.
Monitors are Exchange (via WMI), SQL (via WMI), and a few custom TCP port monitors.
Bit of a poor show really...

smcdonald over 16 years ago

APM definitely has a few issues. I just had the entire set of apps I was monitoring go into UNKNOWN too, and I couldn't see any obvious reason why; all services were running. I rebooted the box and it all came back. My monitors are a mix of WMI, SNMP, and service monitors.

My guess is the SNMP memory leak killed it, since I saw it using around 800 MB of RAM. I killed it, but that didn't help anything recover...

I've had intermittent problems with some monitors showing things down because they time out across slow WAN links. I bumped up the polling timeout, which really helps (WMI and service monitoring across the WAN seems really slow compared to SNMP). A few monitors still show down from time to time but eventually come back...

Hopefully things will improve once the service pack that fixes the SNMP memory leak comes out. That fix is a pretty big deal given how slow WMI seems to be at querying processes, services, etc.

FormerMember over 16 years ago in reply to smcdonald

smcdonald wrote: "My guess is the SNMP memory leak killed it, since I saw it using around 800 MB of RAM. I killed it, but that didn't help anything recover..."

We're fairly certain the memory leak is unrelated to the monitors going unknown. We see several of you discussing monitors going unknown on Thwack, but we need you to open Support tickets so that we can get data from your systems to help us diagnose the problem. It's not something we have been able to reproduce in-house.
