Restart a service based on a process metric?

On a number of remote branch VMs, one of the processes leaks handles, which eventually causes the server to run out of memory.

Restarting the service that owns (spawned) the process resolves the issue.

SolarWinds already has a performance counter tracking the handle count for that process; I just can't figure out how to restart the service based on the counter value.

So, the question:

How do I structure the alert, the trigger condition and the trigger action to automate this via SolarWinds?

Trigger condition:

  • Objects:
    • Component Name = <the name of the service>
  • "The actual trigger condition:"
    • Component Name = <handle count tracker for a process that is spawned by that service>
    • Component Status in Warning or higher status

Obviously this won't work: the trigger condition never matches the scoped objects, i.e. no object can ever satisfy both conditions. If I instead put <handle count tracker for a process that is spawned by that service> in the "Objects" section, SolarWinds' standard service restart utility (APMServiceControl.exe ${N=SwisEntity;M=ComponentAlert.ComponentID} -c=RESTART) won't work, because it only works on services, not processes.
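To make the scoping problem concrete, here is a deliberately simplified Python model of the alert: the "Objects" scope filters components first, and the trigger condition is then checked against each component that survived the scope. This is an assumption about the general shape of the evaluation, not SolarWinds' actual alert engine; the component names and the status ordering are made up for illustration.

```python
from dataclasses import dataclass

# Simplified, hypothetical status ordering (not SolarWinds' real model).
SEVERITY = {"Up": 0, "Warning": 1, "Critical": 2}


@dataclass(frozen=True)
class Component:
    # Hypothetical stand-in for a SAM component monitor.
    name: str
    status: str


def alert_matches(components, scope_name, condition_name, min_status="Warning"):
    """Return the components that would trigger the alert."""
    # Step 1: the "Objects" scope keeps only components with this name.
    in_scope = [c for c in components if c.name == scope_name]
    # Step 2: the trigger condition is evaluated on each scoped component.
    return [
        c
        for c in in_scope
        if c.name == condition_name
        and SEVERITY[c.status] >= SEVERITY[min_status]
    ]


components = [
    Component("MyService", "Up"),
    Component("MyService handle count", "Critical"),
]

# A component has exactly one name, so none can be both the service and the
# handle-count tracker: the alert never fires, however high the count gets.
print(alert_matches(components, "MyService", "MyService handle count"))  # []
```

Scoping the alert to the tracker itself does match, but then the alerting object is the perfmon component rather than the service, which is why the standard restart action has nothing to restart.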

Is it possible to modify that trigger action so that it restarts the service that owns the process?

If not, is there a simple way to resolve this?

Thank you!

P.S. 

  • Add a Windows Service Monitor to that Application Template. When you create an alert for the perfmon counter, update the Alert Action with the explicit ComponentID of the Windows Service Monitor. You can find the ComponentID in the URL when you drill into the Windows Service Monitor in that Application Template; the URL ends with AM:<ID>.

    Example:
    URL: /Orion/APM/MonitorDetails.aspx?NetObject=AM:1234

    Alert Action: APMServiceControl.exe 1234 -c=RESTART

  • Nice! I'll check it out when I am back at work.

    P.S. Tying the trigger action to a specific ID makes the action dedicated to that service and condition, i.e. I wouldn't be able to use the same alert for a different service with a similar issue?

    Or to put it differently: there isn't a way to force a service monitor into "warning" status based on the performance counter value of that service's process without making it even more complex?

    (It's fairly simple to do this in PowerShell: get the process of a running service, read its handle count without even using performance counters, then restart the service if the handle count has ballooned. It feels strange that it isn't easy in SolarWinds.)

    (If the question doesn't make sense - please let me know, and thanks again!)

  • Yup, that is correct. It would turn the alert action into a dedicated alert for that single scenario. 

    You could also write your own PowerShell monitor directly in SolarWinds to check the handle count and restart the service, though that comes with its own complexity.

    You might also consider using an Application Custom Property (tag) and referencing that tag in the Alert Action instead of a hard-coded ID. That would make the alert action more dynamic, but the tag would have to be updated manually for each Application Template where one monitor reaching a threshold should restart the service of another monitor. If you went this route, the Alert Action would use an alert variable for that Application Custom Property.
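The ComponentID lookup and the resulting Alert Action can be sketched in Python. This only reproduces the string handling: it extracts the ID from a MonitorDetails URL of the form shown in the example and formats the APMServiceControl.exe command line. The RestartComponentID custom property name is hypothetical, and in SolarWinds the value would be substituted by an alert variable rather than read from a dictionary; the exact variable syntax is not shown here.

```python
import re
from urllib.parse import parse_qs, urlparse


def component_id_from_url(url: str) -> int:
    """Extract the ComponentID from a monitor details URL,
    e.g. /Orion/APM/MonitorDetails.aspx?NetObject=AM:1234 gives 1234."""
    query = parse_qs(urlparse(url).query)
    net_object = query.get("NetObject", [""])[0]
    match = re.fullmatch(r"AM:(\d+)", net_object)
    if match is None:
        raise ValueError(f"no NetObject=AM:<ID> in {url!r}")
    return int(match.group(1))


def restart_command(component_id: int) -> str:
    """Alert action command line that restarts the given service monitor."""
    return f"APMServiceControl.exe {component_id} -c=RESTART"


def restart_command_for_app(custom_properties: dict,
                            prop: str = "RestartComponentID") -> str:
    """Build the command from a per-application custom property instead of
    a hard-coded ID. "RestartComponentID" is a hypothetical property name."""
    return restart_command(int(custom_properties[prop]))


cid = component_id_from_url("/Orion/APM/MonitorDetails.aspx?NetObject=AM:1234")
print(restart_command(cid))  # APMServiceControl.exe 1234 -c=RESTART
print(restart_command_for_app({"RestartComponentID": "5678"}))  # APMServiceControl.exe 5678 -c=RESTART
```

The trade-off from the thread shows up directly: the hard-coded version bakes one ID into the action, while the custom-property version keeps a single action template and moves the per-application ID into data that someone has to maintain.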
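The PowerShell-style check described in the comments (find the service's process, read its handle count, restart the service if the count has ballooned) can be sketched in Python. The OS-specific calls are passed in as functions so the threshold logic can run anywhere; the service name, the 10,000-handle threshold, and the psutil/Restart-Service wiring shown in the comments are assumptions, not taken from the thread. If the leaking process is a child of the service's process rather than the service process itself, the PID lookup would have to walk the children instead (psutil's Process.children() is one option).

```python
from typing import Callable, Optional

# One possible Windows wiring (assumption, not exercised here):
#   import psutil, subprocess
#   get_service_pid = lambda name: psutil.win_service_get(name).pid()
#   get_handle_count = lambda pid: psutil.Process(pid).num_handles()
#   restart_service = lambda name: subprocess.run(
#       ["powershell", "-NoProfile", "-Command",
#        f"Restart-Service -Name '{name}'"], check=True)


def check_and_restart(
    service_name: str,
    get_service_pid: Callable[[str], Optional[int]],
    get_handle_count: Callable[[int], int],
    restart_service: Callable[[str], None],
    max_handles: int,
) -> bool:
    """Restart service_name if its process holds more than max_handles
    handles. Returns True if a restart was requested."""
    pid = get_service_pid(service_name)
    if pid is None:
        # Service is not running, so there is no process to measure.
        return False
    if get_handle_count(pid) <= max_handles:
        return False
    restart_service(service_name)
    return True


# Usage with stand-in functions (hypothetical values).
restarted = []
did_restart = check_and_restart(
    "MyService",
    get_service_pid=lambda name: 4242,
    get_handle_count=lambda pid: 50_000,
    restart_service=restarted.append,
    max_handles=10_000,
)
print(did_restart, restarted)  # True ['MyService']
```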