Reboot server when process/application has issues

We have servers that run proprietary software. At least once a week, one of these applications goes haywire and takes up 99% of the CPU on the VM. Dev isn't going to fix it anytime soon. I tried to trigger a server reboot via WMI/SNMP, but it doesn't work, presumably because the CPU is locked up.

This will have to be done through VMAN, and the VM will need to be rebooted. We get paged and do this manually at the moment, and I'm not sure how to automate it. The options SolarWinds gives me for this service don't include the parameters. Basically, I need to reboot the VM when this happens, but I'm not sure how to write that into the triggered actions with the variables, or whether it can even be done.

The attachment says to select which VM to reboot, so I'm thinking the only way to do this is to create a separate alert for each of the 4 servers and then select the server that corresponds to each alert.

  • In reply to chris.shannahan:

    An alert would be a great way to handle this workflow. Base the alert on VM CPU thresholds. For instance, if you set the CPU load threshold to 80% and have the alert trigger on any VM that stays at that threshold for more than X minutes, you can select the reboot action and it will run against whichever VM triggered the alert. To narrow it down to those 4 VMs, you can also make the trigger condition require a custom property on the VM, or another identifying attribute.
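The trigger condition described above can be sketched in plain Python to make the logic concrete. This is only an illustration, not SolarWinds code: `Sample`, `vms_to_reboot`, and the `eligible` set (standing in for the custom property filter) are hypothetical names, and the 5-minute window is an assumed value for X.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Sample:
    """One CPU reading for a VM (hypothetical structure for illustration)."""
    vm: str
    minute: int          # minutes since monitoring started
    cpu_percent: float


def vms_to_reboot(
    samples: Iterable[Sample],
    threshold: float = 80.0,      # CPU load threshold from the reply above
    sustain_minutes: int = 5,     # assumed value for "X minutes"
    eligible: Optional[Set[str]] = None,  # stands in for a custom property filter
) -> List[str]:
    """Return VMs whose CPU stayed at or above the threshold for more than
    sustain_minutes without dipping below it."""
    streak_start = {}   # vm -> minute the current high-CPU streak began
    flagged: List[str] = []
    for s in sorted(samples, key=lambda x: x.minute):
        if eligible is not None and s.vm not in eligible:
            continue  # VM does not carry the identifying property
        if s.cpu_percent < threshold:
            streak_start.pop(s.vm, None)  # streak broken, start over
            continue
        start = streak_start.setdefault(s.vm, s.minute)
        if s.minute - start > sustain_minutes and s.vm not in flagged:
            flagged.append(s.vm)
    return flagged
```

The point of the sustained window is that a short CPU spike resets nothing and reboots nothing; only a VM that stays pinned is returned, and only if it is one of the tagged servers.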

  • Thanks, I don't have that option. It must be because my alert is based on the process going critical (see attached vm1).

    I will test to see if I can set it up via CPU load, etc.

    vm1.jpg

  • In reply to chris.shannahan:

    Ah, no. Instead of alerting on the component, try alerting on the Virtual Machine entity itself.

  • Yeah, I've set up the alert on the CPU load. It should correspond with the component going critical, and I should get emails for both. If they do correspond, I will then try the reboot VM portion.

  • OK, so the alerts work, which means I will be able to set this up to reboot or power off/on the guest. Does anyone know if this is done through VMware Tools? Sometimes when this happens, a reboot from vCenter fails because the system isn't responding, the same as when I try to reboot via SNMP. It just doesn't work. So when this happens we have to power off the VM, not the guest OS.

    I'm not sure exactly which one SolarWinds does.
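For context on the distinction being asked about: in vSphere, a guest OS restart is requested through VMware Tools inside the guest, while a power off or reset is carried out at the hypervisor level and does not need a responsive guest. Which of the two the SolarWinds action uses is not established in this thread. The fallback that is currently done by hand could be sketched as below; `recover_vm` and the two callables passed into it are hypothetical placeholders for whatever actually performs each operation.

```python
from typing import Callable


def recover_vm(
    vm: str,
    guest_reboot: Callable[[str], bool],       # hypothetical: graceful restart via the guest
    hard_power_cycle: Callable[[str], None],   # hypothetical: power off/on at the hypervisor
) -> str:
    """Try a graceful guest reboot first; if it fails or raises,
    fall back to a hard power cycle. Returns the path that was used."""
    try:
        if guest_reboot(vm):
            return "guest_reboot"
    except Exception:
        # An unresponsive guest may time out or error; treat that as a failure.
        pass
    hard_power_cycle(vm)
    return "hard_power_cycle"
```

Since the guest is pinned at 99% CPU when this alert fires, the graceful path may rarely succeed, so it may be simpler to go straight to the hard power cycle for these 4 VMs.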
