Restart Windows Service from API poll failure?

Hey! I don't know if it's possible, but we have a Windows server with a process that has trouble sometimes. When it does, it fails the health monitor on our F5, even though the Windows service is still up and running.

So I'm trying to restart that Windows service if that monitor fails on the F5. I see SAM has API pollers, so I could do the poll from the server node, but I'm not sure how to get SW to restart that service if the API poller comes back with anything but a 200 OK.

Thoughts?

  • Are you monitoring the Windows service as well? If so, you could create an alert action to restart that ComponentID. Take a look at this article. For the most part you can ignore the alert trigger condition steps, since you'll be setting your trigger logic up differently, but when you get to the trigger action section you'll want to replace the ${N=SwisEntity;M=ComponentAlert.ComponentID} variable with the ComponentID of the monitored Windows service you'd like it to restart.

  • Yes, I am monitoring the server and the service, and I did see some of this. I have 4 servers, and they all have issues once in a while with that service. The only thing I'm not sure about: if I create the API poller on all servers, then create the alert so that when the poller comes back with anything but 200 it restarts the service like you mentioned, will it pick up the server that is alerting or not? I guess I will have to test it.

  • I did try that earlier today but I'm having some trouble getting it to work correctly. The API poller gets a 200 and you can monitor that value, but I don't see that in the HTTP monitor?

  • By default the HTTP Monitor looks for a 200 response code; if it receives anything other than 200 it will report as down. If you post a screenshot of the error I might be able to assist in getting that one to work.

  • Ok thanks, I didn't know that. I had put 200 in the response time haha. I removed that and it's showing green and up on all 4 servers. Now I'm not sure what you mean by... 

    Using this method instead and looking for the 200 return code will allow you to create a variable (you would insert this created variable to replace the ComponentID in the command line) that pulls the ComponentID for the Windows service on the corresponding Windows server and only restarts that problematic service.
  • So now that you have it working, you would create a second component on that template to monitor the Windows Service alongside that HTTP monitor. Then in the alert action you would create a variable (when you click the Variable picker there is a custom option at the bottom). The variable you create would be a SWQL query that grabs the ComponentID of the Windows Service component in the same assigned application monitor for that server. That effectively creates the scenario "when this HTTP monitor goes down, restart the Windows service component assigned on this specific server."

  • If I'm already monitoring, I shouldn't need to add it to this monitor as well though right?

  • The goal would be to have both the HTTP Monitor and the Windows Service Monitor within the same Application Template assigned to the servers. Since the HTTP Monitor would be net new, you could just add it to the same template as the Windows Service. The reason we want to do this is that it makes it a lot easier to write out the query for the Custom Variable.

  • I have them together. Now just have to figure out what the trigger and actions need to be.

  • The trigger condition would be if that HTTP Monitor is down. The next thing would be working out that custom variable. TBH I've done it before but I didn't save the query, and I don't have a test environment where I could work it out. But basically the query should grab the ComponentID of the Windows service that is in the same assigned application monitor as the triggering HTTP Monitor. If you want, you could send me something like a Webex and I can hop on a call to help you out. PM me if that's what you would prefer and we can work out linking up.
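
    For anyone following along, the lookup that custom variable has to perform can be modelled outside of Orion. This is only a rough Python sketch, not SolarWinds code: the `Component` record, its field names, the `kind` labels, and the helper `service_component_for` are all made up for illustration. It just shows the idea of "find the Windows service component that shares an application with the component that triggered."

    ```python
    from dataclasses import dataclass
    from typing import List, Optional


    @dataclass
    class Component:
        # Hypothetical record; these are not real Orion field names.
        component_id: int
        application_id: int
        kind: str  # "http" or "windows_service" (labels invented for this sketch)


    def service_component_for(trigger: Component,
                              components: List[Component]) -> Optional[int]:
        """Return the ID of the Windows service component that lives in the
        same assigned application as the triggering component, or None."""
        for c in components:
            if (c.application_id == trigger.application_id
                    and c.kind == "windows_service"):
                return c.component_id
        return None


    # Two servers, each with one assigned application holding both components.
    components = [
        Component(101, 1, "http"),
        Component(102, 1, "windows_service"),
        Component(201, 2, "http"),
        Component(202, 2, "windows_service"),
    ]

    # The HTTP monitor on the second server goes down.
    print(service_component_for(components[2], components))  # 202
    ```

    Because the match is made on the application, not on a hard-coded ID, the same alert can be shared by all 4 servers and still only touch the service on the server that is alerting.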

  • Thanks. I will shoot you a PM tomorrow to see what works.

  • So I haven't heard from you, but I was able to get hold of a test instance to work this out. Here's my setup.

    1. I have an application template with two components: one is the HTTP monitor, and the other is the Windows Service to restart. *NOTE* If you have more than one Windows Service in the template, you will have to modify the query for the custom variable in the alert action.
    2. I then set up an alert for this application that triggers when the application is down (when the HTTP component is down, the application status will show as down).
    3. On the trigger action I have the action to restart the Windows Service. *NOTE* This is where things deviate from the command line provided in the SW KB article.
    4. The query used to replace the ComponentID variable from the KB article is:
      SELECT A.Components.ComponentID FROM Orion.APM.Application A WHERE A.ApplicationID = ${N=SwisEntity;M=ApplicationID}  and A.Components.ComponentType = 9
      Basically, it takes the ApplicationID of the alerting application instance and then grabs the ComponentID of the Windows Service component found within that application. *NOTE* If you have multiple Windows Services in the application template, you can add something like a component name to make sure you grab the right service. It would look something like this:
      SELECT A.Components.ComponentID FROM Orion.APM.Application A WHERE A.ApplicationID = ${N=SwisEntity;M=ApplicationID}  and A.Components.ComponentType = 9 AND A.Components.Name like '%Task%'
    5. The full command line argument is:
      APM\APMServiceControl.exe ${N=SWQL;M=SELECT A.Components.ComponentID FROM Orion.APM.Application A WHERE A.ApplicationID = ${N=SwisEntity;M=ApplicationID}  and A.Components.ComponentType = 9} -c=RESTART
    6. Test the alert action and confirm the service restart, for example via a log.
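
    To make the nesting in step 5 easier to see, here is a small Python sketch that assembles the same command line from its parts. The helper names `build_query` and `build_command` are hypothetical and only build strings; nothing here talks to Orion. The executable path, the macros, and the `ComponentType = 9` filter are copied from the steps above. The optional name filter is the variant from step 4; whether it behaves the same once nested inside the `${N=SWQL;...}` variable is an assumption I have not tested.

    ```python
    from typing import Optional

    # Alert macro from step 4; Orion substitutes the alerting application's ID.
    APP_ID_MACRO = "${N=SwisEntity;M=ApplicationID}"


    def build_query(name_pattern: Optional[str] = None) -> str:
        """Build the SWQL that returns the Windows service ComponentID.

        name_pattern is an optional LIKE pattern (e.g. '%Task%') for templates
        with more than one Windows service. No escaping is done; sketch only.
        """
        query = (
            "SELECT A.Components.ComponentID FROM Orion.APM.Application A "
            "WHERE A.ApplicationID = " + APP_ID_MACRO + " "
            "AND A.Components.ComponentType = 9"
        )
        if name_pattern is not None:
            query += " AND A.Components.Name LIKE '" + name_pattern + "'"
        return query


    def build_command(name_pattern: Optional[str] = None) -> str:
        """Wrap the query in the custom SWQL variable and append the restart flag."""
        return (
            "APM\\APMServiceControl.exe ${N=SWQL;M="
            + build_query(name_pattern)
            + "} -c=RESTART"
        )


    print(build_command())
    print(build_command("%Task%"))
    ```

    The first line printed matches the command in step 5 (apart from spacing and the capitalised AND). Keeping the query in one place means the single-service and multi-service variants can't drift apart.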