I have been trying to follow the documented syntax for restarting a service using APM. Has anyone used this script before? I cannot locate the executable on the server to try it. This is a snippet from the help file:
http://www.solarwinds.com/NetPerfMon/SolarWinds/wwhelp/wwhimpl/js/html/wwhelp.htm?context=SolarWinds&file=OrionAGAlertVariables.htm
under
Application Performance Monitor Administrator Guide : Monitoring Your Applications : Creating Alerts for Applications : Restarting Windows Services with an Alert Action
it says:
Restarting Windows Services with an Alert Action
You can use the alert trigger action "Execute program" to restart a Windows service that is down. Orion APM includes a program named APMServiceControl.exe that takes the Component ID as a parameter and restarts the service.
Example Alert Manager Trigger Action to Restart a Windows Service
Execute program: APMServiceControl.exe ${ComponentId}
The SolarWinds Orion Alert Manager includes a pre-configured alert to automatically restart the associated service if a Windows service component monitor is down, but the alert is disabled by default.
To enable the Restart a Service alert:
1. Start Advanced Alert Manager in the Alerting, Reporting, and Mapping program group.
2. Click Configure Alerts from the View menu.
3. Check Restart a Service.
4. Click Done.
I cannot locate APMServiceControl.exe or the predefined alert.
This only exists in APM 2.5. I suspect you have APM 2.0.
If you have APM 2.5, look for C:\Program Files\Solarwinds\Orion\APM\APMServiceControl.exe. The ${ComponentId} value has to be that of a Service monitor.
The predefined alert is named "Restart a Service." It's at the very end of the list when you go to configure alerts in the Advanced Alert Manager.
Great, thank you. I have 2.5 now; I didn't even know it was out. I found the services now.
Roger, I am running version 2.5 and do not see the sample alert "Restart a Service" in Advanced Alert Manager.
I also do not see the APMServiceControl.exe in the referenced folder.
Do I need to run a service repair or re-install of APM?
If this were my documentation test server, I'd probably just re-run the APM 2.5 installer and do a Repair to get the missing file copied over.
But, because this is probably your production server monitoring your business network, I have to advise you to open a support ticket instead.
I can't think of a good reason why that file and alert would be missing from your APM 2.5 deployment. They're not optional features that can be skipped during the install process.
I have APM 3.0 SP1 and everything works fine, btw.
I have a request to monitor a specific service on nearly 200 PCs running a POS system. The target process is part of the POS system. The request is to monitor the POS PCs, check whether CPU utilization goes to 100% and stays there, then terminate the target process, which is the cause of the CPU spike, and send the alert e-mail stating the process was killed.
Is there a way to evaluate a list of target hosts and a specific process on each? Then, issue a kill (taskkill, psexec, etc.) getting ONLY the "bad" process?
I had a similar need to check the age of a specific file in an FTP destination directory and alert if the file was "stale". I had 13 files to check, so I ended up building 13 APM Components in a Monitor. It works fine, but if we spin up 50 or 100 FTP sends, this method of "one each" is cumbersome.
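For what it's worth, the "one each" file check could also be done as a single loop over a list of paths. This is only a rough sketch, assuming the FTP destination directory is readable from wherever the script runs; the function name and the choice to treat a missing file as stale are my own assumptions, not anything APM provides.

```python
import os
import time


def find_stale_files(paths, max_age_seconds, now=None):
    """Return the paths whose last-modified time is older than max_age_seconds.

    A file that cannot be read (for example, one that never arrived) is
    reported as stale too. That choice is an assumption of this sketch.
    """
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            # Missing or unreadable file: report it rather than skip it.
            stale.append(path)
            continue
        if age > max_age_seconds:
            stale.append(path)
    return stale


# Hypothetical usage: the paths below are placeholders.
# stale = find_stale_files([r"D:\ftp\in\orders.csv", r"D:\ftp\in\prices.csv"], 3600)
```

Adding a 14th or 100th file is then one more entry in the list rather than one more component.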
Any ideas on how to pick off a bad process from a list of hundreds?
Thanks!
NinjaNerd56,
To answer your specific question: we do not have a method where you can configure a monitor component to repeat its task against n computers. This isn't our design. If you really wanted to do that, you'd have to do it in a script and have all the monitoring and restarting done there.
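A script along those lines might look roughly like the sketch below. It is not part of APM. It assumes the Windows tasklist and taskkill commands are available on the machine running it, that the account has rights on the remote PCs, and that the host and image names are placeholders. It only finds and kills the named process; the CPU check is left out and would have to be added separately.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(output):
    """Parse `tasklist /FO CSV /NH` output into (image_name, pid) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(output)):
        # Expected columns: Image Name, PID, Session Name, Session#, Mem Usage.
        # Anything else (such as the "INFO: No tasks..." message) is skipped.
        if len(fields) >= 2 and fields[1].strip().isdigit():
            rows.append((fields[0], int(fields[1])))
    return rows


def tasklist_command(host, image_name):
    """Build a tasklist call that lists only the named image on one host."""
    return ["tasklist", "/S", host, "/FI", f"IMAGENAME eq {image_name}",
            "/FO", "CSV", "/NH"]


def taskkill_command(host, pid):
    """Build a taskkill call that force-kills a single PID on one host."""
    return ["taskkill", "/S", host, "/PID", str(pid), "/F"]


def kill_process_on_hosts(hosts, image_name, run=subprocess.run):
    """Kill every instance of image_name on each host; return what was targeted."""
    killed = []
    for host in hosts:
        result = run(tasklist_command(host, image_name),
                     capture_output=True, text=True)
        for name, pid in parse_tasklist_csv(result.stdout):
            run(taskkill_command(host, pid), capture_output=True, text=True)
            killed.append((host, name, pid))
    return killed


# Hypothetical usage: host names and image name are placeholders.
# kill_process_on_hosts(["pos-001", "pos-002"], "posapp.exe")
```

Killing by PID rather than by image name keeps the action limited to the instances that were actually listed a moment earlier on that host.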
We can do this out of the box by doing something like this:
1) Create a template that monitors this service (I'm assuming this process is a service).
2) Configure a new alert to look for this component (by name), set the CPU threshold, and set how long it has to remain that way for the alert actions to trigger.
3) Add in the restart action (using Restart a Service as a reference) and give it a test by dropping the CPU threshold to be >= 0. When you're satisfied it's restarting the services, set the threshold back to your desired level.
4) After your template and alert work, add all the nodes to Orion (there are fast ways to do this via discovery).
5) After all the nodes are in, you apply the template to all these nodes.
6) You may have some cleanup to do regarding credentials, but overall it's a process that should work well.
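To make the "threshold plus how long" condition from step 2 concrete, here is a small illustration of the logic. It is only a sketch of the idea, not how Orion evaluates alerts internally; the function name and the use of consecutive poll samples to stand in for a duration are assumptions.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True if the most recent min_consecutive samples are all >= threshold.

    samples is a list of CPU readings, oldest first, one per poll.
    """
    if min_consecutive <= 0 or len(samples) < min_consecutive:
        return False
    return all(value >= threshold for value in samples[-min_consecutive:])


# A single spike does not trigger, but three polls in a row at 100% do.
print(sustained_breach([20, 100, 35, 100], 100, 3))   # False
print(sustained_breach([20, 100, 100, 100], 100, 3))  # True
```

Setting the threshold to 0, as in the test trick from step 3, makes every reading count as a breach, which is why the action fires as soon as enough polls have come in.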
Peter,
Thanks...that's about what I figured. The method you describe is certainly workable.