7 Replies Latest reply on Feb 6, 2013 4:06 AM by lionkin

    Monitoring the status of a Windows Service using NPM only

      Hi All,

      I have NPM installed in my environment (not APM) and would like to know if it is possible to monitor the status of a windows service (running, stopped, etc)?  We do not have the budget to include APM in our management suite so I am looking to only use NPM for this purpose.

       

      Can this be done?

        • Re: Monitoring the status of a Windows Service using NPM only
          byrona

          There isn't any good way to do this. 

          The best option to look into would be a Universal Device Poller with the Host Resources MIB; however, I am not sure how well this would work with indexing, or across multiple systems where the OID for the same process differs from system to system.

          Hope this helps!

          If you are able to get something to work, please post your work as I am sure others would be interested.

          Ultimately APM is going to be your best bet but I am guessing you are already aware of that.

            • Re: Monitoring the status of a Windows Service using NPM only
              smargh

              I found that you cannot monitor Windows services, or processes, by SNMP polling - when a service isn't running, the OID no longer exists, and you can't alert based on something which doesn't exist.

              Also, OIDs for services are huuuuuge - there's a standard prefix and then each number is something like the decimal representation of every character in the service name. My memory on that may be a little hazy though - it has been a long time since I experimented with it.
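To illustrate the kind of OID described above, here is a minimal Python sketch. It assumes the table is indexed by the service name as a length-prefixed string (the length prefix is my assumption, not something confirmed in this thread), and it uses the svSvcName OID that is cited elsewhere in this thread. The function name is hypothetical.

```python
# Base OID for svSvcName, as quoted elsewhere in this thread.
SV_SVC_NAME_OID = "1.3.6.1.4.1.77.1.2.3.1.1"


def service_name_to_oid(name, base=SV_SVC_NAME_OID):
    """Build the per-service OID by appending the decimal code of each character.

    ASSUMPTION: the index starts with the length of the name, followed by
    one sub-identifier per character. Verify against a real snmpwalk.
    """
    codes = [ord(ch) for ch in name]
    parts = [base, str(len(codes))] + [str(code) for code in codes]
    return ".".join(parts)


if __name__ == "__main__":
    # A long display name produces a very long OID, which matches the
    # "huuuuuge" observation above.
    print(service_name_to_oid("Microsoft FTP Service"))
```

Because the OID depends only on the name in this scheme, it would be the same on every server, but it would still vanish when the service stops.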

                • Re: Monitoring the status of a Windows Service using NPM only
                  byrona

                  Hrm, I had a feeling that option wouldn't work. Thanks for pointing this out, Smargh!  I guess I will go back to my original statement: "there isn't any good way to do this".  = )

                    • Re: Monitoring the status of a Windows Service using NPM only
                      LloydK

                      I've had minor success with monitoring Windows services via just NPM 10.

                      Basically, I created a Universal Device Poller (UnDP) to poll 1.3.6.1.4.1.77.1.2.3.1.1 (svSvcName), with no historical data kept. It returns a list of all currently running services; any service that is not running is not returned.

                      Within Advanced Alert Poller I use a trigger condition of type 'Custom SQL Alert' for a 'Custom Node Poller' trigger.  To alert when the FTP service is down, I would use a SQL statement like this:

                      WHERE 
                      (
                        (Nodes.Vendor = 'Windows') AND
                        (CustomPollers.UniqueName = 'svSvcName') AND
                        (CustomPollerStatus.Status NOT LIKE '%Microsoft FTP Service%')
                      )

                      This then allows me to create whatever Trigger Actions I need (e.g. e-mail alert, trap, syslog, etc).

                      The only issue with this simple query method is that I have not found a good way to create a reset condition so that only one reset action is generated per Windows service stop/start.  If I add a reset action e-mail using this method, it fires every time the trigger condition is false.
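The behaviour being asked for here is edge-triggered alerting: act only when the state changes, not on every poll. The following Python sketch shows that logic outside of Orion. It is not an Orion feature or query; the function name, the event labels, and the assumption that the service starts out running are all hypothetical.

```python
def detect_transitions(polls, service, initially_running=True):
    """Return one 'trigger' per stop and one 'reset' per start.

    polls: sequence of collections, each holding the service names
           returned by one poll of the running-services list.
    ASSUMPTION: the service is treated as running before the first poll.
    """
    events = []
    was_running = initially_running
    for running_services in polls:
        is_running = service in running_services
        if was_running and not is_running:
            events.append("trigger")  # service disappeared from the list
        elif not was_running and is_running:
            events.append("reset")    # service came back
        was_running = is_running
    return events


if __name__ == "__main__":
    polls = [
        {"Microsoft FTP Service", "Server"},
        {"Server"},
        {"Server"},
        {"Microsoft FTP Service", "Server"},
    ]
    print(detect_transitions(polls, "Microsoft FTP Service"))
```

The key point is that the previous state has to be remembered somewhere; a stateless WHERE clause on its own cannot tell a new stop from an ongoing one.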

                      Perhaps one of the many Advanced Alert Veterans out there would have a suggestion for how to get around the reset action issue.

                      Thanks,
                      Lloyd

                • Re: Monitoring the status of a Windows Service using NPM only
                  smargh

                  Windows has a built-in utility to forward specific event logs as SNMP traps to Orion. Run evntwin.exe. You can apply a config file with evntcmd.exe. I use it on our Orion install to alert on certain events - for example, certain Exchange events trigger instant email alerts, and all service crashes ("terminated unexpectedly") get alerted on instantly. We also alert on service stop & start events, but you must be selective with your trap alerting criteria unless you really want thousands of alerts in a morning saying that the "WinHTTP Web Proxy Auto-Discovery Service" service has stopped & started.

                  The way I do it is to have a standard config file which is applied to every single server. The config is applied via a PowerShell script which queries the Orion database for all Windows servers which are "up", and runs the command cmd.exe /c evntcmd.exe -v 10 $thisDir\events.cnf -s $($_.IP_Address)
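As a rough illustration of the loop described above, this Python sketch builds the same command line for each node that is "up". It only constructs the argument lists and does not run anything. The node dictionaries and their keys are hypothetical stand-ins for rows from the Orion database; the evntcmd.exe arguments are copied in the order given in this post.

```python
def build_evntcmd_command(config_path, ip_address):
    """Return the argument list for pushing one config file to one server."""
    return ["cmd.exe", "/c", "evntcmd.exe", "-v", "10", config_path, "-s", ip_address]


def commands_for_up_nodes(nodes, config_path):
    """Build one command per node whose (hypothetical) 'status' field is 'up'."""
    return [
        build_evntcmd_command(config_path, node["ip_address"])
        for node in nodes
        if node["status"] == "up"
    ]


if __name__ == "__main__":
    # Hypothetical sample rows; real data would come from the Orion database.
    nodes = [
        {"ip_address": "192.168.1.6", "status": "up"},
        {"ip_address": "192.168.1.7", "status": "down"},
    ]
    for command in commands_for_up_nodes(nodes, r"C:\scripts\events.cnf"):
        print(" ".join(command))
```

Each list could be handed to subprocess.run on a Windows host, but error handling and logging would still need to be added.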

                  I've pasted the PowerShell script here: http://pastebin.com/tdHvMrum - it requires this script to interface with MSSQL:
                  http://blogs.technet.com/b/heyscriptingguy/archive/2010/11/01/use-powershell-to-collect-server-data-and-write-to-sql.aspx

                  The script does need improvements, especially in the way it logs things - I got fed up trying to get it perfect, and just left it logging to two separate files.

                  The events.cnf file is parsed by evntcmd.exe in such a way that it only processes lines which start with #pragma, so I add comments to the file starting with a semicolon. My config file is here: http://pastebin.com/JNpnzkyr
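To see which lines of a config file would actually be processed under that rule, a small Python filter like the one below can help. It only mimics the "lines starting with #pragma" behaviour described above and is not evntcmd.exe's real parser. The sample config lines are illustrative placeholders, not taken from the linked config file.

```python
def active_lines(config_text):
    """Return only the lines that start with '#pragma'.

    Everything else (semicolon comments, blank lines) is treated as ignored,
    following the behaviour described in this post.
    """
    return [line for line in config_text.splitlines() if line.startswith("#pragma")]


# Illustrative placeholder content; check real syntax against your own export.
SAMPLE_CONFIG = """; service stop/start control events
#pragma ADD System "Service Control Manager" 1073748859 1 0

; this line is a comment and should be skipped
#pragma ADD System "Service Control Manager" 3221232506 1 0
"""

if __name__ == "__main__":
    for line in active_lines(SAMPLE_CONFIG):
        print(line)
```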

                  SolarWinds needs to improve trap alerting - it takes trial & error to figure out which variables to specify in alert criteria & messages. For example, this is the email subject of an alert for a service crash: *** WARNING: ${Node.Site_Name}: ${vbData5}: Service Error: ${vbData8} - I had to send test alerts containing every variable from vbData1 through vbData13 to find out which one maps to which varbind. ${vbName1} etc. contain the varbind label for the respective vbData variable. For service crashes, the variable ${vbData3} contains the full event log description.
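Typing out a test message with all thirteen variable pairs by hand is tedious, so here is a small Python sketch that generates the text to paste into a test alert. The variable names ${vbNameN} and ${vbDataN} come from this post; the function name and the line layout are my own.

```python
def build_varbind_test_message(count=13):
    """Return a message body listing ${vbNameN} and ${vbDataN} for N = 1..count."""
    lines = []
    for i in range(1, count + 1):
        name_var = "${vbName" + str(i) + "}"
        data_var = "${vbData" + str(i) + "}"
        lines.append(str(i) + ": " + name_var + " = " + data_var)
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste the printed text into the body of a test trap alert, then compare
    # the received email against the raw trap to see which index maps where.
    print(build_varbind_test_message())
```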

                  Events appear as Orion traps like this:

                  sysUpTime=96 days 2 hours 45 minutes 51.72 seconds
                  snmpTrapOID=EVNTAGENT-MIB:svcCtlMgr.0.1073748859
                  eventText=The Print Spooler service was successfully sent a stop control.
                  eventUserId=administrator
                  eventSystem=YOURSERVERNAME
                  eventType=4
                  eventCategory=0
                  eventVar1=Print Spooler
                  eventVar2=stop
                  experimental.1057.1=192.168.1.6
                  snmpTrapEnterprise=EVNTAGENT-MIB:svcCtlMgr

                  You can use those labels, "eventVar1", "eventText" etc, in alert criteria - for example, to delete certain annoying traps. I do this for all Windows Installer events by specifying "eventVar1 is equal to Windows Installer" with an alert action to discard it.
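The label-based matching described above can be tried out offline with a short Python sketch. It parses the trap text shown earlier into label/value pairs and applies the "eventVar1 is equal to Windows Installer" discard rule. This is only an illustration of the logic, not how Orion evaluates trap rules; the function names are hypothetical.

```python
def parse_trap(text):
    """Split 'label=value' lines into a dict, ignoring lines without '='."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once; values may contain '='
        fields[key] = value
    return fields


def should_discard(fields):
    """Mirror the rule: discard when eventVar1 equals 'Windows Installer'."""
    return fields.get("eventVar1") == "Windows Installer"


# Trap text copied from the example in this post.
SAMPLE_TRAP = """sysUpTime=96 days 2 hours 45 minutes 51.72 seconds
snmpTrapOID=EVNTAGENT-MIB:svcCtlMgr.0.1073748859
eventText=The Print Spooler service was successfully sent a stop control.
eventVar1=Print Spooler
eventVar2=stop
"""

if __name__ == "__main__":
    fields = parse_trap(SAMPLE_TRAP)
    print(fields["eventVar1"], fields["eventVar2"], should_discard(fields))
```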

                  Monitoring Windows services with this method does have some fairly annoying quirks:

                  - You can get lots of alerts when a server is shut down or booted up.
                  - You can miss an alert email when Orion is sending out many others at the same time, and it can be difficult to identify the one alert saying that a service has crashed, or that someone has stopped a service but not started it again.
                  - The SNMP traps are, of course, sent to Orion by UDP, so successful delivery is not guaranteed.
                  - I don't use the SolarWinds Event Log Forwarder because it's not very configurable, nor (AFAIK) scriptable, and needs its own service on every node.
                  - It's a pain in the neck to write alerts and get the trigger criteria 100% correct - there are a lot of "ends with" and "starts with" to work around quirks of both Windows and NPM.
                  - I don't think Microsoft considers this to be good practice. It is, however, an excellent method of getting instant alerts based on Windows events. You should, of course, buy the Orion APM module and do this properly.

                  • Re: Monitoring the status of a Windows Service using NPM only
                    Radioteacher

                    I was wondering if this is possible now with NPM 10.3 using WMI to poll the server?