    Syslog alert help?


      I would like to create an alert for (APC) UPSs for "On/Off Battery" but suppress alert actions for a specific interval. I'm having trouble figuring out how to do this.

      We have several chronic sites with poor power that frequently flap on/off battery for only a few seconds at a time, but the flapping condition can last a couple of hours before it mysteriously clears and power returns to normal. When this happens, it generates hundreds of on/off alerts.

      Today, we have two alerts configured through the Syslog Viewer: one for "UPS On Battery" messages and one for "UPS Off Battery" messages. However, I can't see a way to configure a syslog rule with a trigger threshold (suspend for a period of time) on a per-Node basis. Unless I'm missing something, the trigger threshold only works based on the Message.

      Advanced Alert Manager doesn't provide Syslog as a "Type of Property to Monitor".  However, AAM does provide "Custom SQL Alert" but I'm not sure if this would accomplish what I want to do.  If it will, I'm not sure how to build the trigger condition.

      Can anyone advise how I may be able to do this?

          Andy McBride

          If your APC has a MIB with an object that reflects the power status, you could build a Universal Device Poller (UnDP) for that OID and alert off of it. AAM allows for alert delays. Check the content exchange section for a UnDP that may do the job for you.

              Thanks for a fast response Andy.

              However, we already have a number of APC UnDPs configured, and one of them polls power status. The problem is that the UnDP polling interval is based on NPM's statistics interval (10 min for our shop). (Sidebar: there is a long-standing open feature request to add an individual, configurable polling interval to UnDPs, but apparently it hasn't received any priority love from SolarWinds' developers.)

              I've tried setting the NPM statistics interval as low as 2 minutes for our APCs (and we have hundreds), and it just doesn't work well (i.e., the results are not very accurate or timely). That is why we use syslog: so we are informed immediately when the event actually occurs. But now we would like to suppress, by Node, subsequent repetitive alerts for a given period of time and/or number of message repetitions. Syslog supports repetitions within a given interval, but not at the Node level. If I suppress via the Syslog alert, I would very probably suppress on/off battery alerts from other nodes as well. Case in point: a site has multiple UPS units. The site loses power and all the UPSs send an "On battery" syslog message. With the Syslog alert suppression logic, the first UPS to send an "On battery" message would trigger the alert notification, and the "On battery" messages from the other UPSs would be suppressed. This is not what we want to happen. Let me know if I'm missing something.
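              To make the behavior I'm after concrete, here is a minimal Python sketch of per-node suppression. It is not a SolarWinds feature or API: the class name `PerNodeSuppressor` and its `should_fire` method are hypothetical, and it assumes something outside the Syslog Viewer (for example, a script that receives the messages) decides whether to run the alert action.

```python
from datetime import datetime, timedelta


class PerNodeSuppressor:
    """Allow one alert action per (node, message) pair within a time window.

    Hypothetical helper for illustration only; not part of any SolarWinds product.
    """

    def __init__(self, window: timedelta):
        self.window = window
        self._last_fired = {}  # (node, message) -> time the action last ran

    def should_fire(self, node: str, message: str, now: datetime) -> bool:
        key = (node, message)
        last = self._last_fired.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside this node's own suppression window
        self._last_fired[key] = now
        return True
```

              Because the key includes the node, a flapping UPS1 only silences its own repeats; an "On battery" message from UPS2 at the same site would still run its alert action.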

                  Have you already tried defining the alert with both trigger thresholds and DNS Hostname(s)?

                      Are you referring to using DNS Hostname pattern matching? I haven't tried it. If I specify a list of hosts, will the trigger thresholds apply to each host individually or to the list of hosts as a whole? For example, say I have UPS1, UPS2, UPS3 thru UPS10 with a trigger threshold of "Suspend Alert Actions for 10 minutes". A syslog "on battery" message comes in for UPS1 and the Alert Actions are executed. Then an "on battery" message comes in 5 minutes later for UPS3 and 7 minutes later for UPS4. Will the UPS3 and UPS4 alert actions be suppressed by the trigger threshold, or will the Alert Actions be executed because each message is from a different DNS Hostname?
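                      The two possible outcomes in that example can be spelled out with a small Python sketch. It does not say which one the Syslog Viewer actually implements; `actions_fired` is a hypothetical function, and the timeline assumes both the 5 and the 7 minutes are measured from the UPS1 message.

```python
def actions_fired(events, window_min, per_host):
    """Return the hosts whose alert actions run, given (minute, host) events.

    per_host=False models one suspend window shared by the whole host list;
    per_host=True models a separate suspend window for each DNS hostname.
    """
    last = {}   # suppression key -> minute the action last ran
    fired = []
    for minute, host in events:
        key = host if per_host else "all-hosts"
        if key not in last or minute - last[key] >= window_min:
            last[key] = minute
            fired.append(host)
    return fired


# Assumed timeline: UPS1 at minute 0, UPS3 at minute 5, UPS4 at minute 7.
events = [(0, "UPS1"), (5, "UPS3"), (7, "UPS4")]
print(actions_fired(events, 10, per_host=False))  # ['UPS1']
print(actions_fired(events, 10, per_host=True))   # ['UPS1', 'UPS3', 'UPS4']
```

                      The second result is the one we need: each UPS gets its own 10-minute suspend window.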