This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Alert if no syslog messages sent from a host recently

          Is there a way to alert if there have been less than (x) number of syslog messages from a host within a (x)-hour time period?

Basically I want to alert if my syslog forwarder program has stopped working on a host.

  • Custom SQL Node Alert:

    Where Not IP_Address in (

      SELECT IP

      FROM SysLog

      where [DateTime] > DATEADD(HOUR,-<_X_HOURS>,GETDATE())

      group by IP

      having COUNT(IP) > <_X_NUMBER>

    )

  • Is that just for 1 specific node?  I want to be alerted if any node hasn't sent any syslog messages in 48 hours.   Not sure what the variables are in your SQL statement.

    Thanks for the reply.

  • Open the Alert manager and change it to custom SQL sub type Node, it fills out the "Select Blah from blah", but then allows you to choose where and/or joins to get the data you want.

    That querry would alert individually for each Node that doesn't have records in syslog for x period - you case might be to replace <_X_HOURS> with 48 and <_X_NUMBER> with 0, although sence you now are saying "any syslog messages" vs "been less than (x) number of syslog messages", it might be better to use "SELECT distinct IP FROM SysLog where [DateTime] > DATEADD(HOUR,-<_X_HOURS>,GETDATE())" so SQL doesn't need to count the records.

    if you want specific nodes, you could limit it down by adding " and Caption in ('routera','serverb','firewallx') "

  • FormerMember
    0 FormerMember

    lots of math up there.

    just add a field to the nodes table that stores the date/time of the last syslog you recieved from that address.

    then write an alert to check that field vs current date/time

    the one alert above looks like it would be reading the syslog table every minute (polling cycle) which could really kill NPM perfomance if it has to read through a few million syslogs for all the IPs when depending on the interval you set with X

    this should keep the load from this one alert to a minimum

  • The table is indexed on those fields, but good point, it is extra overhead but if you are looking for 48 hours, you could set your polling cycle to be something like 6 hours, then it wouldn't be much of a hit

    (oh, did you notice you can't post ill preceded by a k, it masks it out as though it is foul language.  It makes it look worse, like you actually used a curse word... silly thwack)

  • Well, that works, let me throw another wrinkle in here for you.  Sorry about adding additional criteria, I guess I originally didn't have a good focus on exactly what I wanted, and as things came into focus and I started seeing results, then there were some things I needed to refine.

    The SQL script works.  Here's what I plugged into SQL to get results:

    Select distinct Hostname from SysLog

    Where Not Hostname in (

      SELECT Hostname

      FROM SysLog

      where [DateTime] > DATEADD(HOUR,-48,GETDATE())

      group by Hostname

      having COUNT(IP) > 1

    )

    However, there's 2 caveats to this:

    1- There's some things I don't want in the results.  For example, customer firewalls that I just do icmp monitoring on the outside interface that will never send me syslogs.  I have a custom node property that I can use as a filter field.  The field is called Hardware_Type.  So Nodes.Hardware_Type is the column.   For example, exclude any result where Nodes.Hardware_Type = 'CUSTOMER_FIREWALL'.  I imagine this requires an inner join on Node.nodeid=syslog.nodeid

    2- This will only work for nodes that have previously sent in a syslog message within the table trim period.  If I add a host and it never has sent a syslog message then it won't show.  There has to be a way to cross reference results to the existing nodes in the Nodes table.

    Urgh. Getting more complicated than I originally thought.  Isn't that always the case though?

    I appreciate the responses.  Please help with this one last (I promise!) teensie weensie little hurdle.

  • how about:

    Select Caption From Nodes

      where not IP_Address in (Select distinct IP from syslog where [DateTime] > DATEADD(HOUR,-48,GETDATE()))

         and (Hardware_type not in ('CUSTOMER_FIREWALL','SIMPLEDEVICE','WORKSTATION') or Harware_type is null)

    I don't trust syslog to resolve the hostname, but it can't mess up the IP, but you can change it if you like.

  • Okay, that works in SQL, but not in Orion.  WTF?

    In SQL I get the correct results.  When I setup the alert in Orion it triggers on all the hosts.  WTF?

  • Still can't figure this out.  It's driving me crazy.   I plug this into SQL:

    Select nodes.Caption From Nodes

      where not nodes.IP_Address in (Select distinct syslog.IP from syslog where syslog.DateTime > DATEADD(HOUR,-48,GETDATE()))

         and (nodes.Hardware_Type not in ('CUSTOMER_FIREWALL','VMWARE_HOST', 'CLUSTER', 'SAN') or Nodes.Hardware_Type is null)

    And I get 15 results, all valid.

    But when I setup my alert:

    http://i.imgur.com/Nx6ga.png

    It's triggering on all the hosts.  Help!

  • Nevermind.  I'm dumb.  My first go around with Custom SQL Alerts.  I didn't notice the select statement at the top was already there.   Working correctly now!