Is there a way to alert if there have been less than (x) number of syslog messages from a host within a (x)-hour time period?
Basically I want to alert if my syslog forwarder program has stopped working on a host.
Open the Alert manager and change it to custom SQL sub type Node, it fills out the "Select Blah from blah", but then allows you to choose where and/or joins to get the data you want.
That querry would alert individually for each Node that doesn't have records in syslog for x period - you case might be to replace <_X_HOURS> with 48 and <_X_NUMBER> with 0, although sence you now are saying "any syslog messages" vs "been less than (x) number of syslog messages", it might be better to use "SELECT distinct IP FROM SysLog where [DateTime] > DATEADD(HOUR,-<_X_HOURS>,GETDATE())" so SQL doesn't need to count the records.
if you want specific nodes, you could limit it down by adding " and Caption in ('routera','serverb','firewallx') "
lots of math up there.
just add a field to the nodes table that stores the date/time of the last syslog you recieved from that address.
then write an alert to check that field vs current date/time
the one alert above looks like it would be reading the syslog table every minute (polling cycle) which could really kill NPM perfomance if it has to read through a few million syslogs for all the IPs when depending on the interval you set with X
this should keep the load from this one alert to a minimum
The table is indexed on those fields, but good point, it is extra overhead but if you are looking for 48 hours, you could set your polling cycle to be something like 6 hours, then it wouldn't be much of a hit
(oh, did you notice you can't post ill preceded by a k, it masks it out as though it is foul language. It makes it look worse, like you actually used a curse word... silly thwack)
Well, that works, let me throw another wrinkle in here for you. Sorry about adding additional criteria, I guess I originally didn't have a good focus on exactly what I wanted, and as things came into focus and I started seeing results, then there were some things I needed to refine.
The SQL script works. Here's what I plugged into SQL to get results:
Select distinct Hostname from SysLog
Where Not Hostname in (
SELECT Hostname
FROM SysLog
where [DateTime] > DATEADD(HOUR,-48,GETDATE())
group by Hostname
having COUNT(IP) > 1
)
However, there's 2 caveats to this:
1- There's some things I don't want in the results. For example, customer firewalls that I just do icmp monitoring on the outside interface that will never send me syslogs. I have a custom node property that I can use as a filter field. The field is called Hardware_Type. So Nodes.Hardware_Type is the column. For example, exclude any result where Nodes.Hardware_Type = 'CUSTOMER_FIREWALL'. I imagine this requires an inner join on Node.nodeid=syslog.nodeid
2- This will only work for nodes that have previously sent in a syslog message within the table trim period. If I add a host and it never has sent a syslog message then it won't show. There has to be a way to cross reference results to the existing nodes in the Nodes table.
Urgh. Getting more complicated than I originally thought. Isn't that always the case though?
I appreciate the responses. Please help with this one last (I promise!) teensie weensie little hurdle.
how about:
Select Caption From Nodes
where not IP_Address in (Select distinct IP from syslog where [DateTime] > DATEADD(HOUR,-48,GETDATE()))
and (Hardware_type not in ('CUSTOMER_FIREWALL','SIMPLEDEVICE','WORKSTATION') or Harware_type is null)
I don't trust syslog to resolve the hostname, but it can't mess up the IP, but you can change it if you like.
Still can't figure this out. It's driving me crazy. I plug this into SQL:
Select nodes.Caption From Nodes
where not nodes.IP_Address in (Select distinct syslog.IP from syslog where syslog.DateTime > DATEADD(HOUR,-48,GETDATE()))
and (nodes.Hardware_Type not in ('CUSTOMER_FIREWALL','VMWARE_HOST', 'CLUSTER', 'SAN') or Nodes.Hardware_Type is null)
And I get 15 results, all valid.
But when I setup my alert:
It's triggering on all the hosts. Help!
SolarWinds solutions are rooted in our deep connection to our user base in the THWACK® online community. More than 195,000 members are here to solve problems, share technology and best practices, and directly contribute to our product development process.