Nodes not responding to SNMP or WMI


    There are times when a client's device stops being polled, for whatever reason.  This could be caused by an issue with the device or by a change in credentials.  Almost none of the clients I have worked with were aware that polling had stopped.

     

    There is a simple way to notice this: look at the timestamp of the most recent CPU poll.  If it is more than 35 minutes older than the current time, the node has a polling issue.

     

    Here is the report for it:

     

    SELECT n.Caption as Node_Name, n.ip_address as IP_Address, n.ObjectSubType as Poll_Type

    ,Cast(DateDiff(second,MAX(c.datetime),getdate()) / 86400 as varchar) + ' Day(s) ' + convert(char(8),dateadd(second,DateDiff(second,MAX(c.datetime),getdate()),0),108) as Duration

    ,DateDiff(mi,MAX(c.datetime),getdate()) minutes_since

    FROM Nodes n

    Inner join CPUload c on c.NodeID = n.NodeID

    WHERE n.status = 1 and (n.ObjectSubType = 'wmi' or n.ObjectSubType = 'snmp')

    GROUP BY n.Caption, n.StatusDescription,  n.ip_address, n.ObjectSubType

    Having DateDiff(mi,MAX(c.datetime),getdate()) > 35

    ORDER BY minutes_since desc
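To make the report's logic concrete outside of SQL, here is a minimal Python sketch of the same idea: take the latest CPU poll time per node, keep the nodes whose last poll is more than 35 minutes old, and show how long it has been. The function names and the input dictionary are hypothetical and are not part of SolarWinds; the sketch also measures elapsed whole minutes, whereas T-SQL DATEDIFF counts minute boundaries crossed, so the two can differ by a minute.

```python
from datetime import datetime

THRESHOLD_MINUTES = 35  # same cutoff used by the report and the alert


def minutes_since(last_poll, now):
    """Whole minutes elapsed since the last CPU poll."""
    return int((now - last_poll).total_seconds() // 60)


def format_duration(last_poll, now):
    """Render the gap as 'N Day(s) hh:mm:ss', like the Duration column."""
    total = int((now - last_poll).total_seconds())
    days, rem = divmod(total, 86400)   # full elapsed days
    hours, rem = divmod(rem, 3600)
    mins, secs = divmod(rem, 60)
    return f"{days} Day(s) {hours:02d}:{mins:02d}:{secs:02d}"


def stale_nodes(last_polls, now, threshold=THRESHOLD_MINUTES):
    """Return (node, duration, minutes) for nodes past the threshold.

    last_polls is an assumed input: node name -> latest CPU poll datetime.
    Rows are sorted with the longest gap first, like ORDER BY ... desc.
    """
    rows = [
        (name, format_duration(ts, now), minutes_since(ts, now))
        for name, ts in last_polls.items()
        if minutes_since(ts, now) > threshold
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

For example, with the current time at 01:00, a node last polled at 23:00 the previous evening is reported as "0 Day(s) 02:00:00" with 120 minutes since the last poll, while a node polled 10 minutes ago is left out.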

     

    Reporting is nice, but a better way to catch this is to create an alert for it, so the problem can be resolved in a timely manner.  For the alert, you need to use a custom SQL query:

     

    SELECT nodes.NodeID, nodes.caption FROM Nodes

    Inner join CPUload c on c.NodeID = nodes.NodeID

    WHERE nodes.status = 1 and (nodes.ObjectSubType = 'wmi' or nodes.ObjectSubType = 'snmp')

    GROUP BY nodes.Caption, nodes.nodeid

    Having DateDiff(mi,MAX(c.datetime),getdate()) > 35

     

    Using both the report and the alert will make sure you are getting data from all nodes, and will help you avoid the embarrassing situation where a server crashes due to high CPU and the boss comments, "I thought SolarWinds was monitoring this".

     

     

    Thanks

    Amit Shah

    Loop1 Systems