Agent not collecting data: can we create an alert for it?

Dear Thwack members,

Has anyone encountered a scenario where the agent is up and running but is not collecting any data from the backend, so SolarWinds still shows old data collected days ago?

In this case, can we create an alert for "last polling data not received"?

Thanks in advance..!

-Solaiy.

  • Loads of threads on this already. I think an alert for this actually made it into the default set recently too. LastSystemUptimePollUtc is usually the key value.

  • Here is the trigger condition for our alert if the agent is not responsive.

    The default alert may be enough for what you are looking for. Copy "Node not polled in last 10 minutes" and add in Agent.

  • Hello

    Thanks for the response. The problem I'm facing is that the node is behaving perfectly well, and so is the agent, but the agent isn't collecting any metrics for CPU or memory.

    I think we may need to use some scripts that fetch the details directly from the DB.

    -SOSP

  • Hi, this example custom SWQL alert will trigger if LastSystemUptimePollUTC has not been updated in the last 120 minutes.

    SELECT Nodes.Uri, Nodes.DisplayName FROM Orion.Nodes AS Nodes
    
    -- Credential and engine joins; the conditions below do not reference them
    LEFT JOIN Orion.NodeSettings ns ON Nodes.NodeID = ns.NodeID AND ns.SettingName LIKE '%Credential%'
    LEFT JOIN Orion.Credential c ON ns.SettingValue = c.ID
    JOIN Orion.Engines e ON e.EngineID = Nodes.EngineID
    -- Skip nodes whose status is Down (2), Unmanaged (9), or Unreachable (12)
    WHERE Nodes.Status NOT IN ('2', '9', '12')
    -- Agent-polled nodes only
    AND Nodes.ObjectSubType = 'AGENT'
    -- Last uptime poll is more than 120 minutes old
    AND MinuteDiff(Nodes.LastSystemUptimePollUTC, GETUTCDATE()) > 120
    
    ORDER BY Nodes.LastSystemUptimePollUTC
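
For anyone who wants to sanity-check the filter logic outside Orion, here is a rough Python sketch of the same three conditions from the query above: status not in 2/9/12, agent-polled, and last uptime poll more than 120 minutes old. The record layout (`status`, `object_sub_type`, `last_uptime_poll_utc`) and the function names are made up for illustration and are not part of any SolarWinds API. The minute arithmetic is only an approximation of what SWQL's MinuteDiff returns.

```python
from datetime import datetime, timedelta, timezone

# Status codes the query skips, per its own comment: Down, Unmanaged, Unreachable.
EXCLUDED_STATUSES = {2, 9, 12}


def minutes_between(start, end):
    """Whole minutes from start to end (an approximation of SWQL MinuteDiff)."""
    return int((end - start).total_seconds() // 60)


def stale_agent_nodes(nodes, now, threshold_minutes=120):
    """Return agent-polled nodes whose last uptime poll is older than the threshold.

    `nodes` is a list of dicts with hypothetical keys; oldest poll comes first,
    like the ORDER BY in the query.
    """
    stale = [
        n for n in nodes
        if n["status"] not in EXCLUDED_STATUSES
        and n["object_sub_type"].upper() == "AGENT"
        and minutes_between(n["last_uptime_poll_utc"], now) > threshold_minutes
    ]
    return sorted(stale, key=lambda n: n["last_uptime_poll_utc"])


def _node(name, status, sub_type, age):
    """Build one illustrative node record polled `age` before `now`."""
    return {
        "name": name,
        "status": status,
        "object_sub_type": sub_type,
        "last_uptime_poll_utc": now - age,
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
nodes = [
    _node("web01", 1, "Agent", timedelta(hours=3)),      # stale
    _node("web02", 1, "Agent", timedelta(minutes=5)),    # fresh
    _node("db01", 9, "Agent", timedelta(days=1)),        # unmanaged, skipped
    _node("sw01", 1, "SNMP", timedelta(days=1)),         # not agent-polled
    _node("app01", 1, "Agent", timedelta(minutes=120)),  # exactly 120, not > 120
    _node("app02", 1, "Agent", timedelta(hours=10)),     # stale, oldest
]

stale_names = [n["name"] for n in stale_agent_nodes(nodes, now)]
print(stale_names)
```

Note that a node polled exactly 120 minutes ago does not match, because the comparison is strictly greater than.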

  • Hello

    Thanks very much for your reply and the script, but the situation here is a bit more complex. If I'm not wrong, checking LastSystemUptimePollUTC only covers the polling information, heartbeat, or boot time.

    Here, everything else is fine, even the server's heartbeat. But the CPU metric is not being collected, and neither are the memory and volume metrics, even though the server and agent are up and running.

    Hope you got some idea of this complex situation. 

    Thanks in advance. 
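
The situation described above (heartbeat fresh, individual metrics stale) comes down to comparing the newest sample time of each metric against a maximum age. Below is a minimal Python sketch of that comparison only. How the per-metric timestamps would be fetched from Orion is not covered in this thread, so the `last_samples` dict, the metric names, and the 30-minute default are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_metrics(last_samples, now, max_age=timedelta(minutes=30)):
    """Return the metric names whose newest sample is missing or older than max_age.

    `last_samples` maps a metric name to the UTC time of its newest sample,
    or to None if no sample was ever collected (hypothetical input shape).
    """
    return sorted(
        name for name, ts in last_samples.items()
        if ts is None or now - ts > max_age
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_samples = {
    "heartbeat": now - timedelta(minutes=2),  # agent looks healthy
    "cpu": now - timedelta(days=3),           # old data still displayed
    "memory": now - timedelta(days=3),
    "volume": None,                           # never collected
}

print(stale_metrics(last_samples, now))  # ['cpu', 'memory', 'volume']
```

Checking each metric separately is what lets this catch a node whose heartbeat is current while its CPU, memory, or volume data has stopped updating.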

  • We've all had this problem, @solaiy. Steven's solution is a solid one for exactly what you're describing.


  • Sorry for the late reply. I think the uptime UTC check is working, but I still need to do some deeper testing on all metrics for a particular machine. I'll keep you posted.

    Thanks much..!

    -Solaiy.