
Detect a hung server, nonresponding state

Hello all,

Looking for recommendations on what to monitor on Windows servers to confirm they are in a healthy, active state. Last month, after our Microsoft security patch weekend, a few of our servers were in a hung, nonresponding state. None of our SW alerts caught it because the servers still replied to ping and even responded to an RDP request. TIA.

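  • I set up an alert and a dashboard to tell me when we stop getting monitoring data (CPU, disk, etc.). The details depend on whether you monitor with the agent or agentless. I use agentless polling and query Orion.Nodes.LastSystemUpTimePollUtc: a hung server may still answer ping, but its uptime polls stop succeeding, so a stale timestamp flags it.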

    This is from my dashboard (minus a few columns I pull in from custom properties). The alert works the same way: set the condition to 'Custom SWQL' and the object type to 'Node', and leave off the SELECT and FROM clauses, because the alert builder supplies those and gives you no choice there.

    SELECT
        N.Caption,
        N.DetailsUrl AS [_LinkFor_Caption],
        '/Orion/images/StatusIcons/Small-' + N.StatusIcon AS [_IconFor_Caption],
        ToLocal(N.LastSystemUpTimePollUtc) AS LastSystemUpTimePollUtc,
        ToLocal(N.NextPoll) AS NextPoll,
        N.Engine.DisplayName AS PollingEngine  -- polling engine that owns the node

    FROM Orion.Nodes N

    WHERE
        N.UnManaged = 0  -- node has not been unmanaged
        -- no successful uptime poll in the last 30 minutes
        AND N.LastSystemUpTimePollUtc < ADDDATE('Minute', -30, GETUTCDATE())

    ORDER BY N.LastSystemUpTimePollUtc DESC
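
    For the alert version, a minimal sketch of what goes into the Custom SWQL box is only the WHERE clause. This assumes the fixed prefix the alert builder shows aliases the node entity as `Nodes` (for example `SELECT Nodes.Uri, Nodes.DisplayName FROM Orion.Nodes AS Nodes`); check the prefix your version displays and adjust the alias if it differs. The 30-minute threshold matches the dashboard query; pick a value comfortably longer than your polling interval so one late poll does not fire the alert. The agent filter is commented out and is an assumption: whether LastSystemUpTimePollUtc updates the same way for agent nodes is worth checking against a healthy agent node first.

    ```sql
    -- Entered below the alert builder's fixed prefix, assumed to be:
    --   SELECT Nodes.Uri, Nodes.DisplayName FROM Orion.Nodes AS Nodes
    WHERE Nodes.UnManaged = 0  -- skip unmanaged nodes
      -- no successful uptime poll in the last 30 minutes
      AND Nodes.LastSystemUpTimePollUtc < ADDDATE('Minute', -30, GETUTCDATE())
      -- Optional (assumption): restrict to agent-monitored nodes
      -- AND Nodes.ObjectSubType = 'Agent'
    ```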

  • I'm new to SW and this setup is over my head. I'm using agent monitoring and was hoping there was an out-of-the-box alert I could use.
