Detect a hung server, nonresponding state

Hello all,

     I'm looking for recommendations on how and what to monitor on Windows servers to confirm they are in a healthy, active state.  Last month, after our Microsoft security patch weekend, we had a few servers in a hung/nonresponding state.  None of our SW alerts caught it because the servers still replied to a ping and even responded to an RDP request.  TIA.

  • I set up an alert/dashboard to let me know when we were not getting monitoring data (CPU/disk/etc.).  How you do it depends on whether you poll with agents or agentless.  I poll agentless and query on Orion.Nodes.LastSystemUpTimePollUtc.

    This is from my dashboard (minus bits I flavor in from custom properties).  The alert works the same way: just set it to 'Custom SWQL' and 'Node', and leave off the SELECT and FROM statements, as the alert builder gives you no choice there.

    SELECT
        n.Caption,
        n.DetailsUrl AS [_linkfor_Caption],
        '/Orion/images/StatusIcons/Small-' + n.StatusIcon AS [_IconFor_Caption],
        ToLocal(n.LastSystemUpTimePollUtc) AS LastSystemUpTimePollUtc,
        ToLocal(n.NextPoll) AS NextPoll,
        n.Engine.DisplayName AS PollingEngine  --no trailing comma before FROM

    FROM Orion.Nodes n

    WHERE
        n.UnManaged = 0 --node has not been unmanaged
        AND n.LastSystemUpTimePollUtc < ADDDATE('Minute', -30, GETUTCDATE()) --no uptime poll in the last 30 minutes

    ORDER BY n.LastSystemUpTimePollUtc DESC

  • I'm new to SW and this setup is over my head.  I'm using agent monitoring and was hoping there was an out of the box alert I could use.  

  • You got this!

    I'm sure someone smarter has something out of the box.  If not, here is how to set it up:
    just copy/paste the WHERE clause into a 'Custom SWQL' alert and see if it works for you.

    Alert logic

    WHERE
        Nodes.UnManaged = 0 --Node has not been unmanaged
        AND Nodes.Status != '2'  --Node is not down.
        AND Nodes.LastSystemUpTimePollUtc < ADDDATE('Minute', -60, GETUTCDATE())

    ---

    Screenshot
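
For anyone who finds the WHERE clause hard to read, here is a rough Python sketch of the same logic: skip unmanaged nodes, skip nodes already marked down, and flag whatever has not reported an uptime poll within the window. This is not SolarWinds code and it does not talk to Orion. The dictionary keys simply mirror the Orion.Nodes column names used in the thread, and `find_stale_nodes`, the sample node names, and the fixed `now` value are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone

STATUS_DOWN = 2  # the alert logic in the thread treats Status 2 as "down"


def find_stale_nodes(nodes, now=None, max_age_minutes=60):
    """Return captions of nodes that are managed, not down, and overdue for a poll."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=max_age_minutes)
    stale = []
    for node in nodes:
        if node["UnManaged"]:
            continue  # same as: Nodes.UnManaged = 0
        if node["Status"] == STATUS_DOWN:
            continue  # same as: Nodes.Status != 2
        last_poll = node.get("LastSystemUpTimePollUtc")
        if last_poll is None:
            continue  # assumption: never-polled nodes are skipped, like a NULL comparison in a query
        if last_poll < cutoff:
            stale.append(node["Caption"])
    return stale


# Hypothetical sample data, evaluated against a fixed point in time.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
sample_nodes = [
    {"Caption": "web-01", "UnManaged": False, "Status": 1,
     "LastSystemUpTimePollUtc": now - timedelta(minutes=5)},
    {"Caption": "app-02", "UnManaged": False, "Status": 1,
     "LastSystemUpTimePollUtc": now - timedelta(minutes=90)},
    {"Caption": "db-03", "UnManaged": False, "Status": STATUS_DOWN,
     "LastSystemUpTimePollUtc": now - timedelta(minutes=90)},
    {"Caption": "old-04", "UnManaged": True, "Status": 1,
     "LastSystemUpTimePollUtc": now - timedelta(days=3)},
]

print(find_stale_nodes(sample_nodes, now=now))  # ['app-02']
```

Only app-02 is flagged: it still shows as up, but its last uptime poll is older than the 60-minute window, which is the "answers ping but is hung" case from the original question.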
