Detect a hung server, nonresponding state

Question

Hello all,
I'm looking for recommendations on how to confirm that Windows servers are in a healthy, active state, and what to monitor for it. Last month, after our Microsoft security patch weekend, a few servers were left in a hung/nonresponding state. None of our SW alerts caught it because the servers still replied to a ping and even responded to an RDP request. TIA.

monitoringlife · Answer

I set up an alert and a dashboard that tell me when we stop getting monitoring data (CPU, disk, etc.) from a node. How you do it depends on whether you poll with an agent or agentless. I poll agentless and query Orion.Nodes.LastSystemUpTimePollUtc.

This is from my dashboard (minus the bits I add from custom properties). The alert works the same way: set it to 'Custom SWQL' on 'node' and leave off the SELECT and FROM statements, since the alert builder gives you no choice there.
SELECT
    N.Caption,
    N.DetailsUrl AS [_LinkFor_Caption],
    '/Orion/images/StatusIcons/Small-' + N.StatusIcon AS [_IconFor_Caption],
    ToLocal(N.LastSystemUpTimePollUtc) AS LastSystemUpTimePollUtc,
    ToLocal(N.NextPoll) AS NextPoll,
    N.Engine.DisplayName AS PollingEngine
FROM Orion.Nodes N
WHERE
    N.UnManaged = 0 -- node has not been unmanaged
    AND N.LastSystemUpTimePollUtc < ADDDATE('Minute', -30, GETUTCDATE()) -- no uptime poll in the last 30 minutes
ORDER BY N.LastSystemUpTimePollUtc DESC
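
For the alert version, a minimal sketch of the condition might look like the following. It assumes the alert builder's fixed SELECT/FROM prefix aliases the node entity as Nodes (check the prefix shown in your own alert editor and adjust the alias if it differs), and it reuses the same 30-minute threshold as the dashboard query.

```
WHERE Nodes.UnManaged = 0 -- skip nodes that are unmanaged
  AND Nodes.LastSystemUpTimePollUtc < ADDDATE('Minute', -30, GETUTCDATE()) -- stale uptime poll
```

The 30 minutes is only a starting point; it probably needs to be longer than the node's polling interval, otherwise healthy nodes will trigger the alert between polls.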
