Node is Critical - But Why??

I have a question/concern about navigating through the Orion alerts.  I'm probably missing something very obvious, but I can't see it.

I have a number of nodes (generally VMs) and a couple of groups (related to ESXi hosts) that show as "critical".  However, when I drill down to the nodes themselves, nothing pops out that says "this is why the node is critical."  If the CPU load and memory are within range, nothing in the default summary views points to what is triggering the alert.  Is it disk space?  Temp?  Hardware?  Nothing I click on or drill into seems to display the status of all monitored elements of a particular node.

So, am I missing something or expecting too much?

Thanks,
Joe

  • Anyone?  I feel like this is a stupid question but I'm not seeing the answer.

  • Thanks for the response on this.  I added your widget to the standard Node Details page, and clicking on the link in your widget just brings me back to that same page.  I was hoping for something that highlighted "CPU temp" or "Disk Utilization" or similar that would bring to the top what the trigger was.

  • Hi Joe,

    My knee-jerk response, based on your explanation, is that you're experiencing status rollup, in which case SolarWinds will show the worst-case status for a group.  If you haven't done so already, review the rollup status documentation: Status Rollup Mode in the SolarWinds Platform
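
To make the rollup idea concrete, here is a minimal sketch of how a worst-case rollup can leave a group showing "Critical" even when most members are fine. It is an illustration only, not SolarWinds code: the `SEVERITY` ranking, the `rollup` and `culprits` helpers, and the simplified "mixed" rule are all assumptions made for the example.

```python
# Hypothetical severity ranking, assumed for illustration only;
# it is not the platform's internal ordering.
SEVERITY = {"Up": 0, "Warning": 1, "Critical": 2, "Down": 3}


def rollup(statuses, mode="worst"):
    """Collapse the statuses of a group's members into a single status."""
    statuses = list(statuses)
    if not statuses:
        return "Unknown"
    if mode == "worst":
        # The most severe member status wins.
        return max(statuses, key=SEVERITY.__getitem__)
    if mode == "best":
        # The least severe member status wins.
        return min(statuses, key=SEVERITY.__getitem__)
    if mode == "mixed":
        # Simplified rule: any disagreement between members shows as Warning.
        return statuses[0] if len(set(statuses)) == 1 else "Warning"
    raise ValueError(f"unknown rollup mode: {mode}")


def culprits(members):
    """Return (name, status) pairs for members that are not Up, worst first."""
    bad = [(name, status) for name, status in members.items() if status != "Up"]
    return sorted(bad, key=lambda pair: SEVERITY[pair[1]], reverse=True)


# Hypothetical members of one group.
members = {"CPU": "Up", "Memory": "Warning", "C: volume": "Critical"}
print(rollup(members.values()))   # rolled-up status under worst-case mode
print(culprits(members))          # the members behind that status
```

The point of `culprits` is the step the summary view leaves out: once a rolled-up status is not Up, listing the non-Up members shows which one is responsible.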

  • Bourlis, thanks for the reply.  This makes a bit of sense from what I'm seeing in the summary views.  I guess my original question remains though: how do I create a view of a node that shows ALL alerts for the node?  Or perhaps a view that shows all monitored devices/resources for that node and the current status.  That would provide a quick view of the node without having to drill down multiple levels.  Any thoughts on that?

  • Joe,

    I don't quite understand your question(s) or how to correctly answer them.  For every "X" Details view there should be widgets named 'All Alerts this object can trigger' and 'Active Alerts'.  If you don't see those widgets, I would highly suggest adding them.  That's one of the first things I show people when I introduce them to SolarWinds.

    Here's an example of the widgets

    For all monitored resources you can include the AppStack Environment widget for the node.

    I have a sidebar tab for all my custom node views that's dedicated to just high-level monitoring and alerting.  Node owners and application teams can click on that sidebar tab to get a simple overview of whether the server is set up for alerting and, if so, who gets the alerts based on our alert naming standards.  They can then easily see what's down, both in a semi-detailed view and in a high-level view.

    If this is what you're looking for then it's already there, you just need to enable it.  

    But if I completely missed the boat, then please explain.

  • Bourlis, thanks for continuing the conversation.  I'll try to explain my predicament a bit more clearly, and hopefully you can tell me what I'm missing.

    This morning there was an alert that a "Node is Down": 

    So I click on the object that triggered this alert, which takes me to the Node Details page.  Under 'Root Cause of Alert' it shows the status as 'Critical'. 

    I again click on the server name, which just brings me back to the same Node Details page.  On this page there is also the 'AppStack Environment for Servername' widget:

    Nothing in this widget shows the element that is causing the alert.  There is also the "Active Alerts on This Node" widget:

    As you can see, none of these views or widgets tells me WHY the node is down or is shown as critical.  Is the C: drive full?  Is there a hardware failure?  I'm not seeing anything I can take action on.  

    For some background, I ran the original Ncentral monitoring system from Nable for years before they were acquired by SolarWinds.  In the older version it was easy to see a list of all objects monitored for a given node, and immediately see the status of each object.  Maybe I'm expecting too much, or more likely the information is there but I'm not seeing it.

    Appreciate your thoughts.
    Joe

  • If you are being looped back to the same node details page then the root cause is likely to be virtual machine related.
    Have a look at the Virtual Machine Details widget in your Node Details view. The guest status is probably Warning or Critical.

  • Deltona,

    The Virtual Machine Details widget does not provide any more information than the others.  I have confirmed that the local resources such as physical and virtual memory, local drives, etc. are selected.  But nothing like that is listed.

  • The guest status is VMware specific. You will need to navigate to your vSphere host or vCenter server for additional details concerning the Critical guest status.