Looking for a way to be alerted when WMI fails to poll information on a Node

bsciencefiction.tv

Any thoughts on a report or alert I could use to be notified when WMI fails to poll information on a node?

Accepted answers

aLTeReGo

Volume status = "Unknown" is a good one

All comments

Detroiter

You could try modifying the built-in "Alert me when a managed node has not been polled during the last 5 tries" alert by adding a condition where Object Type = WMI.

bsciencefiction.tv

aLTeReGo for the win

Select Distinct (Nodes.Caption)
From Nodes
Inner Join Volumes
On Nodes.NodeID = Volumes.NodeID
Where Volumes.Status = 0
And Nodes.ObjectSubType = 'WMI'
And Nodes.Status = 1
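
For anyone who wants to try the query above away from a production database, here is a minimal sketch that runs it against an in-memory SQLite copy. The two tables only contain the columns the query touches, the rows are made-up sample data, and it assumes (as the query seems to) that a status of 0 means "Unknown" and 1 means "Up". The real Orion schema has many more columns.

```python
import sqlite3

# Hypothetical stand-in schema: only the columns the query references.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Nodes (
        NodeID INTEGER PRIMARY KEY,
        Caption TEXT,
        ObjectSubType TEXT,
        Status INTEGER
    );
    CREATE TABLE Volumes (
        VolumeID INTEGER PRIMARY KEY,
        NodeID INTEGER,
        Caption TEXT,
        Status INTEGER
    );
""")

# Made-up sample rows; status 0 = Unknown, 1 = Up (assumption).
conn.executemany("INSERT INTO Nodes VALUES (?, ?, ?, ?)", [
    (1, "web01", "WMI", 1),   # node is up, volumes unknown -> WMI suspect
    (2, "db01", "WMI", 1),    # node is up, volumes polling fine
    (3, "sw01", "SNMP", 1),   # not polled over WMI, so it is ignored
])
conn.executemany("INSERT INTO Volumes VALUES (?, ?, ?, ?)", [
    (10, 1, "C:\\", 0),
    (11, 1, "D:\\", 0),
    (20, 2, "C:\\", 1),
    (30, 3, "Flash", 0),
])

# The query from the post, unchanged.
QUERY = """
Select Distinct (Nodes.Caption)
From Nodes
Inner Join Volumes
On Nodes.NodeID = Volumes.NodeID
Where Volumes.Status = 0
And Nodes.ObjectSubType = 'WMI'
And Nodes.Status = 1
"""

failing = [row[0] for row in conn.execute(QUERY)]
print(failing)
```

Because of the Distinct, web01 shows up once even though two of its volumes are unknown.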

sacosgrove

I've been using volume status = unknown to detect WMI issues, per aLTeReGo's advice. It does the job, but I get a separate alert for each volume on a node. Some servers have 6-8 volumes, so 6-8 separate emails get generated.

Any other ideas??

aLTeReGo

Have you considered adding an additional condition to the trigger so that it only alerts on the 'C:\' drive?
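
As a rough illustration of why that extra condition cuts the noise, here is a small sketch that collapses volume-level results into one entry per node, optionally keeping only one drive. The function name, the tuple layout, and the assumption that volume captions start with the drive letter (for example "C:\ Label:OS") are all hypothetical and not taken from the product.

```python
from collections import defaultdict

UNKNOWN = 0  # assumption: status code used for "Unknown"


def nodes_to_alert(volumes, only_drive=None):
    """Group unknown volumes by node.

    volumes: iterable of (node_caption, volume_caption, status) tuples.
    only_drive: optional prefix such as "C:\\"; other volumes are skipped.
    Returns a dict mapping each node to its list of unknown volumes.
    """
    by_node = defaultdict(list)
    for node, volume, status in volumes:
        if status != UNKNOWN:
            continue
        if only_drive is not None and not volume.upper().startswith(only_drive.upper()):
            continue
        by_node[node].append(volume)
    return dict(by_node)


# Made-up sample data.
sample = [
    ("web01", "C:\\ Label:OS", 0),
    ("web01", "D:\\ Label:Data", 0),
    ("db01", "C:\\ Label:OS", 1),
]

print(nodes_to_alert(sample))            # every unknown volume, grouped by node
print(nodes_to_alert(sample, "C:\\"))    # only the C:\ drive per node
```

With the drive filter in place each node contributes at most one volume, which is the same effect as adding the 'C:\' condition to the alert trigger.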

sacosgrove

I have now!!!

sacosgrove

Oh, and also:

shuth

Hmm, I wonder if something similar can be used for my problem here: Alert on agents not responding

The problem is that volumes might poll OK while CPU/memory doesn't, which means the volume unknown method won't catch it. It's a bit flaky.

sacosgrove

aLTeReGo We can't figure out why yet, but we are having various Windows servers (no consistency) stop responding. No blue screen, cannot RDP, but the system responds to ping tests, suggesting the problem is higher up at the application level of the network stack. Obviously I can't use any WMI or PowerShell scripting. Using the command prompt and SHUTDOWN /r, etc. will not work either. Until we can determine the root cause, we've just been rebooting the affected servers in order to restore service. In order to reboot, we have to use iLO or reset the VM in vSphere.

What are your thoughts on how to automate a reboot action when this occurs?

1.) Physical - using iLO

2.) Virtual - using VMAN

Because we are triggering on volume status, when I try to use "Manage VM - Power off" (BTW - I think it's odd there is no "Manage VM - Reset" action), I am required to select a specific VM. If I switch the trigger to virtual machine, I can perform the action on the offending node, but I can't work out what I would trigger on.

aLTeReGo

In that scenario, I would recommend using the Agent on those machines. When the server locks up, the node status will then change to accurately reflect a 'down' status. This will then allow you to automate the action against that node using the 'Manage VM - Reboot' action. As for the iLO, there are ways of doing this in a scripted fashion, but you would need a consistent and predictable way of referencing the cards to make one script suitable for all devices. In my previous environments I gave the iLOs and Dell DRACs DNS names made up of the name of the server they're associated with, prepended with the type of out-of-band management card, e.g. 'ilo-servername.domain.ext' or 'drac-otherserver.domain.ext'. I then created a CNAME alias so I didn't need to remember which devices were HP and which were Dell. That CNAME was 'oob.servername.domain.ext'. This gives a fairly easy-to-remember, predictable format suitable for scripting.
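
To show how a script could lean on that naming convention, here is a minimal sketch that derives the out-of-band names from a server's FQDN. The function name and the returned dictionary keys are hypothetical; only the 'ilo-', 'drac-' and 'oob.' patterns come from the post, and the sketch does no DNS lookup or reboot itself.

```python
def oob_names(server_fqdn, card_type=None):
    """Build out-of-band management hostnames for a server.

    server_fqdn: e.g. "servername.domain.ext"
    card_type: optional card prefix such as "ilo" or "drac"
    """
    host, _, domain = server_fqdn.partition(".")
    if not host or not domain:
        raise ValueError(f"expected a fully qualified name, got {server_fqdn!r}")

    # The vendor-neutral CNAME alias, usable without knowing the card type.
    names = {"alias": f"oob.{host}.{domain}"}
    if card_type:
        # The card-specific record the alias would point at.
        names["card"] = f"{card_type.lower()}-{host}.{domain}"
    return names


print(oob_names("servername.domain.ext", "iLO"))
print(oob_names("otherserver.domain.ext", "drac"))
```

A reboot script would then only need the node's own name to work out which out-of-band address to talk to, which is the point of keeping the format predictable.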