Hi,
I am looking for a way to get alerted when we have a physical hardware issue, whether it be a drive in a server, power supply, battery, memory, etc.
Does anyone have any good examples of how I might be able to do this?
Thanks.
First you need to make sure that your node's hardware is being polled:
To create an alert, use the web-based Alerting wizard:
Do you know if there is a way to include only Windows/Linux/VMware servers? We have a bunch of networking devices on here that I am not concerned about, and I am not sure how to filter for just servers.
To answer your second question about filtering: find the MIB for the server that lists the "Hardware Type" for the server(s). That means SNMP needs to run on your servers so that they can be polled, which adds a small amount of overhead on the servers, FYI. You can use SW "Tools" to help find the correct MIB by walking the MIB tree on the devices. Just set up your community name on the server and then in Tools to get that to work. (You will also need to set up NPM and NCM to get the information from polling.)
Set up polling for the servers.
1. Create a NEW alert for hardware problems for the servers. As an example, call it "Server Hardware Problems".
2. Inside the alert, set up the logic to say "If all of the following are true" for the first statement.
3. Then under that, set up a first global for "if ANY of the following are true" as the set of logical "or" statements. Example:
The Field "hardware type" is equal to "Windows"
The Field "hardware type" is equal to "Linux"
etc.
4. A THIRD logical global statement should be created Under the FIRST one. It will be indented just like the second one that we just did. In that statement it should say "where any of the following are true".
5. Under that third global statement, create all of the hardware components that you want to watch.
The logic example is thus: If this is a Windows machine AND it has a hardware problem with (name your physical item here), the alert will trigger.
6. The message part of the alert should have (variables) for the Node name, the hardware type, the component name and a timestamp if wanted. You will have to use the listing from tools to determine what the devices may be called.
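The nested logic in steps 2 to 5 can be sketched in plain Python. This is only an illustration of the AND/OR grouping, not SolarWinds code; the field names (`hardware_type`, `component_status`), the status values, and the sample nodes are all assumptions made up for the example.

```python
# Hypothetical field names and values, for illustration only.
SERVER_TYPES = {"Windows", "Linux", "VMware"}
BAD_STATUSES = {"Warning", "Critical"}


def should_trigger(node):
    """Outer 'ALL of the following' group containing two nested 'ANY' groups."""
    # Group 1 (ANY): the node is one of the server hardware types.
    is_server = any(node["hardware_type"] == t for t in SERVER_TYPES)
    # Group 2 (ANY): at least one watched component reports a problem.
    has_problem = any(
        status in BAD_STATUSES for status in node["component_status"].values()
    )
    # Outer group (ALL): both nested groups must be true.
    return is_server and has_problem


nodes = [
    {"name": "web01", "hardware_type": "Windows",
     "component_status": {"Power Supply": "Critical", "Memory": "Up"}},
    {"name": "core-sw1", "hardware_type": "Switch",
     "component_status": {"Fan": "Critical"}},
    {"name": "db01", "hardware_type": "Linux",
     "component_status": {"Battery": "Up"}},
]

triggered = [n["name"] for n in nodes if should_trigger(n)]
print(triggered)
```

In this sketch the switch is ignored even though its fan is critical, because it fails the hardware type group, and the healthy Linux server is ignored because it fails the component group.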
With some manipulation, it should be what you are looking for.
Hope this helps, Mark
OK, I think I have it down pretty well. One other problem I have noticed: we have about 1000 nodes, and some are physical servers while others are virtual. On some of the physical servers the "Hardware Sensor" check box was not checked and I had to check it manually.
Does anyone know of a custom search so that I can just list Physical Servers and see which ones don't have the Hardware Sensor checked?
All of our servers (well, most) are polled by SNMP.
Thanks
Very easy my friend:
Option (1) - Filter by OS type within alert:
Option (2) - Create custom property for filtering
This is a great question, thanks.
It was supposed to be pretty straightforward, but it turns out that it is not that clear how to differentiate between physical and virtual machines, even though the Node Details resource shows this information. There are many questions about it on Thwack and not many definitive answers.
Here is the exact same thing that you are looking for: Reporting nodes not configured for hardware polling?
Hi Alex,
I have created an alert in the format below, but it is not working.
Our ESXi host has an SD disk which is going down, and we are getting the event below on the node,
so I started to create an alert but have not had success. Please let us know how I can create this alert.
Thanks in Advance.
I would suggest starting from scratch. First - remove all filters and all limitations and only leave "I want to alert on Hardware Sensor". See what is going to trigger. Then, start filtering out once you know it works.
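As a rough sketch of this "start wide, then narrow" approach, the Python below applies filters one at a time and counts what survives each step. It is not SolarWinds code; the row fields (`node`, `sensor`, `status`), the node names, and the sample data are assumptions invented for the illustration.

```python
def narrow_down(rows, filters):
    """Apply filters one at a time and record how many rows survive each step."""
    counts = [("no filter", len(rows))]
    for label, predicate in filters:
        rows = [r for r in rows if predicate(r)]
        counts.append((label, len(rows)))
    return counts


# Made-up sensor rows; the field names are assumptions.
sensors = [
    {"node": "esx01", "sensor": "USB Direct-Access", "status": "Critical"},
    {"node": "esx01", "sensor": "Fan 1", "status": "Up"},
    {"node": "esx02", "sensor": "Power Supply 1", "status": "Critical"},
]

steps = narrow_down(sensors, [
    ("status is Critical", lambda r: r["status"] == "Critical"),
    ("node is esx01", lambda r: r["node"] == "esx01"),
    # Deliberately wrong exact match, to show a filter that removes everything.
    ("sensor equals 'usb'", lambda r: r["sensor"] == "usb"),
])

for label, count in steps:
    print(f"{label}: {count}")
```

The first step where the count drops to zero points at the filter that needs another look.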
Another thing I notice - you have limited the scope in your rule above. I use this functionality very rarely. Most things can be done at the filter level. So - switch back to "all objects in my environment" and simply create a filter for the node name that you have blanked out above,
something like this:
I used this alert trigger condition:
But I think the condition is not giving the correct result.
Krishna
OK, getting closer... remove this condition now, leave it blank so that it captures everything, and see if your alert will trigger.
When I use the last column, it works and shows lots of events for the h/w sensor.
But the point is that I would like to capture only a particular h/w sensor event, like "hardware sensor critical for usb direct access".
I think the filter is not working.
OK, you see - we are getting there. So, we have now confirmed that your alert is working, and you have also concluded that without the filter it works (albeit with too much noise) and with the filter it doesn't - hence, the problem is the filter.
I would suggest the following:
* Go to your Reports and create a new report
* Make it report on "Hardware Sensor"
* Then, add a bunch of columns you like, including Hardware sensor message (as per your above screenshot)
* Do not add any filters
* Run the report and see if you get a result
Often you think that logically your message should be called "Message", but it is not (a good example is a Node name - in the SolarWinds world the name that you see on top of the node's summary page is actually called "Caption" in the database). By playing with the report you can figure out the exact column names you can filter by. Then, go back to your alert and use your findings to adjust the filter accordingly. This is the process I follow when I need to report/alert on something new that I haven't done before. Running a report helps to see what sort of data is stored in which columns in the database.
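If the report is exported and inspected outside the web console, the same "look at the data first" idea might be sketched like this in Python. Apart from "Caption", which is mentioned above, the column names (`SensorLabel`, `SensorState`, `SensorText`) and the rows are hypothetical placeholders, not real SolarWinds column names.

```python
def describe_columns(rows):
    """Return {column name: first non-empty sample value} across all rows."""
    samples = {}
    for row in rows:
        for column, value in row.items():
            if column not in samples and value not in (None, ""):
                samples[column] = value
    return samples


# Made-up rows standing in for an unfiltered "Hardware Sensor" report.
report_rows = [
    {"Caption": "esx01", "SensorLabel": "Fan 1",
     "SensorState": "Up", "SensorText": ""},
    {"Caption": "esx01", "SensorLabel": "USB Direct-Access",
     "SensorState": "Critical", "SensorText": "Sensor is in a critical state"},
]

columns = describe_columns(report_rows)
for name, sample in columns.items():
    print(f"{name}: {sample!r}")
```

Seeing each column next to a real sample value makes it easier to pick the field and the exact text to filter on.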
Another thing I have just noticed - you are saying:
Events are different from objects (Hardware Sensor in this case).
Try this:
* Instead of searching for "Message" add these two filters:
1. Node Node Name is equal to [whatever your server name is]
2. Hardware Sensor status is equal to Critical
P.S. I have noticed that "Caption" doesn't come up in the list of possible fields for the Node to choose from... strange.... but I think "Node Name" will do in this case. Try the above
And you can also add the name of the sensor to limit it to the exact sensor that you need.
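Expressed as a small Python sketch, the suggested filter looks roughly like the function below. The field names (`node_name`, `sensor_name`, `status`) and the sample rows are assumptions for illustration; the real field names are whatever the alert wizard offers in your environment.

```python
def matches(sensor, node_name, sensor_name=None):
    """Node name AND Critical status, optionally narrowed to one sensor name."""
    if sensor["node_name"] != node_name:
        return False
    if sensor["status"] != "Critical":
        return False
    if sensor_name is not None and sensor["sensor_name"] != sensor_name:
        return False
    return True


# Made-up sensor rows.
sensors = [
    {"node_name": "esx01", "sensor_name": "USB Direct-Access", "status": "Critical"},
    {"node_name": "esx01", "sensor_name": "Power Supply 1", "status": "Critical"},
    {"node_name": "esx01", "sensor_name": "Fan 1", "status": "Up"},
    {"node_name": "esx02", "sensor_name": "USB Direct-Access", "status": "Critical"},
]

broad = [s["sensor_name"] for s in sensors if matches(s, "esx01")]
narrow = [s["sensor_name"] for s in sensors
          if matches(s, "esx01", "USB Direct-Access")]
print(broad)
print(narrow)
```

The two-condition version still returns every critical sensor on the node; adding the sensor name is what cuts it down to a single alert.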
Now, back to "Message" - I think this is what you have (as mentioned above):
1. You see your event in logs. Right?
2. You then create alert based on "Hardware Sensor"
3. Both Hardware Sensor and Event objects (which are different as previously mentioned) have "Message" field apparently (I have just confirmed this in my environment)
4. What you are trying to do is to find a message of the event in hardware sensor table... obviously it is not there
So, as it stands now, you cannot alert on arbitrary events (only on Auditing Events). Therefore, your only option is to use a filter as proposed above on Hardware Sensor Status + Sensor Name, Node Name, etc. to limit the scope further the way you like.
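To make the mismatch concrete, here is a minimal Python sketch with two separate collections, one for events and one for hardware sensors. The field names and the sensor's message text are invented for the example; only the event text comes from this thread.

```python
# Events and hardware sensors are different objects with their own "message".
events = [
    {"node": "esx01",
     "message": "Hardware sensor critical for USB Direct-Access"},
]
sensors = [
    {"node": "esx01", "sensor_name": "USB Direct-Access",
     "status": "Critical",
     "message": "Sensor is in a critical state"},  # invented text
]

search = "hardware sensor critical for usb direct-access"

# Looking for the event's wording inside the sensor objects finds nothing.
by_message = [s for s in sensors if search in s["message"].lower()]

# Filtering on the sensor's own fields finds the sensor.
by_fields = [
    s for s in sensors
    if s["status"] == "Critical" and "usb" in s["sensor_name"].lower()
]

print(len(by_message), len(by_fields))
```

The event text does exist, but only in the events collection, which is why the alert on Hardware Sensor never matches it.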
The condition below is working fine, but we don't want multiple h/w sensor alerts.
But when I apply the filter, it does not work. Could you please find a way to make the filter work?