How can I get Solarwinds Orion to generate an alert if SNMP stops responding on a host?
You can set an alert for any number of Node fields (e.g., name) and say Alert if empty.
That isn't working. I set the alert to fire if System Name is empty, then stopped the SNMP service on my ESX host, and I don't get any alerts.
Are test alert emails working manually?
Yes, I am able to receive a test alert email.
There are no alert suppressions set either. Just a simple rule that says if the Node Name is empty, alert. I tried other options too, like if the IP Address or SNMP Version or even CPU Load is empty, but I get nothing.
If I set it to say alert me if the Last boot is empty I get alerts from some of my network devices, but not the ESX host I have stopped SNMP on.
You can create an alert when an interface goes into an unknown status, but the node status is up. If you are monitoring the loopback on each node, you can make a standard alert on the interface type. That would prevent alerts out on other interface issues.
I finally figured this out. The solution took two particular steps. Once I got it right, I discovered several hosts that had SNMP halted.
First, in order to generate an alert for an application, I have to have that application added to the host. In my case, I am only monitoring one application on my ESX hosts which is an agent installed by our product. If necessary, I could have monitored the snmpd process itself to get the desired results.
Second, I needed to make an alert that says:
Application is equal to snmpd
Status Description is equal to Server is not responding to snmp
(I actually used our application called 'agent', but snmpd makes a better example)
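To make the logic of that rule explicit, here is a rough Python sketch of the two conditions combined. It is only an illustration of the matching logic, not how Orion evaluates alerts internally: the AppStatus record, its field names, the host names, and the "Up" status value are all made up for the example. Only the "Server is not responding to snmp" string comes from the alert above.

```python
from dataclasses import dataclass


# Hypothetical record shape for illustration; not Orion's actual schema.
@dataclass
class AppStatus:
    host: str
    application: str
    status_description: str


# Status text taken from the alert rule described above.
SNMP_DOWN = "Server is not responding to snmp"


def hosts_to_alert(records, application="snmpd"):
    """Return hosts where the monitored application reports SNMP as unresponsive.

    Both conditions must hold: the application name matches AND the
    status description equals the SNMP-not-responding text.
    """
    return [
        r.host
        for r in records
        if r.application == application and r.status_description == SNMP_DOWN
    ]


# Made-up sample data ("Up" is an assumed healthy status value).
records = [
    AppStatus("esx01", "snmpd", "Server is not responding to snmp"),
    AppStatus("esx02", "snmpd", "Up"),
    AppStatus("esx03", "agent", "Server is not responding to snmp"),
]

print(hosts_to_alert(records))
print(hosts_to_alert(records, application="agent"))
```

Note that a host only shows up if the named application is actually being monitored on it, which is why the first step (adding the application to the host) matters.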
I hate to do this, but I gotta give Solarwinds a hard time about this issue. In all my efforts to solve this problem, including using this forum, opening a ticket with support, and doing an extensive GoToMeeting session with support, no one ever told me there was an application status of Server is not responding to snmp. The forum responses said to try the 'is empty' evaluator. Support had me building an extensive and complicated APM solution that resulted in false alerts and uncovered a number of glaring bugs in the new APM solution.
In the end I discovered the application status description by chance when I ran across a host in the Application Monitor interface that had a wedged snmp process.
I have to also register some concerns about the plan to sunset the Application Monitor under the assumption that the new APM is a better solution. Although the APM surely has some promising functionality, the simplicity of the Application Monitor is also appealing. For example, with Application Monitor I don’t have to switch over to a clunky browser interface.
Both solutions need a way to easily manage the addition and removal of applications on a large number of hosts at once.
Richard, I think I know why the first solution (System Name = Empty) did not work. I believe when Orion polls the device with SNMP, it does not poll the System Name. The only time it would poll the System Name is when it rediscovers the device, so this would depend on the rediscovery interval you have set in your System Manager settings. Using an interface or volume instead of System Name may have given you the desired results, as these are polled with SNMP during a normal polling cycle.
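Here is a toy Python model of that explanation, in case it helps picture it. It assumes (as I believe, but have not confirmed) that System Name is only refreshed at rediscovery while interface status is refreshed on every poll. The Node class and the poll/rediscover functions are invented for the example and are not Orion code.

```python
class Node:
    """Toy node record: holds what the monitoring database would remember."""

    def __init__(self, system_name):
        self.system_name = system_name   # filled in at discovery time
        self.interface_status = "Up"     # refreshed on every normal poll


def poll(node, snmp_responding):
    # Assumed behavior: a normal poll refreshes interface status only,
    # so the stored System Name is left untouched.
    node.interface_status = "Up" if snmp_responding else "Unknown"


def rediscover(node, snmp_responding, name):
    # Assumed behavior: only rediscovery re-reads the System Name.
    node.system_name = name if snmp_responding else ""


node = Node("esx01")

# SNMP stops responding, and a normal polling cycle runs.
poll(node, snmp_responding=False)

name_alert = node.system_name == ""                 # "System Name is empty" rule
iface_alert = node.interface_status == "Unknown"    # interface status rule

print(name_alert, iface_alert)

# Only after the next rediscovery would the "is empty" rule have a chance to fire.
rediscover(node, snmp_responding=False, name="esx01")
print(node.system_name == "")
```

Under these assumptions the "is empty" rule stays quiet until the next rediscovery, while a rule on interface status fires on the first failed poll.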
In any case, it seems your solution is more elegant and gets down to the heart of the matter.
The reason I suggested monitoring an interface such as the loopback is that every device typically has one. Monitoring whether a process is responding to SNMP only works for servers, and won't work for network devices.
In response to the last comment about managing the addition and removal of applications on a large number of hosts at once, I have to say APM is spot on for this.
If you use a template for monitoring, you can add a new process to the template and it will be pushed out to all the devices that use that template.
APM was a huge step forward as far as this is concerned.