Orion seems to be able to read the temperature if the machine supports it, so is it possible to take it one step further and issue a shutdown command when the server reaches a certain threshold?
With a little work, it should be possible. You don't mention what the target OS is, so I'm going to assume Windows for now; if it's another platform, it shouldn't be too hard to see what I did and adjust from there. The way I'd do this is to create an alert in "Advanced Alert Manager", with a trigger condition that looks something like this:
Property to Monitor: Hardware Type
- Hardware Type Name is equal to Temperature
- Hardware Type Status is equal to Critical
You might need to add other conditions, such as excluding certain hardware based on known criteria. Once you have settled on your criteria, the next step is the trigger action. Click "Add New Action" and select "Execute an external program"; this is where you can have it run all of the shutdown steps. Because I'm using Windows as an example, I'd use psshutdown to do the work. Assuming I put psshutdown in c:\utils\, the command would look something like this:
c:\utils\psshutdown -f -k -t 30 -m "Shutdown due to Temperature issues" \\${DNS}
With psshutdown you may need to specify credentials (it accepts -u and -p for a user name and password), in which case you'd either adjust the command here or look into triggering the action some other way.
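If the command line gets more involved (credentials, different delays per server), it can be easier to build it in a small wrapper script and have the alert call that instead. Here is a rough, untested sketch in Python. The function name, the example host and the account name are made up for illustration; the switches are the ones from the command above plus -u and -p. The argument order (computer first, then credentials, then switches) is my assumption, so check it against psshutdown's own usage output before relying on it.

```python
from typing import List, Optional

# Path taken from the example above; adjust to wherever psshutdown lives.
PSSHUTDOWN = r"c:\utils\psshutdown"


def build_psshutdown_command(
    host: str,
    delay_seconds: int = 30,
    message: str = "Shutdown due to Temperature issues",
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Return the argument list for a forced power-off of `host`.

    This only builds the command; it does not run anything.
    """
    # Assumption: computer name first, then credentials, then the switches.
    cmd = [PSSHUTDOWN, "\\\\" + host]
    if user is not None:
        cmd += ["-u", user]
        if password is not None:
            cmd += ["-p", password]
    # Same switches as the one-line example: force, power off, delay, message.
    cmd += ["-f", "-k", "-t", str(delay_seconds), "-m", message]
    return cmd


if __name__ == "__main__":
    # Hypothetical host and account, printed rather than executed.
    print(build_psshutdown_command("web01.example.com", user=r"EXAMPLE\svc_orion"))
```

Keep in mind that a password passed on the command line can show up in process listings and logs, so running the alert engine under a suitable account is usually the cleaner route.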
As a side note, if you want it to react at a specific temperature, rather than waiting for the hardware to report that it's at a dangerous temperature, change the property to monitor to Hardware Sensor, remove the "Hardware Type Status" condition, and use "Hardware Sensor Value" instead.
I'd strongly recommend using a dummy script for a while instead of the real shutdown script, just to verify your alerting criteria are in fact correct and you don't go shutting down your infrastructure by accident. You should also consider setting the alert trigger so it doesn't fire until the condition has held for X minutes; this avoids reacting to fluctuating temperatures or values. You can also use a reset action to stop the shutdown. For example, with psshutdown you pass in -a and it'll abort the shutdown.
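For the dummy script, something as simple as this would do. It's a sketch, not something I've run against Orion: it just appends a timestamped line with whatever arguments the alert passed in, so you can see which servers would have been shut down and when. The log path and the function names are placeholders I picked for the example.

```python
import sys
from datetime import datetime
from pathlib import Path
from typing import List, Optional

# Hypothetical log location; point it anywhere the alert engine can write.
LOG_FILE = Path(r"c:\temp\shutdown_dry_run.log")


def format_entry(args: List[str], now: datetime) -> str:
    """Build one log line describing the shutdown that would have run."""
    stamp = now.isoformat(timespec="seconds")
    return f"{stamp} would shut down: {' '.join(args)}\n"


def record(args: List[str], log_file: Path = LOG_FILE,
           now: Optional[datetime] = None) -> str:
    """Append the entry to the log file and return it."""
    entry = format_entry(args, now or datetime.now())
    with open(log_file, "a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


if __name__ == "__main__":
    # Called by the alert action with ${DNS} (or similar) as its argument.
    record(sys.argv[1:])
```

Point the alert's "Execute an external program" action at this script with ${DNS} as the argument, let it run for a few days, and only swap in the real command once the log shows what you expect.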
This is a rough idea and I've not tested it, but it should give you some ideas to work from. Have fun! Let me know how it goes.
Additionally, most current hardware vendors that have temperature sensors also have some form of a "service processor" that sits in the background and independently monitors the sensors.
The service processor will have its own thresholds for a temperature alarm (SNMP trap / audio alarm / log entry / etc.).
There are usually warning and critical thresholds that have been preset by the HW vendor before shipping.
When the critical temperature threshold is exceeded, the service processor will perform a HARD shutdown of the system to protect the hardware. The presumption is that by the time the temperature has gone critical, the HW is more important to protect than the SW running on the system.
If you plan to initiate an orderly shutdown, ensure your shutdown thresholds are below the service processor thresholds.
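As a quick sanity check on that last point, here is a small sketch that compares your planned shutdown threshold against the service processor's critical threshold. The function name, the safety margin and the numbers in the usage lines are all made up for illustration; take the real critical value from your vendor's documentation or the service processor's own configuration.

```python
def is_orderly_threshold_safe(
    orderly_shutdown_c: float,
    sp_critical_c: float,
    margin_c: float = 5.0,
) -> bool:
    """Return True if the orderly shutdown fires early enough.

    The orderly threshold must sit at least `margin_c` degrees below the
    service processor's critical threshold, so the OS has time to shut
    down cleanly before the hardware cuts power.
    """
    return orderly_shutdown_c + margin_c <= sp_critical_c


if __name__ == "__main__":
    # Hypothetical values in degrees Celsius.
    print(is_orderly_threshold_safe(70.0, 85.0))  # enough headroom
    print(is_orderly_threshold_safe(82.0, 85.0))  # too close to the hard cutoff
```

How much margin you need depends on how fast the room heats up and how long your servers take to shut down, so treat the 5 degree default as a placeholder.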
Chris.
Thanks for the response, this looks like it's exactly what I was looking for! I'll play around with various conditions and get it perfected.
The script works great locally, but doesn't want to execute remotely. Any ideas? It doesn't even act like it's trying to do anything on the remote test machine.
This is likely due to the limited permissions the local system account has for accessing network resources. Unfortunately, the Advanced Alert Manager does not allow impersonation when executing external programs or scripts. Instead, you need to change the user context under which the Advanced Alert Manager runs to a domain account that has permission both to log in locally to the system and to access the network resources that the script depends on. Once you've changed the account, you will need to restart the SolarWinds Alerting Engine service for the changes to take effect. Then your script should execute normally.
I did this and it appears that the remote machine still does nothing. Where do I start looking to troubleshoot this and see if the script is even being sent to the remote machine?
Are you using psshutdown as in the example I gave above? Does the user running the alert manager have admin access on the remote server? That is needed to access \\servername\admin$\, so you should verify that it has access. The other thing you may need to verify is that the alert is passing the right values. Instead of executing psshutdown as above, just call a .bat script that writes the values to a text file, something like this:
echo %1 >> c:\temp\script_out.txt
Name it c:\util\test.bat, and for your script execution command, do something like this:
c:\util\test.bat ${DNS}
This should log the DNS address into the text file. Verify that it is what you are expecting it to be.
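Once you know which name is being passed, you can also check the two things psshutdown depends on: that the name resolves, and that the account can reach the admin$ share. Below is a rough Python sketch of that check. The function names are mine, and it only tells you something useful if you run it on the Orion server under the same account the alert engine uses.

```python
import os
import socket
import sys
from typing import Dict, Optional


def admin_share_path(host: str) -> str:
    """Return the UNC path of the admin$ share on `host`."""
    return "\\\\" + host + "\\admin$"


def diagnose(host: str) -> Dict[str, Optional[object]]:
    """Check name resolution and admin$ access for `host`."""
    result: Dict[str, Optional[object]] = {"host": host}
    try:
        result["address"] = socket.gethostbyname(host)
    except OSError:
        # Covers resolution failures (socket.gaierror is an OSError).
        result["address"] = None
    # True only if the share exists and the current account can see it.
    result["admin_share_reachable"] = os.path.isdir(admin_share_path(host))
    return result


if __name__ == "__main__":
    # Pass the value you saw in c:\temp\script_out.txt as the argument.
    for name in sys.argv[1:]:
        print(diagnose(name))
```

If the address comes back empty, the problem is the name being passed; if the share is not reachable, look at the account's rights on the remote server.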
Check the Windows event logs on the remote server; psshutdown installs a service there to do the actual shutdown. If psshutdown isn't working for you, you could use the regular shutdown command, or even some PowerShell.
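If you go the regular shutdown command route, the equivalent of the psshutdown line would be built roughly like this. Again an untested sketch with function names I made up; the switches (/s, /f, /t, /c, /m and /a) are the built-in Windows shutdown options, and the same permission requirements on the remote machine still apply.

```python
from typing import List


def build_shutdown_command(
    host: str,
    delay_seconds: int = 30,
    comment: str = "Shutdown due to Temperature issues",
) -> List[str]:
    """Argument list for a forced remote shutdown with the built-in tool."""
    return [
        "shutdown",
        "/s",                      # shut down (rather than restart)
        "/f",                      # force running applications to close
        "/t", str(delay_seconds),  # delay before the shutdown starts
        "/c", comment,             # comment shown to logged-in users
        "/m", "\\\\" + host,       # target computer
    ]


def build_abort_command(host: str) -> List[str]:
    """Argument list to cancel a pending shutdown, for the reset action."""
    return ["shutdown", "/a", "/m", "\\\\" + host]


if __name__ == "__main__":
    # Hypothetical host; printed rather than executed.
    print(build_shutdown_command("web01.example.com"))
    print(build_abort_command("web01.example.com"))
```

The abort only works while the delay is still counting down, so give yourself a delay long enough for the reset action to fire.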