Recently, I was asked to have SolarWinds run a simple command on some of our CentOS and Ubuntu Linux servers. Since we have SAM, and these are servers, I said that would be easy. Of the three big modules, NPM, SAM, and NCM, I'm certain I know SAM the least, and I could not find a simple, direct way to restart the dhcpd service on these servers using SAM. For many years now, however, I have been using NCM to back up my .bash_history and various other configs and files from some of our Linux boxes. Knowing this, I figured I'd stick with what I know and fix the problem using NCM.
Since I had already been using NCM to back up things on these servers, I already had a working device template. Here are the contents of my generic Linux device template.
If you don't already have one, you will need to create the template, and then assign it to the server once you have chosen to manage the server via NCM.
(/Orion/CLI/Admin/DeviceTemplate/DeviceTemplatesManagement.aspx)
<!--SolarWinds Network Management Tools-->
<!--Copyright 2005 SolarWinds.Net All rights reserved-->
<Configuration-Management Device="00_Generic_Linux" SystemOID="1.3.6.1.4.1" SystemDescriptionRegex="" AutoDetectType="BySystemOid">
  <Commands>
    <Command Name="RESET" Value="${CRLF}" />
    <Command Name="Reboot" Value="reboot${CRLF}y${CRLF}" />
    <Command Name="EnterConfigMode" Value="sudo su -" />
    <Command Name="ExitConfigMode" Value="exit" />
    <Command Name="UseMultipleDownloadCommands" Value="True" />
    <Command Name="Startup" Value="cat /etc/os-release" />
    <Command Name="Running" Value="ls -hal /" />
    <Command Name="Snmp" Value="cat /etc/snmp/snmpd.conf" />
    <Command Name="NcmTestConfig" Value="tac ncmtestconfig" />
    <Command Name="LinuxCommandHistory" Value="history" />
    <Command Name="DownloadConfig" Value="${ConfigType}"/>
    <Command Name="UploadConfig" Value="${EnterConfigMode}${CRLF}${ConfigText}${CRLF}${ExitConfigMode}" />
    <Command Name="DownloadConfigIndirect" Value="${EnterConfigMode}backup fabric ${ConfigType} ${StorageAddress} ${StorageFilename}${CRLF}${CRLF}${CRLF}" />
    <Command Name="EraseConfig" Value="reset saved-configuration" />
    <Command Name="SaveConfig" Value="save" />
    <Command Name="Version" Value="cat /etc/os-release" />
    <Command Name="MenuBased" Value="false" />
    <Command Name="VirtualPrompt" Value="\$ " />
    <Command Name="UseVirtualPromptForCommands" Value="true" />
    <!-- <Command Name="EnableIdentifier" Value="\#"/>-->
  </Commands>
</Configuration-Management>
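Before pointing NCM at a server, it can help to confirm that the account NCM will log in with can actually run the commands the template sends. The following is only a rough sketch to run by hand on the server itself; the check_cmd helper is a hypothetical name of my own, and the three commands are copied from the Startup/Version, Running, and Snmp entries of the template.

```shell
#!/bin/sh
# Hypothetical helper (not part of NCM): run a command quietly and
# report whether it succeeded for the current user.
check_cmd() {
    if sh -c "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "FAIL: $1"
    fi
}

# Commands copied from the device template entries.
check_cmd "cat /etc/os-release"
check_cmd "ls -hal /"
check_cmd "cat /etc/snmp/snmpd.conf"
```

Any line that comes back as FAIL (a missing snmpd.conf, or a file the account cannot read, for example) is a command that would most likely also fail when NCM runs it through the template.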
Next, edit the server and add it to NCM using your new device template.
Once your server has been successfully added to NCM, you can manage it via the execute NCM scripts tool/option.
For me, I needed a way to alert when the service stopped, as well as a way to run a simple command to restart it. As I have more experience using NCM than I do SAM, I already knew this could easily be accomplished with the native alert trigger action (Execute An NCM Action).
So, using NPM, SAM, and NCM together, I set up my new alert trigger to alert on the "Component". For this specific use, I felt it would be easier to simply hardcode everything into the alert, as it was for a very specific purpose and I wanted to make sure it ran correctly in the simplest form. Perhaps I'll go back later and revise it, though I doubt it, since it's working perfectly.
So, we are now alerting on the component, which is watching the status of a specific process via SAM stats. Once it detects the service/process has stopped, the alert gets triggered, and we will need it to run our command.
To do this, we simply add "Execute An NCM Action" as a trigger action, tell it to "Execute Config Script", then give it the command to execute. In this case, we are restarting the dhcpd service.
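For anyone wondering what such a command might look like, here is a minimal sketch. It assumes the servers use systemd (systemctl) and the default service names: dhcpd on CentOS and isc-dhcp-server on Ubuntu. The helper names dhcp_service_name and restart_cmd are hypothetical, and the script only prints the command it would run rather than restarting anything.

```shell
#!/bin/sh
# Hypothetical helper: map an os-release ID to the DHCP service name.
# Assumes default package naming; adjust if your servers differ.
dhcp_service_name() {
    case "$1" in
        ubuntu|debian) echo "isc-dhcp-server" ;;
        *)             echo "dhcpd" ;;
    esac
}

# Hypothetical helper: build the restart command for a given OS ID.
# Assumes systemd; older init systems would need "service ... restart".
restart_cmd() {
    echo "sudo systemctl restart $(dhcp_service_name "$1")"
}

# Read the ID field from /etc/os-release, the same file the template's
# Version command prints. Falls back to "unknown" if it is not readable.
os_id="unknown"
if [ -r /etc/os-release ]; then
    os_id=$(. /etc/os-release && echo "$ID")
fi

# Dry run: print the command instead of executing it.
echo "Would run: $(restart_cmd "$os_id")"
```

In the alert action itself, a single line such as the one this prints is probably all the config script needs, since the alert is already hardcoded for one specific service.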
While my priorities were to know about the issue (get the alert) and to recover from it as quickly as possible (execute the NCM script), I also wanted to know what was going on in between. For this, I simply added a 1-minute escalation, and included the ${Notes} variable for the node.
The notes variable is where NCM stores the command results, which lets us determine whether the script actually ran successfully.
Here is what that part of the escalation email looks like, showing the command we ran, as well as the results.
Other than all the normal parts of the alert, that is pretty much all there is to it.
One of the best things about most SolarWinds products is that you can easily bend them to your will. We just used a network config management tool to manage a server, and in my opinion, we did it much more easily than we could have with SAM (remember, I'm dumb with SAM). This wasn't anything groundbreaking, and you don't have to be a rocket surgeon to make it happen either. We just used the tools available to do what we needed to do. Again, even though there might be 10 different ways to do this via SAM, I have more experience with NCM, and felt comfortable solving my problem via this path.
Would you have solved this problem differently? If so, what would you do differently, and how would you do it? Please share your thoughts and ideas on the subject.
Thank you,
-Will