Using NCM To Manage Our Linux Servers

wluther over 5 years ago

Recently, I was asked to have SolarWinds run a simple command on some of our CentOS and Ubuntu Linux servers. Since we have SAM, and these are servers, I said that would be easy. Of the three big modules, NPM, SAM, and NCM, I'm certain I know the least about SAM. I could not find a simple and direct way to restart the dhcpd service on these servers using SAM. For many years now, however, I have been using NCM to back up my .bash_history, along with various other configs and files, from some of our Linux boxes. Knowing this, I figured I'd stick with what I know and fix the problem with NCM.

Since I had already been using NCM to back up things on these servers, I already had a working device template. Here are the contents of my generic Linux device template.

If you don't already have one, you will need to create the template, and then assign it to the server after choosing to manage the server via NCM.

(/Orion/CLI/Admin/DeviceTemplate/DeviceTemplatesManagement.aspx)

<!--SolarWinds Network Management Tools-->
<!--Copyright 2005 SolarWinds.Net All rights reserved-->
<Configuration-Management Device="00_Generic_Linux" SystemOID="1.3.6.1.4.1" SystemDescriptionRegex="" AutoDetectType="BySystemOid">
  <Commands>
    <Command Name="RESET" Value="${CRLF}" />
    <Command Name="Reboot" Value="reboot${CRLF}y${CRLF}" />
    <Command Name="EnterConfigMode" Value="sudo su -" />
    <Command Name="ExitConfigMode" Value="exit" />
    <Command Name="UseMultipleDownloadCommands" Value="True" />
    <Command Name="Startup" Value="cat /etc/os-release" />
    <Command Name="Running" Value="ls -hal /" />
    <Command Name="Snmp" Value="cat /etc/snmp/snmpd.conf" />
    <Command Name="NcmTestConfig" Value="tac ncmtestconfig" />
    <Command Name="LinuxCommandHistory" Value="history" />
    <Command Name="DownloadConfig" Value="${ConfigType}"/>
    <Command Name="UploadConfig" Value="${EnterConfigMode}${CRLF}${ConfigText}${CRLF}${ExitConfigMode}" />
    <Command Name="DownloadConfigIndirect" Value="${EnterConfigMode}backup fabric ${ConfigType} ${StorageAddress} ${StorageFilename}${CRLF}${CRLF}${CRLF}" />
    <Command Name="EraseConfig" Value="reset saved-configuration" />
    <Command Name="SaveConfig" Value="save" />
    <Command Name="Version" Value="cat /etc/os-release" />
    <Command Name="MenuBased" Value="false" />
    <Command Name="VirtualPrompt" Value="\$ " />
    <Command Name="UseVirtualPromptForCommands" Value="true" />
    <!-- <Command Name="EnableIdentifier" Value="\#"/>-->
  </Commands>
</Configuration-Management>
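Before assigning the template, it can help to see what its download commands actually return when run by hand on the server. The following is only a rough sketch: the `preview` helper is a hypothetical name, and the three commands are the ones mapped to Startup/Version, Running, and Snmp in the template above. Run it as the same account NCM will log in with.

```shell
#!/bin/sh
# Hypothetical helper: run one of the template's commands and label the output,
# so you can see roughly what NCM would capture as a "config".
preview() {
    echo "== $1 =="
    sh -c "$1" 2>&1 || echo "(command failed)"
}

# Commands taken from the device template above.
preview "cat /etc/os-release"        # Startup / Version
preview "ls -hal /"                  # Running
preview "cat /etc/snmp/snmpd.conf"   # Snmp
```

If one of these prints "(command failed)", the matching config type would most likely come back empty or with an error in NCM as well, for example when snmpd is not installed or the file is not readable by that account.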

Next, you need to edit the server, and add it to NCM using your new device template.

Once your server has been successfully added to NCM, you can manage it via the execute NCM scripts tool/option.

For me, I needed a way to alert when the service has stopped, as well as run a simple command to restart the service. As I have more experience using NCM than I do SAM, I already knew this could easily be accomplished within the native alert/trigger action (Execute An NCM Action).

So, using NPM, SAM, and NCM together, I set up my new alert trigger to alert on the "Component". For this specific use, I felt it would be easier to simply hardcode everything into the alert, as it was for a very specific purpose and I wanted to make sure it ran correctly in the simplest form. Perhaps I'll go back and revise it later, though I doubt it, since it's working perfectly.

So, we are now alerting on the component, which is watching the status of a specific process via SAM stats. Once it detects the service/process has stopped, the alert gets triggered, and we will need it to run our command.

To do this, we simply add "Execute An NCM Action" as a trigger action, tell it to "Execute Config Script", then give it the command to execute. In this case, we are restarting the dhcpd service.
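The post does not show the exact command that was sent, so the following is only a sketch of what such a script could look like. It assumes a systemd-based server where the service is named dhcpd, as in the post (on Ubuntu the ISC DHCP service is usually named isc-dhcp-server instead), and that the NCM login account is allowed to restart it, directly or through sudo. The `restart_if_stopped` function and the `SYSTEMCTL` override are hypothetical names added for illustration and dry runs.

```shell
#!/bin/sh
# Hypothetical helper: restart a service only if it is not currently active,
# and print a short line so the outcome shows up in the captured output.
# SYSTEMCTL is an assumption added for dry runs; it defaults to systemctl.
restart_if_stopped() {
    svc="$1"
    ctl="${SYSTEMCTL:-systemctl}"
    if $ctl is-active --quiet "$svc"; then
        echo "$svc already running"
    else
        $ctl restart "$svc" && echo "$svc restarted"
    fi
}
```

Calling `restart_if_stopped dhcpd` would then do the work. In the alert action itself, a single line such as `systemctl restart dhcpd` may be all that is needed; the function form just avoids restarting a service that has already recovered.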

While my priorities were to know about the issue (get the alert) and to recover from it as quickly as possible (execute the NCM script), I also wanted to know what was going on in between. For this, I simply added a 1-minute escalation, and included the ${Notes} variable for the node.

The notes variable is where NCM stores the command results, which will allow us to determine if the script actually ran successfully.
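Since the captured output is what ends up in the notes, it may be worth having the script finish with one unambiguous status line. This is a sketch under the same assumptions as before (a systemd-based server and a service named dhcpd); `report_status`, the `RESULT` line format, and the `SYSTEMCTL` override are hypothetical and not something NCM requires.

```shell
#!/bin/sh
# Hypothetical helper: print a single, easy-to-spot status line for a service.
# "systemctl is-active" prints the state (for example active, inactive, failed).
# SYSTEMCTL is an assumption added for dry runs; it defaults to systemctl.
report_status() {
    svc="$1"
    ctl="${SYSTEMCTL:-systemctl}"
    state=$($ctl is-active "$svc" 2>/dev/null) || true
    echo "RESULT service=$svc state=${state:-unknown}"
}
```

Running `report_status dhcpd` after the restart would add a line like `RESULT service=dhcpd state=...` to the output, which is easier to pick out in an escalation email than raw command echo.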

Here is what that part of the escalation email looks like, showing the command we ran, as well as the results.

Other than all the normal parts of the alert, that is pretty much all there is to it.

One of the best things about most SolarWinds products is that you can easily bend them to your will. We just used a network config management tool to manage a server, and in my opinion, we did it much more easily than we could have with SAM (remember, I'm dumb with SAM). This wasn't anything groundbreaking, and you don't have to be a rocket surgeon to make it happen either. We just used the tools available to do what we needed to do. Again, even though there might be 10 different ways to do this via SAM, I have more experience with NCM, and felt comfortable solving my problem via this path.

Would you have solved this problem differently? If so, then what would you do differently, and how would you do it? Please share your thoughts and ideas on the subject.

Thank you,

-Will

Top Replies

kmaxwell over 5 years ago

That's a really creative use of NCM, wluther! You're absolutely right about the platform being "bendable". My experience with SolarWinds started off with SAM many moons ago, so I used that module to do the same sort of thing. If I remember correctly, I used a script-based component that included a restart command for the process that was down, and that also returned a statistic and message output value I could alert on to pass along the pertinent details. Like you said, though, there are a ton of ways to do the same sorts of things. If I were to tackle the same issue today I'd most certainly go with a more efficient method, but it's what made the most sense to me way back then.
I really look at two things when deciding how to approach a particular monitoring question: efficiency and sustainability. Efficiency is important because we don't want to put undue overhead on a monitored element, since that would be counterproductive; the mantra "do no harm" comes to mind. Additionally, we need to consider the burden each method places on the polling engine. Sustainability is important because at the end of the day you have to maintain these solutions, and more importantly, you may have to hand off the platform to someone else down the road.
Great topic!
-Kris

mesverrum over 5 years ago

I'll admit, I love complexity, haha. I spent an afternoon with a Linux admin a few weeks ago getting OpenSSH configured on our Orion server, and then leveraging PowerShell and OpenSSH to be able to SSH into their Linux machines with a cert instead of a user/pass combo, and then executing whatever scripts we wanted to cook up.
Your solution looks a lot simpler to deploy if the key-based login thing isn't a dealbreaker.
I also never knew that running NCM scripts in an alert would automatically load the response into the alert notes. That could be extremely handy.

sja over 5 years ago

Excellent workaround, wluther.
I honestly think that NCM is a great tool ...
When we get that feature, NCM will become a network automation game changer.
