You Need Configuration Management. Really.

It sounds obvious, perhaps, but without configurations, our network, compute, and storage environments won't do very much for us. Configurations develop over time as we add new equipment, change architectures, improve our standards, and deploy new technologies. The sum of knowledge within a given configuration is quite high. Despite that, many companies still don't have any kind of configuration management in place, so in this article, I will outline some reasons why configuration management is a must, and look at some of the benefits that come with having it.

Recovery from total loss

As THWACK users, I think we're all pretty technically savvy, yet if I were to ask right now if you had an up-to-date backup of your computer and its critical data, what would the answer be? If your laptop's hard drive died right now, how much data would be lost after you replaced it?

Our infrastructure devices are no different. Every now and then a device will die without warning, and the replacement hardware will need to have the same configuration that the (now dead) old device had. Where's that configuration coming from?

Total loss is perhaps the most obvious reason to have a system of configuration backups in place. Configuration management is an insurance policy against the worst eventuality, and it's something we should all have in place.

At a minimum, having the current configuration safely stored on another system is of value. Some related thoughts on this:

  • Make sure you can get to the backup system when a device has failed.
  • Back up / mirror / help ensure redundancy of your backup system.
  • If "rolling your own scripts," make sure that, say, a failed login attempt doesn't overwrite a valid configuration file (he said, speaking from experience). In other words, some basic validation is required to make sure that the script output is actually a configuration file and not an error message.
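The validation point in that last bullet can be sketched in a few lines of Python. This is a minimal illustration, not a hardened tool: the marker strings, the minimum line count, and the function names are all assumptions of mine and would need tuning for each vendor.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical markers: text we expect in a real configuration, and text that
# suggests the device returned an error instead. Adjust these per vendor.
REQUIRED_MARKERS = ("hostname",)
ERROR_MARKERS = ("authentication failed", "permission denied", "connection refused")
MIN_LINES = 10  # assumed lower bound for a plausible configuration


def looks_like_config(text: str) -> bool:
    """Return True only if the text plausibly is a device configuration."""
    lowered = text.lower()
    if len(text.splitlines()) < MIN_LINES:
        return False
    if any(marker in lowered for marker in ERROR_MARKERS):
        return False
    return all(marker in lowered for marker in REQUIRED_MARKERS)


def save_backup(path: Path, text: str) -> bool:
    """Replace the stored backup only when the new text passes validation."""
    if not looks_like_config(text):
        return False  # leave the last known-good file untouched
    # Write to a temporary file in the same directory, then rename it into
    # place, so a crash mid-write cannot leave a truncated backup behind.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.replace(tmp_name, path)
    return True
```

A script would call save_backup() with whatever it captured from the device and raise an alert when the function returns False, rather than silently keeping the old file forever.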

Archives

Better than a copy of the current configurations, a configuration archive tracks all -- or some number of -- the previous configurations for a device.

An archive gives us the ability to see what changes occurred to the configuration and when. If a device doesn't support configuration rollback natively, it may be possible to create a kind of rollback script based on the difference between the two latest configurations. If the configuration management tool (or other systems) can react to SNMP traps to indicate a configuration change, the archive can be kept very current by triggering a grab of the configuration as soon as a change is noted.
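As a rough illustration of working with the difference between the two latest configurations, here is a small sketch using Python's standard difflib module. The function names are my own, and the sketch only reports what changed; turning the added lines into a safe rollback is vendor-specific and is deliberately not attempted here.

```python
import difflib


def config_diff(previous: str, current: str, device: str = "device") -> str:
    """Return a unified diff between two archived configurations."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{device} (previous)",
        tofile=f"{device} (current)",
        lineterm="",
    )
    return "\n".join(lines)


def changed_lines(previous: str, current: str):
    """Return (removed, added) configuration lines between two versions."""
    removed, added = [], []
    for line in difflib.ndiff(previous.splitlines(), current.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        # Lines starting with "  " (unchanged) or "? " (hints) are skipped.
    return removed, added
```

The removed list is the raw material for a rollback (lines to put back), while the added list holds the lines that would have to be negated in whatever way the platform requires.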

Further, home-grown scripts or configuration management products can easily identify device changes and generate notifications and alerts when changes occur. This can provide an early warning of unauthorized configurations or changes made outside scheduled maintenance windows.
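A home-grown version of that early warning might look something like the sketch below. It assumes that comment lines (starting with "!" or "#") can change on every fetch and should be ignored, and it uses a hypothetical nightly maintenance window of 22:00 to 04:00; both are illustrative choices rather than anything a particular product does.

```python
import hashlib
from datetime import datetime, time


def fingerprint(config_text: str) -> str:
    """Hash the configuration, skipping lines assumed to change on every fetch."""
    stable = [
        line.rstrip()
        for line in config_text.splitlines()
        # Assumed volatile lines: comments such as timestamps. Adjust as needed.
        if not line.startswith(("!", "#"))
    ]
    return hashlib.sha256("\n".join(stable).encode("utf-8")).hexdigest()


def in_window(when: datetime, start: time, end: time) -> bool:
    """True if `when` falls inside a daily window, which may span midnight."""
    now = when.time()
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def check_change(old_text, new_text, when, start=time(22, 0), end=time(4, 0)):
    """Return None if unchanged, else 'in-window' or 'out-of-window'."""
    if fingerprint(old_text) == fingerprint(new_text):
        return None
    return "in-window" if in_window(when, start, end) else "out-of-window"
```

An "out-of-window" result is the case worth a notification; how that notification is delivered is left to whatever alerting you already have.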

Compliance / Audit

Internal Memo

We need confirmation that all your devices are sending their syslogs to these seventeen IP addresses.

-- love from, Your Friendly Internal Security Group xxx

"Putting the 'no' in Innovation since 2003"

A request like this can be approached in a couple of different ways. Without configuration management, it's necessary to log in to each device and check the syslog server configuration. With a collection of stored configurations, however, checking this becomes a matter of processing configuration files. Even grepping them could extract the necessary information. I've written my own tools to do the same thing, using configuration templates to allow support for the varying configuration stanzas used by different flavors of vendor and OS to achieve the same thing.
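To make the idea concrete, here is a minimal sketch of such a check against stored configurations. It is not the tooling I described above: it assumes IOS-style syntax ("logging host 192.0.2.1" or "logging 192.0.2.1"), a hypothetical layout of one .cfg file per device, and Python 3.9 or later for the type hints. Other vendors would need their own patterns.

```python
import re
from pathlib import Path

# Assumed IOS-style syntax: "logging host 192.0.2.1" or "logging 192.0.2.1".
SYSLOG_LINE = re.compile(
    r"^\s*logging\s+(?:host\s+)?(\d{1,3}(?:\.\d{1,3}){3})\b", re.MULTILINE
)


def syslog_servers(config_text: str) -> set[str]:
    """Return the set of syslog server addresses found in one configuration."""
    return set(SYSLOG_LINE.findall(config_text))


def audit_syslog(configs: dict[str, str], required: set[str]) -> dict[str, set[str]]:
    """Map each non-compliant device name to the servers it is missing."""
    report = {}
    for device, text in configs.items():
        missing = required - syslog_servers(text)
        if missing:
            report[device] = missing
    return report


def load_configs(directory: Path) -> dict[str, str]:
    """Read every *.cfg file in a directory (assumed: one file per device)."""
    return {path.stem: path.read_text() for path in sorted(directory.glob("*.cfg"))}
```

Calling audit_syslog(load_configs(Path("archive")), required_servers) returns an empty dictionary when every device is compliant, which makes the answer to the memo a one-liner.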

Some tools, SolarWinds NCM among them, can also compare the latest configuration against a configuration snippet and report back on compliance. This kind of capability makes configuration audits extremely simple.

Even without a security group making requests, the ability to audit configurations against defined standards is an important capability to have. Having discussed the importance of configuration consistency, it seems like a no-brainer to want a tool of some sort to help ensure that the carefully crafted standards have been applied everywhere.

Pushing configuration to devices

I'm never quite sure whether the ability to issue configuration commands to devices falls under automation or configuration management, but I'll mention it briefly here since NCM includes this capability. I believe I've said in a previous Geek Speak post that it's abstractions that are most useful to most of us. I don't want to write the code to log into a device and deal with all the different prompts and error conditions. Instead, I'd much rather hand off to a tool that somebody else wrote and say, "Send this. Lemme know how it goes." If you have the ability to do that and you aren't the one who has to support it, take that as a win. And while you're enjoying the golden trophy, give some consideration to my next point.

Where is your one true configuration source?

Why do we fall into the trap of using hardware devices as the definitive source of each configuration? Bearing in mind that most of us claim that we're working toward building a software-defined network of some sort, it does seem odd that the configuration sits on the device. Why does it not sit in a database or other managed repository, with the device programmed from the latest approved configuration in that repo?

Picture this for example:

  • Configurations are stored in a git repo
  • Network engineers fork the repo so they have a local copy
  • When a change is required, the engineer makes the necessary changes to their fork, then issues a pull request back to the main repo.
  • Pull requests can be reviewed as part of the Change Control process, and if approved, the pull request is accepted and merged into the configuration.
  • The repo update triggers the changes to be propagated to the end device

Such a process would give us a configuration archive with a complete (and commented) audit trail for each change made. Additionally, if the device fails, the latest configuration is in the git repo, not on the device, so by definition, it's available for use when setting up the replacement device. If you're really on the ball, it may be possible to do some form of integration testing/syntax validation of the change prior to accepting the pull request.
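The final step of that workflow, propagating a merged change, first needs to know which device configurations actually changed. The sketch below shows one way to work that out with "git diff --name-only". The repository layout (configs/<device>.cfg) and the function names are assumptions for illustration, and the actual push to the device is intentionally left out.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(repo_dir: str, old_rev: str, new_rev: str) -> list[str]:
    """List paths that differ between two commits, using `git diff --name-only`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", old_rev, new_rev],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def devices_to_update(paths: list[str], config_dir: str = "configs") -> list[str]:
    """Map changed paths to device names, assuming configs/<device>.cfg."""
    devices = []
    for raw in paths:
        path = PurePosixPath(raw)
        # Only files directly inside the config directory count as devices.
        if path.parent == PurePosixPath(config_dir) and path.suffix == ".cfg":
            devices.append(path.stem)
    return sorted(devices)
```

Note that deleted files also appear in the diff output, so a real pipeline would want to distinguish a decommissioned device from a changed one before pushing anything.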

There are some gotchas with this, not the least of which is that going from a configuration diff to something you can safely deploy on a device may not be as straightforward as it first appears. That said, thanks to commands like Junos' load replace and load override and IOS XR's commit replace, such things are made a little easier.

The point of this is not really to get into the implementation details, but more to raise the question of how we think about network device configurations in particular. Compute teams get it; using tools like Puppet and Chef to build and maintain the state of a server OS, it's possible to rebuild an identical server. The same applies to building images in Docker. The configuration should not be within the image because it's housed in the Dockerfile. So why not network devices, too? I'm sure you'll tell me, and I welcome it.

Get. Configuration. Management. Don't risk being the person everybody feels pity for after their hard drive crashes.
