Infrastructure as Code

Infrastructure as Code is about the programmatic management of IT infrastructure, be it physical or virtual servers, network devices, or the variety of supporting appliances racked and stacked in our network closets and data centers.

Historically, the problem has been managing an ever-increasing number of servers after the advent of virtualization. Spinning up a new virtual server became a matter of minutes, which created issues like server sprawl and configuration drift. Today, the same problem is being recognized in networking as well.

Languages such as Python, Ruby, and Go, along with configuration management platforms such as Puppet and Chef, are used to treat servers as pools of resources called on in configuration files. In networking, engineers are taking advantage of traditional programming languages, but Salt and Ansible have emerged as dominant frameworks for managing large numbers of network devices.

What problems does treating infrastructure as code solve?

First, provided there’s enough storage capacity, we’re able to spin up new servers so quickly and easily that many organizations struggle with server sprawl. In other words, whenever there’s a new application to deploy, we spin up a new virtual machine. Soon, server admins, even in medium-sized organizations, find themselves dealing with an unmanageably large number of VMs.

Second, though using proprietary or open tools to automate server deployment and management is great for provisioning (configuration management), it doesn’t provide a clear method for continuous delivery including validation, automated testing, or version control. Configurations in the environment can drift from the standard with little ability to track, repair, or roll back.
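To make configuration drift concrete, here is a minimal Python sketch that compares a server's actual settings against an agreed baseline. The setting names, values, and the find_drift helper are all hypothetical and chosen only for illustration; a real environment would pull this data from the devices or from a configuration management platform.

```python
def find_drift(baseline, actual):
    """Return {setting: (expected, found)} for every setting that differs.

    A setting missing on one side shows up with None on that side.
    """
    drift = {}
    for key in sorted(set(baseline) | set(actual)):
        expected = baseline.get(key)
        found = actual.get(key)
        if expected != found:
            drift[key] = (expected, found)
    return drift


# Hypothetical baseline and a hypothetical server that has drifted from it.
baseline = {
    "ntp_server": "10.0.0.1",
    "ssh_root_login": "no",
    "syslog_host": "10.0.0.5",
}
web01 = {
    "ntp_server": "10.0.0.1",
    "ssh_root_login": "yes",
    "snmp_community": "public",
}

for setting, (expected, found) in find_drift(baseline, web01).items():
    print(f"{setting}: expected {expected!r}, found {found!r}")
```

Even a simple report like this gives us something to track and repair, which is exactly what is missing when changes are made by hand.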

Treating the infrastructure as code means adopting the same software development tools and practices that developers use to make provisioning infrastructure services more efficient, to manage large numbers of devices in pools of resources, and to provide a methodology to test and validate configuration.

What tools can we use to manage infrastructure the same way developers manage code?

The answer to this question is, well, it depends. The tools we can use depend on what sort of devices we have in our infrastructure and on the knowledge of our SysAdmins, network admins, and application developers. The good news is that Chef, Puppet, Salt, and Ansible are all easy enough to learn that infrastructure engineers of all stripes can quickly cross the divide into the configuration management part of DevOps.

Having a working knowledge of Python or Ruby wouldn’t hurt, either.

Configuration Management

Chef and Puppet are open source configuration management platforms designed to make managing large numbers of servers very easy. They both enable a SysAdmin to pull information and push configuration files to a variety of platforms, making both Chef and Puppet very popular.

Since both are open source, an infrastructure engineer just starting out might find the maze of plugins, and the debate within the community about which is better, a little bit confusing. But the reality is that it’s not too difficult to go from the ground floor to managing an infrastructure.

Both Puppet and Chef are agent-based, using a master server with agents installed directly on nodes. Both are used very effectively to manage a large variety of platforms, including Windows and VMware. Both are written in Ruby, and both scale very well.

The differences between the two are generally based on elements that I don’t feel are that relevant. Some believe that Chef lends itself more to developers, whereas Puppet lends itself more to SysAdmins. The thing is, though, both have a very large customer base and effectively solve the same problems. In a typical organization of reasonable size, you’ll likely have a mix of platforms to manage, including VMware, Windows, and Linux. Both Chef and Puppet are excellent components of a DevOps culture that's managing these platforms.

Ansible and Salt are both built on Python. Ansible is agentless, and Salt, though it normally uses an agent (the minion), can also run agentless over SSH. And though they offer some support for Windows, Ansible and Salt are more geared toward managing Linux and Unix-based systems, including network devices.

Continuous Delivery

Continuous Delivery is about keeping configurations consistent across the entire IT environment, reducing the potential “blast radius” with appropriate testing, and reducing time to deploy new configurations. This is very important because using Chef or Puppet alone stops at automation and doesn’t apply the DevOps practices that provide all the benefits of Infrastructure as Code.
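As a rough illustration of the kind of automated check a CI job could run before a configuration is deployed, here is a minimal Python sketch. The required settings, the sample device, and the validate function are assumptions made up for this example; they are not part of Chef, Puppet, Travis CI, or Jenkins.

```python
import ipaddress

# Hypothetical policy: every device configuration must define these settings.
REQUIRED_KEYS = ("hostname", "mgmt_ip", "ntp_server")
# Hypothetical list of settings that must hold a valid IP address.
IP_KEYS = ("mgmt_ip", "ntp_server")


def validate(config):
    """Return a list of human-readable problems; an empty list means it passed."""
    errors = []
    for key in REQUIRED_KEYS:
        if key not in config:
            errors.append(f"missing required setting: {key}")
    for key in IP_KEYS:
        if key in config:
            try:
                ipaddress.ip_address(config[key])
            except ValueError:
                errors.append(f"{key} is not a valid IP address: {config[key]!r}")
    return errors


# A candidate change with two mistakes: no NTP server and an impossible address.
candidate = {"hostname": "sw-access-01", "mgmt_ip": "10.0.0.300"}

problems = validate(candidate)
if problems:
    for problem in problems:
        print("FAIL:", problem)
else:
    print("OK: configuration passed validation")
```

A CI server would run a check like this on every proposed change and refuse to promote the configuration when the list of problems is not empty, which is one way to shrink the blast radius before anything touches production.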

Remember that Infrastructure as Code is as much a cultural shift as it is the adoption of technology.

The most common tools are Travis CI and Jenkins. Travis CI is a hosted solution that runs directly off GitHub, while Jenkins runs off a local server. Both have GitHub repositories. Some people like having total control using a local Jenkins server, while others prefer the ease of use of a hosted solution like Travis.

To me, it’s not all that important which one a team uses. A SysAdmin adopting an Infrastructure as Code model will benefit either way. Integrating one or the other into a team’s workflow will provide tremendously better continuous delivery than simple peer review and ad hoc lab environments.

Version Control

Version control is one component that, in my experience, infrastructure engineers instantly see the value in. In fact, in the IT departments in which I’ve worked, everyone seems to have some sort of cobbled together version control system (VCS) to keep track of changes to the many configuration files we have.

Infrastructure as Code formalizes this with both software and consistent practice. Rather than storing configurations in a dozen locations, likely each team member’s local computer, a version control system centralizes everything.

That’s the easy part, though. We can do that with a department share. What a good version control system does, however, is provide a constant flow of changes back to the source code, revision history, branch management, and even change traceability.
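To show what revision history and change traceability buy us, here is a small Python sketch that uses the standard library's difflib module to compare two revisions of a configuration file. The device name and configuration lines are invented for illustration. A VCS such as Git produces this kind of output for us and also records who made the change and when.

```python
import difflib

# Two hypothetical revisions of the same switch configuration.
rev_1 = """hostname core-sw-01
ntp server 10.0.0.1
snmp-server community public RO
""".splitlines(keepends=True)

rev_2 = """hostname core-sw-01
ntp server 10.0.0.1
snmp-server community netops RO
""".splitlines(keepends=True)

# unified_diff yields the same style of output that "git diff" shows.
diff_text = "".join(
    difflib.unified_diff(
        rev_1,
        rev_2,
        fromfile="core-sw-01.cfg (rev 1)",
        tofile="core-sw-01.cfg (rev 2)",
    )
)
print(diff_text)
```

Lines prefixed with a minus sign were removed and lines prefixed with a plus sign were added, so a reviewer can see at a glance that only the SNMP community string changed between the two revisions.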

Git is probably the most common VCS, typically hosted on a service such as GitHub or Bitbucket, but just like the continuous delivery solutions I mentioned, it’s more about just doing something. Using any of these even minimally is light years ahead of a network share and file names with “FINAL” or “CURRENT” at the end.

Culture

When it comes down to it, though, Infrastructure as Code is just as much about culture as it is about technology. It’s a paradigm shift – a shift in practice and methodology – in addition to the adoption of programming languages, tools, and management platforms.

An IT department will see absolutely zero benefit from standing up a Jenkins server if it isn’t integrated into the workflow and actually used. Getting buy-in from the team is extremely important. This is no easy task, because now you’re dealing with people instead of bits and bytes, and people are much more complex than even our most sophisticated systems.

One way to get there is to start with version control alone, using Git on GitHub or some other VCS. Since this is completely non-intrusive and has zero impact on production, buy-in from the team and from leadership is much easier.

Another idea I’ve seen work in practice is to start with only one part of the infrastructure. This could mean starting with all the Linux machines or all the Windows machines and managing them with Chef. In this way, a SysAdmin can manage a group of like machines and see tangible benefits of Infrastructure as Code very quickly without having to get buy-in across all teams.

As the benefits become more apparent, either to colleagues working in the trenches or leadership looking for some proof of concept, the culture can be changed from the bottom up or top down.

Making something old into something new

Remember that Infrastructure as Code is a cultural shift as much as it is an adoption of technology. Developers have been using these same tools and practices for years to develop, maintain, and deploy code, so this is a time-tested paradigm that SysAdmins and network admins can adopt very quickly.

In time, Infrastructure as Code can help us become proactive managers of our infrastructure, making it work for us rather than making us work for the infrastructure.

  • I agree that this is like most other things: the challenge is not the technology, it's the culture change and getting buy-in.  So often these types of things are rolled out as somebody's pet or silo project without consideration for the larger picture.  Make sure to work with your entire operations team when doing something like this, as their input and understanding of how things work is critical to success.

  • Nice way of looking at things. It seems that everyone talks about "change control" but it's difficult to get everyone on the same page. The idea of presenting it as you would code is helpful. It lets people think of server/network/hardware/etc. changes as having revisions and makes it easier to track them over time. So often when change control is implemented there isn't a good "trail."

  • I'm for anything that will better ensure standardization--where it's appropriate. 

    Although this article's topic covers one approach, SysAdmins "should have" been on top of server sprawl and configuration drift in the VM world from the first day a VM product was tested or put on line.  Even as they needed to be on top of this in the physical world, the same proactive practice should have transferred over in to the VM world.

    What went wrong?

  • I love all the vCloud orchestration stuff from VMware.  Now with NSX as well it's taking automation and security to a whole new level with the entire suite rolled out.
