Network Automation: the Good, the Bad, and the Ugly

Infrastructure automation is nothing new. We’ve been automating our server environments for years, for example. Automating network devices isn’t brand new either, but it has never been as popular as it is today.

Part of the reason network engineers are embracing this new paradigm is the time that scripting common tasks can save. For example, I recently worked with someone to figure out how to script a new AAA configuration on hundreds of access switches in order to centralize authentication. Imagine having to add those few lines of configuration one switch at a time, especially in a network with several different platforms and several different local usernames and passwords. Now imagine how much time can be saved, and how many typos avoided, by automating the process instead.

That’s the good.

However, planning the pseudocode alone became a rabbit hole in which we chased modules on GitHub, snippets from previous scripts, and random links in Google, trying to figure out the best way to accommodate all the funny nuances of this customer’s network. If this had been a very common task, we would have benefited greatly in the long run from putting in the time and effort to nail our script down, simply because it would then have been reusable and shareable with the rest of the community. By the time I checked in again with some more ideas, though, my friend was already well underway configuring the switches manually, because it was billable time and he needed to get the job done right away. When you write code for a given task, there’s a balance to strike between the up-front effort, which can quickly hit diminishing returns, and the long-term benefit.

That’s the bad.

We had some semblance of a script going, however, and after some quick peer review we wanted to use it on the remaining switches. Rather than modify the code to exclude the switches my friend had already configured, we left it alone because we assumed it wouldn’t hurt to run the script against everything.

So we ran the script, and several hundred switches became unreachable on the management network. Nothing went hard down, mind you, but we were locked out of almost the entire access layer. Thankfully this was a single campus of several buildings using a lot of switch stacks, so with the help of the local IT staff, the management access configuration on all the switches was rolled back the hard way in one afternoon. All of this was the result of a couple of guys with a bad script. We still don’t really know what happened, but we know that it was a human error, not a device issue.

That’s the ugly.

Network automation seeks to decrease human error, but the process requires skill, careful peer review, and maybe even a small test pool. Otherwise, the blast radius of a failure can be very large, as several hundred unreachable switches showed us. There is also great automation software out there with easy-to-use interfaces that lets you save time without struggling to learn a new programming language.
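To make the test pool idea concrete, here is a minimal Python sketch of a staged rollout. It is an illustration, not the script we ran: `staged_rollout`, `RolloutResult`, `apply_change`, and `verify` are hypothetical names, and the two callables stand in for whatever your own tooling uses to push configuration and to confirm that a switch is still reachable on the management network.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class RolloutResult:
    changed: List[str] = field(default_factory=list)    # devices the change was pushed to
    failed: Optional[str] = None                        # first device that failed verification
    untouched: List[str] = field(default_factory=list)  # devices never touched


def staged_rollout(
    devices: Sequence[str],
    apply_change: Callable[[str], None],  # hypothetical: pushes the config to one device
    verify: Callable[[str], bool],        # hypothetical: True if the device is still healthy
    canary_size: int = 3,
    batch_size: int = 20,
) -> RolloutResult:
    """Change a small test pool first, then fixed-size batches, halting on the first failure."""
    devices = list(devices)
    result = RolloutResult()

    # The first batch is the test pool; the rest is split into fixed-size batches.
    rest = devices[canary_size:]
    batches = [devices[:canary_size]]
    batches += [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]

    for batch in batches:
        for host in batch:
            apply_change(host)
            result.changed.append(host)
        # Verify the whole batch before moving on, so a bad change
        # can never reach more than one batch of devices.
        for host in batch:
            if not verify(host):
                result.failed = host
                # Batches are processed in order, so everything after the
                # changed devices has not been touched.
                result.untouched = devices[len(result.changed):]
                return result
    return result
```

With this shape, the worst case is one batch of misconfigured switches rather than the whole access layer, and the sizes are knobs you can tune to how much risk you are willing to take per step.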

But don’t let that dissuade you from jumping with both feet into learning Python and experimenting with scripting common tasks. There are also methods for guarding against scripted misconfigurations. Just remember that along with the good, there can be some bad, and if ignored, that bad could get ugly.
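As one example of such a guard, here is a minimal sketch of a dry-run planner that compares each switch’s running configuration with the lines you intend to add and skips any switch that already has them. The function names and the input format (a dictionary of hostname to running-config text, collected however you like) are assumptions made for illustration. I can’t say whether a check like this would have saved us, since we never pinned down what went wrong, but it does address our assumption that re-running a script against already-configured switches is harmless.

```python
from typing import Dict, List


def missing_lines(running_config: str, desired_lines: List[str]) -> List[str]:
    """Return the desired lines that do not yet appear in the running config."""
    present = {line.strip() for line in running_config.splitlines()}
    return [line for line in desired_lines if line.strip() not in present]


def plan_changes(
    running_configs: Dict[str, str],  # assumed input: hostname -> running-config text
    desired_lines: List[str],
) -> Dict[str, List[str]]:
    """Build a per-device plan; devices already in the desired state are left out."""
    plan: Dict[str, List[str]] = {}
    for host, config in running_configs.items():
        todo = missing_lines(config, desired_lines)
        if todo:
            plan[host] = todo
    return plan
```

Reviewing the resulting plan before pushing anything works as a dry run: it shows exactly which switches would be touched and with which lines. Note that this simple line-by-line comparison ignores configuration hierarchy, such as lines nested under an interface, so treat it as a starting point rather than a complete diff.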