You may be wondering why, after creating four blog posts encouraging non-coders to give it a shot, select a language and break down a problem into manageable pieces, I would now say to stop. The answer is simple, really: not everything is worth automating (unless, perhaps, you are operating at a similar scale to somebody Amazon).
The 80-20 Rule
Here's my guideline: figure out what tasks take up the majority (i.e. 80%) of your time in a given time period (in a typical week perhaps). Those are the tasks where making the time investment to develop an automated solution is most likely to see a payback. The other 20% are usually much worse candidates for automation where the cost of automating it likely outweighs the time savings.
As a side note, the tasks that take up the time may not necessarily be related to a specific work request type. For example, I may spend 40% of my week processing firewall requests, and another 20% processing routing requests, and another 20% troubleshooting connectivity issues. In all of these activities, I spend time identifying what device, firewall zone, or VRF various IP addresses are in, so that I can write the correct firewall rule, or add routing in the right places, or track next-hops in a traceroute where DNS is missing. In this case, I would gain the most immediate benefits if I could automate IP address research.
I don't want to be misunderstood; there is value in creating process and automation around how a firewall request comes into the queue, for example, but the value overall is lower than for a tool that can tell me lots of information about an IP address.
That Seems Obvious
You'd think that it was intuitive that we would do the right thing, but sometimes things don't go according to plan:
Once you write a helpful tool or an automation, somebody will come back and say,
Ah, what if I need to know X information too? I need that once a month when I do the Y report. As a helpful person, it's tempting to immediately try and adapt the code to cover every conceivable corner case and usage example, but having been down that path, I counsel against doing so. It typically makes the code unmanageably complex due to all the conditions being evaluated and worse, it goes firmly against the 80-20 rule above.
Feeping Creatures is a Spoonerism referring to
Creeping Features, i.e. an always expanded feature list for a product.
A Desire to Automate Everything
There's a great story in What Do You Care What Other People Think (Richard Feynman) that talks about Mr. Frankel, who had developed a system using a suite of IBM machines to run the calculations for the atomic bomb that was being developed at Los Alamos.
"Well, Mr. Frankel, who started this program, began to suffer from the computer disease that anybody who works with computers now knows about. [...] Frankel wasn't paying any attention; he wasn't supervising anybody. [...] (H)e was sitting in a room figuring out how to make one tabulator automatically print arctangent X, and then it would start and it would print columns and then bitsi, bitsi, bitsi, and calculate the arc-tangent automatically by integrating as it went along and make a whole table in one operation.
Absolutely useless. We had tables of arc-tangents. But if you've ever worked with computers, you understand the disease -- the delight in being able to see how much you can do. But he got the disease for the first time, the poor fellow who invented the thing."
It's exciting to automate things or to take a task that previously took minutes, and turn it into a task that takes seconds. It's amazing to watch the 80% shrink down and down and see productivity go up. It's addictive. And so, inevitably, once one task is automated, we begin looking for the next task we can feel good about, or we start thinking of ways we could make what we already did even better. Sometimes the coder is the source of creeping features.
It's very easy to lose touch with the larger picture and stay focused on tasks that will generate measurable gains. I've fallen foul of this myself in the past, and have been delighted, for example, with a script I spent four days writing, which pulled apart log entries from a firewall and ran all kinds of analyses on it, allowing you to slice the data any which way and generate statistics. Truly amazing! The problem is, I didn't have a use for most of the stats I was able to produce, and actually I could have fairly easily worked out the most useful ones in Excel in about 30 minutes. I got caught up in being able to do something, rather than actually needing to do it.
Solve A Real Problem
Despite my cautions above, I maintain that the best way to learn to code is to find a real problem that you want to solve and try to write code to do it. Okay, there are some cautions to add here, not the least of which is to run tests and confirm the output. More than once, I've written code that seemed great when I ran it on a couple of lines of test data, but then when I ran it on thousands of lines of actual data, I discovered oddities in the input data, or in the loop that processes all the data reusing variables carelessly or similar. Just like I tell my kids with their math homework, sanity check the output. If a script claims that a 10Gbps link was running at 30Gbps, maybe there's a problem with how that figure is being calculated.
Don't Be Afraid to Start Small
Hello World! script may feel like one of the most pointless activities you may ever undertake, but for a total beginner, it means something was achieved and, if nothing else, you learned how to output text to the screen. The phrase, "Don't try to boil the ocean," speaks to this concept quite nicely, too.
If your ultimate aim is to automate production device configurations or orchestrate various APIs to dance to your will, that's great, but don't start off by testing your scripts in production. Use device VMs where possible to develop interactions with different pieces of software. I also recommend starting by working with
read commands before jumping right in to the potentially destructive stuff. After all, after writing a change to a device, it's important to know how to verify that the change was successful. Developing those skills first will prove useful later on.
Learn how to test for, detect, and safely handle errors that arise along the way, particularly the responses from the devices you are trying to control. Sanitize your inputs! If your script expects an IPv4 address as an input, validate that what you were given is actually a valid IPv4 address. Add your own business rules to that validation if required (e.g. a script might only work with 10.x.x.x addresses, and all other IPs require human input). The phrase
Garbage in, garbage out, is all too true when humans provide the garbage.
Scale Out Carefully
To paraphrase a common saying, automation allows you to make mistakes on hundreds of devices much faster that you could possibly do it by hand. Start small with a proof of concept, and demonstrate that the code is solid. Once there's confidence that the code is reliable, it's more likely to be accepted for use on a wider scale. That leads neatly into the last point:
Good Luck Convincing People
It seems to me that everybody loves scripting and automation right up to the point where it needs to be allowed to run autonomously. Think of it like the Google autonomous car: for sure, the engineering team was pretty confident that the code was fairly solid, but they wouldn't let that car out on the highway without human supervision. And so it is with automation; when the results of some kind of process automation can be reviewed by a human before deployment, that appears to be an acceptable risk from a management team's perspective. Now suggest that the human intervention is no longer required, and that the software can be trusted, and see what response you get.
A coder I respect quite a bit used to talk about
blast radius, or what's the impact of a change beyond the box on which the change is taking place? Or what's the potential impact of this change as a whole? We do this all the time when evaluating change risk categories (is it low, medium, or high?) by considering what happens if a change goes wrong. Scripts are no different. A change that adds an SNMP string to every device in the network, for example, is probably fairly harmless. A change that creates a new SSH access-list, on the other hand, could end up locking everybody out of every device if it is implemented incorrectly. What impact would that have on device management and operations?
I really recommend giving programming a shot. It isn't necessary to be a hotshot coder to have success (trust me, I am not a hotshot coder), but having an understanding of coding will, I believe, will positively impact other areas of your work. Sometimes a programming mindset can reveal ways to approach problems that didn't show themselves before. And while you're learning to code, if you don't already know how to work in a UNIX (Linux, BSD, MacOS, etc.) shell, that would be a great stretch goal to add to your list!
I hope that this mini-series of posts has been useful. If you do decide to start coding, I would love to hear back from you on how you got on, what challenges you faced and, ultimately, if you were able to code something (no matter how small) that helped you with your job!