To paraphrase a lyric from Hamilton,
Deciding to code is easy; choosing a language is harder. There are many programming languages that are good candidates for any would-be programmer, but selecting the one that will be most beneficial to each individual need is a very challenging decision. In this post, I will attempt to give some background on programming languages in general, as well as examine a few of the most popular options and attempt to identify where each one might be the most appropriate choice.
Programming Types and Terminology
Before digging into any specific languages, I'm going to explain some of the properties of programming languages in general, because these will contribute to your decision as well.
Interpreted vs Compiled
An interpreted language is one where the
language reads the script and generates machine-level instructions on the fly. When an interpreted program is
run, it's actually the language interpreter that is running with the script as an input. Its output is the hardware-specific
bytecode (i.e. machine code). The advantages of interpreted languages are that they are typically quick to edit and debug, but they are also slower to run because the conversion to bytecode has to happen in real-time. Distributing a program written in an interpreted language effectively means distributing the source code.
A compiled language is one where the script is processed by the language compiler and turned into an executable file containing the machine-specific bytecode. It is this output file that is run when the script is executed. It isn't necessary to have the language installed on the target machine to execute bytecode, so this is the way most commercial software is created and distributed. Compiled code runs quickly because the hard work of determining the bytecode has already been done, and all the target machine needs to do is execute it.
Strongly Typed vs Weakly Typed
What is Type?
In programming languages, type is the concept that each piece of data is of a particular kind. For example, 17 is an integer.
John is a string.
2017-05-07 10:11:17.112 UTC is a time. The reason languages like to keep track of
type is to determine how to react when operations are performed on them.
As an example, I have created a simple program where I assign a value of some sort to a variable (a place to store a value), imaginatively called x. My program looks something like this:
x = 6
print x + x
I tested my script and changed the value of x to see how each of five languages would process the answer. It should be noted that putting a value in quotes ("
) implies that the value is a string, i.e. a sequence of characters.John
is a string, but there's no reason678" can't be a string too. The values of x are listed at the top, and the table shows the result of adding x to x:
Weakly Typed Languages
Why does this happen? Perl and Bash are weakly (or loosely) typed; that is, while they understand what a string is and what an integer is, they're pretty flexible about how those are used. In this case, Perl and bash made a best effort guess at whether to treat the strings as numbers or strings; although the value
6 was defined in quotes (and quotes mean a string), the determination was that in the context of a plus sign, the program must be trying to add numbers together. Python and Ruby, on the other hand, respected
6 as a string and decided that the intent was to concatenate the strings, hence the answer of
The flexibility of the weak typing offered by a language like Perl is both a blessing and a curse. It's great because the programmer doesn't have to think about what data type each variable represents, and can use them anywhere and let the language determine the right type to use based on context. It's awful because the programmer doesn't have to think about what data type each variable represents, and can use them anywhere. I speak from bitter experience when I say that the ability to (mis)use variables in this way will, eventually, lead to the collapse of civilization. Or worse, unexpected and hard-to-track-down behavior in the code.
That Bash error? Bash for a moment pretends to have strong typing and dislikes being asked to
add variables whose value begins with a number but is not a proper integer. It's too little, too late if you ask me.
Strongly Typed Languages
In contrast, Python and Ruby are strongly-typed languages (as are C and Go). In these languages to add two numbers means adding two integers (or floating point numbers, aka floats). Concatenating strings requires two or more strings. Any attempt to mix and match the types will generate an error. For example in Python:
>>> a = 6
>>> b = "6"
>>> print a + b Traceback (most recent call last): File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Strongly typed languages have the advantage that accidentally adding the wrong variable to an equation, for example, will not be permitted if the type is incorrect. In theory, it reduces errors and encourages a more explicit programming style. It also ensures that the programmer is clear that the value of an int(eger) will never have decimal places. On the other hand, sometimes it's a real pain to have to convert variables from one format to another to use its value in a different context.
PowerShell appears to want to pretend to be strongly typed, but a short test reveals some scary behavior. I've included a brief demonstration at the end in the section titled
Addendum: PowerShell Bonus Content.
Dynamic / Statically Typed
There's one more twist to the above definitions. While functionally the language may be strongly typed, for example, it's possible to allow a variable to change its type at any time. For example, it is just fine in Perl to initialize a variable with an integer, then give it a new value which is a string:
$a = 1;
$a = "hello";
Dynamic typing is typically a property of interpreted languages, presumably because they have more flexibility to change memory allocations at runtime. Compiled languages, on the other hand, tend to be statically typed; if a variable is defined as a string, it cannot change later on.
Modules / Includes / Packages / Imports / Libraries
Almost every language has some system whereby the functionality can be expanded by installing and referencing some code written by somebody else. For example, Perl does not have SSH support built in, but there is a Net::SSH module which can be installed and used. Modules are the easiest way to avoid reinventing the wheel and allow us to ride the back of somebody else's hard work. Python has packages, Ruby has modules which are commonly distributed in a format called a "gem," and Go has packages. These expansion systems are critical to writing good code; it's not a failure to use them, it's common sense.
Choosing a Language
With some understanding of type, modules and interpreted/compiled languages, now it's time to figure out how to choose the best language. First, here's a quick summary of the most common scripting languages:
|C / I||Type||S / D||Expansion|
I've chosen not to include Bash mainly because I consider it to be more of a wrapper than a fully fledged scripting language suitable for infrastructure tasks. Okay, okay. Put your sandals down. I know how amazing Bash is. You do, too. will
Ten years ago I would have said that Perl (version 5.x, definitely not v6) was the obvious option. Perl is flexible, powerful, has roughly eleventy-billion modules written for it, and there are many training guides available. Perl's regular expression handling is exemplary and it's amazingly simple and fast to use. Perl has been my go-to language since I first started using it around twenty-five years ago, and when I need to code in a hurry, it's the language I use because I'm so familiar with it. With that said, for scripting involving IP communications, I find that Perl can be finicky, inconsistent and slow. Additionally, vendor support for Perl (e.g. providing a module for interfacing with their equipment) has declined significantly in the last 5-10 years, which also makes Perl less desirable. Don't get me wrong; I doubt I will stop writing Perl scripts in the foreseeable future, but I'm not sure that I could, in all honesty, recommend it for somebody looking to control their infrastructure with code.
It probably won't be a surprise to learn that for network automation, Python is probably the best choice of language. I'm not entirely clear why people love Python so much, and why even the people who love Python seem stuck on v2.7 and are avoiding the move to v3.0. Still, Python has established itself as the de facto standard for networking automation. Many vendors provide Python packages, and there is a strong and active community developing and enhancing packages. Personally, I have had problems adjusting to the use of whitespace (indent) to indicate code block hierarchy, and it makes my eyes twitch that a block of code doesn't end with a closing brace of some kind, but I know I'm in the minority here. Python has a rich library of packages to choose from, but just like Perl, it's important to choose carefully and find a modern, actively supported package. If you think that semicolons at the end of lines and braces surrounding code make things look horribly complicated, then you will love Python. A new Python user really should learn version 3, but note that v3 code is not backward compatible with v2.x, and it may be important to check the availability of relevant vendor packages in a Python3-compatible form.
Oh Ruby, how beautiful you are. I look at Ruby as being like Python, but cleaner. Ruby is three or four years younger than Python, and borrows parts of its syntax from languages like Perl, C, Java, Python, and Smalltalk. At first, I think Ruby can seem a little confusing compared to Python, but there's no question that it's a terrifically powerful language. Coupled with Rails (
Ruby on Rails) on a web server, Ruby can be used to quickly create database-driven web applications, for example. I think there's almost a kind of snobbery surrounding Ruby, where those who prefer Ruby look down on Python almost like it's something used by amateurs, whereas Ruby is for professionals. I suspect there are many who would disagree with that, but that's the perception I've detected. However, for network automation, Ruby has not got the same momentum as Python and is less well supported by vendors. Consequently, while I think Ruby is a great language, I would not recommend it at the moment as a network automation tool. For a wide range of other purposes though, Ruby would be a good language to learn.
PowerShell – that Microsoft thing – used to be just for Windows, but now it has been ported to Linux and MacOS as well. PowerShell has garnered strong support from many Windows system administrators since its release in 2009 because of the ease with which it can interact with Windows systems. PowerShell excels at automation and configuration management of Windows installations. As a Mac user, my exposure to PowerShell has been limited, and I have not heard about it being much use for network automation purposes. However, if compute is your thing, PowerShell might just be the perfect language to learn, not least because it's native in Windows Server 2008 onwards. Interestingly, Microsoft is trying to offer network switch configuration within PowerShell, and released its Open Management Infrastructure (OMI) specification in 2012, encouraging vendors to use this standard interface to which PowerShell could then interface. As a Windows administrator, I think PowerShell would be an obvious choice.
Go is definitely the baby of the group here, and with its first release in 2012, the only one of the languages here created in this decade! Go is an open source language developed by Google, and is still mutating fairly quickly with each release, as new functionality is being added. This is a good because things that are perceived as
missing are frequently added in the next release. It's bad because not all code will be forward compatible (i.e. will run in the next version). As Go is so new, the number of packages available for use is much more limited than for Ruby, Perl, or Python. This is obviously a potential downside because it may mean doing more work for one's self.
Where Go wins, for me, is on speed and portability. Because Go is a compiled language, the machine running the program doesn't need to have Go installed; it just needs the compiled binary. This makes distributing software incredibly simple, and also makes Go pretty much immune to anything else the user might do on their platform with their interpreter (e.g. upgrade modules, upgrade the language version, etc). More to the point, it's trivial to get Go to cross-compile for other platforms; I happen to write my code on a Mac, but I can (and do) compile tools into binaries for Mac, Linux, and Windows and share them with my colleagues. For speed, a compiled language should always beat an interpreted language, and Go delivers that in spades. In particular, I have found that Go's HTTP(S) library is incredibly fast. I've written tools relying on REST API transactions in both Go and Perl, and Go versions blow Perl out of the water. If you can handle a strongly, statically typed language (it means some extra work at times) and need to distribute code, I would strongly recommend Go. The vendor support is almost non-existent, however, so be prepared to do some work on your own.
There is a lot to consider when choosing a language to learn, and I feel that this post may only scrape the surface of all the potential issues to take into account. Unfortunately, sometimes the issues may not be obvious until a program is mostly completed. Nonetheless, my personal recommendations can be summarized thus:
- For Windows automation: PowerShell
- For general automation: Python (easier), or Go (harder, but fast!)
If you're a coder/scripter, what would you recommend to others based on your own experience? In the next post in this series, I'll look at ways to learn a language, both in terms of approach and some specific resources.
Addendum: Powershell Bonus Content
In the earlier table where I showed the results from adding
x + x, PowerShell behaves perfectly. However, when I started to add int and string variable types, it was not so good:
PS /> $a = 6
PS /> $b = "6"
PS /> $y = $a + $b
PS /> $y 12
In this example, PowerShell just interpreted the string
6 as an integer and added it to 6. What if I do it the other way around and try adding an integer to a string?
PS /> $a = "6"
PS /> $b = 6
PS /> $y = $a + $b
PS /> $y 66
This time, PowerShell treated both variables as strings; whatever type the first variable is, that's what gets applied to the other. In my opinion that is a disaster waiting to happen. I am inexperienced with PowerShell, so perhaps somebody here can explain to me why this might be desirable behavior because I'm just not getting it.