With the popularity of Agile methodologies and the ubiquity of people claiming they were embracing NetOps/DevOps, I could swear we were supposed to have adopted a new silo-busting software-defined paradigm shift which would deliver the critical foundational framework we needed to become the company of tomorrow, today.
Brace For Cynicism!
I recently discussed Automation Paralysis and the difficulties of climbing the Cliffs of Despair from small, uni-functional automation tools at the bottom (the Trough of Small Successes) to larger, integrated tools at the top (the Plateau of Near-Completion). The way I see it, in most cases, the cliffs are being climbed individually by each of the infrastructure sub-specialties (network, compute, storage, and security), and even though each group goes through similar pains, there's no shared experience here. Each group works through its own problems on their own, in their own way, using their own tools.
For the sake of argument, let's assume that all infrastructure teams have successfully scaled the Cliffs of Despair with only a few casualties along the way, and are making their individual base camps on the Plateau of Near Completion. What happens now? What does the company actually have? What the company has is four complex automation products which are most likely totally incompatible with one another.
Introducing the Silo Family!
If I may, I'd like to introduce to you all to the Silo Family. While they may not show up in genetic test results, I'll wager we all have relatives in these groups:
NetBeard is proud of having automated a large chunk of what many said could not be automated: the network. Given the right inputs, NetBeard's tools can save many hours by pushing configs out to devices in record time. NetBeard's team was the first one in the world to ever have to solve these problems, and it was made all the more difficult by the fact that there were no existing tools available that could do what was needed.
Looking more confident than the rest of the cohort, Compute Monkey can't understand what the fuss is all about. Compute Monkey's servers have been Puppeted and Cheffed and Whatever-Elsed for years, and it's a public secret that deploying compute now requires little more than a couple of mouse clicks.
StoreBot is pleasant enough, but while everybody can hear the noises coming out of StoreBot's mouth, few have the ability to interpret what it all means. If you've ever heard the teacher talking in the Peanuts TV cartoon series it's a bit like that:
Whaa waawaa scuzzy whaaaa LUN wawabyte whaaaaw.
Nobody knows anything about Security Fox. Security Fox likes to keep things secret.
The problem is, each group works in a silo. They don't collaborate with automation, they don't them believe that the other groups would really understand what they do (come on, admit it), and they keep their competitive edge to themselves. I don't believe that any of the groups really means to be insular, but, well, each team has knowledge, and to work together on automation would mean having to share knowledge and be patient while the other groups try to understand what, how, and why the group operates the way it does. And once somebody else understands that role, why should they be the ones to automate it? Isn't that automating another group out of a job? Ultimately, I am cynical about the chances of success based on most of the companies I've seen over the years.
However, if success is desired, I do have a few thoughts, and I'm sure that the THWACK community will have some too.
Bye Bye, Silos
Getting rid of silos does not mean expecting everybody to do everything. Indeed, expertise in each technology in use is required just as it is when the organization was siloed. However, merging all these skills into a single team does mean that it's possible to introduce the idea of shared fate, where the team as a whole is responsible – and hopefully rewarded – for achieving tighter integrations between the different technologies so that there can be a single workflow.
Create APIs Between Groups
If it's not possible to unite the teams, and especially where there is a legacy of automation dragged up to the Plateau, make that automation available to fellow teams via APIs, and the other teams should do the same in return. That way each team gets to feel accomplished and maintains their expertise, management team, and so on, but now automation in each group can use, and be used by, automation from other groups. For example, when deploying a server based on a request to the Compute group, wouldn't it be nice if the Compute group's automation obtained IPs, VLANs, trunks, etc., via an API provided by the Network group. Storage could be instantiated the same way. Everybody gets to do their own thing, but by publishing APIs, everybody gets smarter.
Hyperconvergence is not only Buzzword Approved™, but for some it's the perfect workaround for having to create all this automation in a bespoke fashion. Of course, with convenience comes caveat, and there are quite a few to consider, perhaps including:
- Vendor lock-in (typically only vendor-approved hardware can be used)
- Solution lock-in (no choice but to run the vendor's software)
- Delivers a one-size-fits-most solution, which is good if you're that size
- May not be able to customize to particular needs if not supported by the software
I'm not against hyper converged infrastructure (HCI) by any means, but it seems to me that it's always a compromise in one way or another.
Use Another Solution
Why write all this coordinated automation when somebody else can do it for you? Well, because somebody else might not do it quite the way you had in mind. I mean, why not spin up some OpenStack in the corporate DC? OpenStack has a component for everything, I hear, including compute, storage, network, vegan recipes, key management, 18th century French poetry, and orchestration. OpenStack can be incredibly powerful, but last I heard it's really not fun to install and maintain for oneself; it's much nicer to let somebody else run it and just subscribe to the service; sounds a bit like
cloud doesn't it? On which note:
Make It Somebody Else's Problem (MISEP). The big cloud providers have managed to de-silo their teams, or maybe they were never siloed in the first place. The point is, services like AWS are half way up the Asymptotic Dream of Full Automation; they pull together all those automation tools, make them work together, orchestrate them, then provide pointy-clicky access via a web browser. What's not to love? All the hard work is done, it's cheaper*, there will be no need to write scripts any more**, you can do anything you like***, and life will be wonderful****.
* Rarely true with any reasonable number of servers running
** Also very rarely true
*** I made this up
**** It won't
As ever, if you read between the lines, you might guess that as with HCI (another form of MISEP), such simplicity comes at a price, both literally and figuratively. With cloud services it's usually a many-sizes-fit-most model, but if what you want to do isn't supported, that's just tough luck and you need to find another way. While skills in the previous silos may be less necessary, a new silo appears instead: Cloud Cost Optimization. Make of that what you will.
Why The Long Face?
It may seem that this is an unreasonably negative view of automation – and some of it is a tiny bit
tongue-in-cheek, – but I have tried to highlight some of the very real challenges standing in the way of a beautifully cost-efficient, highly agile, high-quality automated network. Wait, that's reminding me of something, and allows me to make one last dig at the dream:
We can get there. At least, we can get much of the way there, but we have to break out of our silos and start sharing what we know. We also need to go into this with eyes wide open, an understanding of what the alternatives might be, and a reasonable expectation of what we're going to get out of it in the end.