Geek Speak

5 Posts authored by: Joep Piscaer

We've covered a lot in this series on DevOps tooling:

  1. Start Your DevOps Journey by Looking at the Value You Add

  2. Why Choosing a One-Size-Fits-All Solution Won’t Work

  3. How to Choose the Right Tools in Your DevOps Toolchain

  4. How To Prevent Tool Sprawl in Your DevOps Toolchain

 

In this installment, we'll talk about how to prevent lock-in for tooling. Lock-in makes a customer dependent on a vendor for products and services, and unable to use another vendor without substantial switching costs.

There are different variations of lock-in. In IT, lock-in is usually created by either a vendor or a technology.

 

Vendor Lock-In

Vendors try to keep you locked in to their products and services by creating a complementary, integrated ecosystem of products. They draw customers in by offering a single service at very low cost or with very low barriers to entry. Once customers use that initial service or product, it's easier to persuade them to use more services in the portfolio. A notable, recent example is Amazon Web Services (AWS), which drew in the masses by offering S3, a cheap and easy-to-use cloud object storage service. By offering services adjacent to or on top of S3, AWS made it easy for customers to increase their AWS spend. Other examples include proprietary file formats that only work with the vendor's software.

 

While these are positive examples, there are many less appealing strategies for vendors to keep you from leaving their products behind. In many cases, they raise barriers by increasing the cost of switching to another service.

 

Technology Lock-In

In some cases, leaving behind a piece of technology is nearly impossible. Often caused by technical debt or age, technological lock-in is about more than the high cost of switching; it’s about the impact on business processes. Great examples include the mainframe systems and the COBOL programming language many banks still use. Switching from these (very old) technologies has a big impact on business processes, and the risk of switching is simply too high. In many cases, a lack of knowledge or documentation on the old systems is the cause of the lock-in. Admins are afraid to touch, change, or upgrade the systems, and won't fix what ain't broke. If systems are ignored for long enough, the mountain of debt grows so high that switching is no longer viable.

 

How Do We Prevent Lock-In?

There's an odd parallel between preventing technological lock-in and the DevOps movement itself. If DevOps is about accelerating the delivery of work, making each change as small as possible, incorporating feedback, and learning from data and information, then preventing a technological lock-in is all about continuously changing a legacy system to keep documentation, skills, and experience with those systems up-to-date. It prevents knowledge from seeping away and ensures IT engineers have hands-on experience. Simply put, it's a matter of not ignoring a system, but instead changing and transforming it continuously.

 

Vendor lock-in, on the other hand, is harder to spot and fight. You should want a certain, manageable level of lock-in. Vendor lock-in has many advantages: it lowers barriers, cuts costs, and makes implementation easier. Remember the AWS example? It's easier to spin up a virtual machine in EC2 to process your data, stored in S3, than to move the storage to another cloud provider first. So realistically, there's always lock-in, but it's a matter of how much switching cost an organization is willing to bear.

 

The Exit-Before-You-Enter Strategy

There's always lock-in, be it technological or vendor-based. The amount of lock-in you're willing to accept depends on the different products you use. To enter a lock-in willingly, you can use the exit-before-you-enter strategy to help you think about what the future switching cost will be, roughly, if you start using a service or product.

 

The Loosely-Coupled Strategy

By loosely coupling different products or services, there's less lock-in. By using APIs or other standard integration interfaces between services or applications, switching out one service for another becomes much easier, as long as the interface between them doesn't change significantly. Many DevOps tools for CI/CD, monitoring, and software development offer open APIs that create loosely coupled, but tightly integrated solutions.
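To make the loosely-coupled idea a bit more tangible, here's a minimal Python sketch. Everything in it is an assumption for illustration: the ObjectStore interface, the two stand-in backends, and the archive_build_log step are hypothetical names, not any vendor's actual API. The point is only that the pipeline step depends on a small interface rather than on one vendor's client.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """The narrow interface the rest of the toolchain depends on (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one vendor's storage service."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class PrefixedStore:
    """Stand-in for a second vendor that namespaces keys differently."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[self._prefix + key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[self._prefix + key]


def archive_build_log(store: ObjectStore, build_id: int, log: str) -> str:
    """Pipeline step that only knows about the interface, not the vendor."""
    key = f"builds/{build_id}.log"
    store.put(key, log.encode("utf-8"))
    return key


# Swapping the backend does not touch archive_build_log.
for backend in (InMemoryStore(), PrefixedStore("tenant-a/")):
    key = archive_build_log(backend, 42, "build ok")
    print(backend.get(key).decode("utf-8"))
```

The switching cost here is limited to writing one new adapter class, which is the kind of rough estimate the exit-before-you-enter strategy asks for.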

We put in all this work to create a fully customized toolchain tailored to your situation. Great! I bet you're happy you can change small bits and pieces of your toolchain when circumstances change, and the Periodic Table of DevOps tools has become a staple when looking for alternatives. You're probably still fighting the big vendors that want to offer you a one-size-fits-all solution.

 

But there's a high chance you're drowning in tool number 9,001. The tool sprawl, like VM sprawl after we started using virtualization 10-15 years ago, is real. In this post, we'll look at managing the tools managing your infrastructure.

 

A toolchain is supposed to make it easier for you to manage changes, automate workflows, and monitor your infrastructure, but isn't it just shifting the problem from one system to another? How do we prevent spending disproportionate amounts of time managing the toolchain, keeping us from more important value-add work?

 

The key here isn’t just to look at your infrastructure and workflows with a pair of Lean glasses, but also to look at the toolchain with the same perspective. Properly integrating tools and selecting the right foundation makes all the difference.

 

Integration

One key aspect of managing tool sprawl is properly integrating all the tools. With each handover between tools, there's a chance of manual work, errors, and friction. But what do we mean by properly integrating? Simply put: no manual work between steps, and as little as possible custom code or scripting.

 

In other words, tools integrating natively with each other take less custom work. The integration comes ready out of the box, and is an integral, vendor-supported part of the product. This means you don't have to do (much) work to integrate it with different tools. Selecting tools that natively integrate is a great way to reduce the negative side effects of tool sprawl.

 

Fit for Purpose to Prevent DIY Solutions

Some automation solutions are very flexible and customizable, like Microsoft PowerShell. It's widely adopted, very flexible, highly customizable, and one of the most powerful tools in an admin's tool chest, but getting started with it can be daunting. This flexibility leads to some complexity, and often there's no single way of accomplishing things. This means you need to put in the work to make PowerShell do what you want, instead of a vendor providing workflow automation for you. Sometimes, it's worthwhile to use a fit-for-purpose automation tool (like Jenkins, SonarQube, or Terraform) that has automated the workflows for you instead of a great task automation scripting language. Developing and maintaining your own scripts for task automation and configuration management can easily take up a large chunk of your workweek, and some of that work has little to do with the automation effort you set out to accomplish in the first place. Outsourcing that responsibility to the company (or open-source project) that created the high-level automation logic (Terraform by HashiCorp is a great example of this) makes sense, saving your time for job-specific work adding actual value to your organization.

 

If you set out to use a generic scripting language, choose one that fits your technical stack (like PowerShell if you're running a Microsoft stack) and technical pillar (such as Terraform for infrastructure-as-code automation).

Crisis Averted. Here's Another.

So, your sprawl crisis has been averted. Awesome; now we have the right balance of tools doing what they're supposed to do. But there's a new potential hazard around the corner: lock-in.

 

In the next and final post in this series, we'll have a look at the different forms of lock-in: vendor lock-in, competition lock-in, and commercial viability. 

 

We've established that choosing a one-size-fits-all solution won't work. So, what does work? Let's look at why we need tools in the first place, and what kind of work these tools take off our hands.

 

Two Types of Work

Looking at the kinds of work IT people do, there are two broad buckets. First, there's new work: the kind of work that has tangible value, either for the end customer or for the IT systems you operate and manage. This is the creative work IT people do to create new software, new automation, or new infrastructure. It’s work coming from the fingers of a craftsman. It takes knowledge, experience, and creativity to create something new. It's important to realize that this type of proactive work is impossible to automate. Consider the parallel to the manual labor artists and designers do to create something new; it's just not something a computer can generate or do for us.

 

Second, there's re-work and toil. These kinds of reactive work are unwanted. Re-work needs to be done to correct quality issues in work done earlier, like fixing bugs in software, improving faulty automation code, and mitigating and resolving incidents in production. This also includes customer support work after incidents and fixing technical debt caused by bad decisions in the past or by badly managing the software lifecycle. The result is technical debt, outdated software, or systems and architectures that haven't been adapted to new ways of work, scalability, or performance requirements. For IT ops, physical machines, snowflake virtual machines, and on-premises productivity systems (like email, document management, or collaboration tools) are good examples.

 

How Do Tools Fit In?

Now that we understand the types of work we do, we can see where automation tools come in. They take away re-work and toil. A well-designed toolchain frees up software and infrastructure engineers to spend more time on net-new work, like new projects, new features, or improvements to architecture, systems, and automation. In other words: the more time you spend improving your systems, the better they'll get. Tools help you break the cycle of spending too much time fixing things that broke and not preventing incidents in the first place. Automation tooling helps remove time spent on repetitive tasks that go through the same process each time.

 

By automating, you're creating a representation of the process in code, which leads to consistent results each time. It lowers the variation of going through a process manually with checklists, which invariably leads to a slightly different process with unpredictable outcomes. It's easy to improve the automation code each time, which lowers the amount of re-work and faults each time you improve the code. See how automating breaks the vicious circle? Instead, the circle goes up and up and up with each improvement.
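As a small illustration of a process represented in code, here's a sketch of a new-user checklist in Python. The steps, group names, and the example.com mail domain are all made up for the example; your own checklist will differ. What matters is that every run goes through the same steps in the same order.

```python
from dataclasses import dataclass, field


@dataclass
class UserAccount:
    username: str
    groups: list[str] = field(default_factory=list)
    mailbox: str = ""
    mfa_enabled: bool = False


def normalize_username(full_name: str) -> str:
    """Checklist step 1: derive the username the same way every time."""
    parts = full_name.strip().lower().split()
    return ".".join(parts)


def create_user(full_name: str, department: str) -> UserAccount:
    """Run every step of the (hypothetical) checklist in a fixed order."""
    username = normalize_username(full_name)
    account = UserAccount(username=username)
    # Step 2: group membership follows one naming rule.
    account.groups = ["all-staff", f"dept-{department.lower()}"]
    # Step 3: mailbox address is derived, never typed by hand.
    account.mailbox = f"{username}@example.com"
    # Step 4: the step that tends to be forgotten on a paper checklist.
    account.mfa_enabled = True
    return account


print(create_user("  Ada Lovelace ", "Engineering"))
```

Sloppy input (extra spaces, mixed case) gives the same account as clean input, which is the lower variation the paragraph above describes. Improving the process means changing this one function.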

 

A proper toolchain increases engineering productivity, which in turn leads to more, better, and quicker improvements, a lower failure rate of those improvements, and a quicker time to resolving any issues.

 

How Do I Know If Work Is a Candidate for Automation?

Use Value Stream Mapping (VSM), a Lean methodology. It's a way of visualizing the flow of work through a process from start to finish. Advanced mappings include red and green labels for each step, identifying customer value, much like the new work and re-work we talked about earlier. Good candidates include anything that follows a fixed process or can be expressed as code.

 

It's easy to do a VSM yourself. Start with a large horizontal piece of paper or Post-It notes on a wall, and write down all the steps chronologically. Put them from left to right. Add context to each step, labeling each with green for new work or red for toil. If you're on a roll, you can even add lead time and takt time to visualize bottlenecks in time.

 

See a bunch of red steps close to each other? Those are prime candidates for automation.
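If you'd rather keep the map in a file than on a wall, the same exercise can be sketched in a few lines of Python. The steps, labels, and lead times below are invented for illustration, and "at least two adjacent red steps" is just one assumed way to define a cluster.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Step:
    name: str
    label: str          # "green" = new work, "red" = toil or re-work
    lead_time_min: int  # elapsed time for the step, in minutes


def red_clusters(steps: list[Step], min_size: int = 2) -> list[list[Step]]:
    """Return runs of at least min_size consecutive red steps."""
    clusters = []
    for label, run in groupby(steps, key=lambda s: s.label):
        run = list(run)
        if label == "red" and len(run) >= min_size:
            clusters.append(run)
    return clusters


# Hypothetical release process, listed left to right as on the wall.
value_stream = [
    Step("write code", "green", 240),
    Step("manual build", "red", 30),
    Step("manual test run", "red", 120),
    Step("fill in change form", "red", 45),
    Step("review design", "green", 60),
    Step("copy files to server", "red", 20),
]

for cluster in red_clusters(value_stream):
    names = ", ".join(step.name for step in cluster)
    total = sum(step.lead_time_min for step in cluster)
    print(f"Automation candidate: {names} ({total} min)")
```

With this sample data, the three manual steps between writing code and the design review show up as one candidate, while the lone "copy files to server" step does not.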

 

Some examples are:

  1. If a piece of software is always tested for security vulnerabilities
  2. If you make changes to your infrastructure
  3. If you test and release a piece of new software using a fixed process
  4. If you create a new user using a manual checklist
  5. If you have a list of standard changes that can go to production after checking with the requirements of the standard change

 

But What Tools Do I Choose?

While the market for automation tooling has exploded, there are some great resources to help you see the forest for the trees.

  1. First and foremost: keep it simple. If you use a Microsoft stack, use Microsoft tools for automation. Use the tool closest to the thing you're automating. Stay within your ecosystem as a starting point. Don't worry about a tool that encompasses all the technology stacks you have.
  2. Look at overviews like the Periodic Table of DevOps Tools.
  3. Look at what the popular kids are doing. They're usually doing it for a reason, and tooling tends to come in generations. Configuration management from three generations ago is completely different than modern infrastructure-as-code tools, even if they do the same basic thing.

 

Next Up

Happy hunting for the tools in your toolchain! In the next post, I'll discuss a problem many practitioners have after their first couple of successful automation projects: tool sprawl. How do you manage the tools that manage your infrastructure? Did we just shift the problem from one system to another? A toolchain is supposed to simplify your work, not make it more complex. How do you stay productive and not be overloaded with the management of the toolchain itself? We'll look at integrating the different tools to keep the toolchain agile as well as balancing the number of tools.

In my previous post, we talked about the CALMS framework as an introduction to DevOps, and how it's more than just “DevOps tooling.” Yes, some of it is about automation and what automation tools bring to the table, but what matters is what teams do with automation to quickly create more great releases with fewer and shorter incidents. Choosing a one-size-fits-all solution won't work with a true DevOps approach. For this, I'll briefly go into CALMS (but you can read more in my previous post) and the four key metrics to measure a team's performance. From there, we'll look at choosing the right tooling.

 

CALMS

Image: the CALMS framework for DevOps (source: Atlassian, https://www.atlassian.com/devops)

Let's quickly reiterate CALMS:

  • Culture
  • Automation
  • Lean
  • Measurements
  • Sharing

 

These five core values are an integral part of high-performing teams. Successful teams tend to focus on these areas to improve their performance. What makes teams successful in the context of DevOps tooling, you ask? I'll explain.

 

Key Performance Metrics for a Successful DevOps Approach

 

Measuring a team's performance can be hard. You want to measure the metrics they can directly influence, but avoid being overly vague in measuring success. For instance, measuring the customer NPS involves much more than a single team's efforts, so that one team's efforts can get lost in translation. A good methodology for measuring DevOps performance comes from DevOps Research and Assessment (DORA), a company that publishes a yearly report on the state of DevOps, “Accelerate: State of DevOps 2018: Strategies for a New Economy.” They recommend using these four performance metrics:

Source: DORA Accelerate State of DevOps 2018

 

  • Deployment frequency: how often does the team deploy code?
  • Lead time for changes: how long does it take to go from code commit to code successfully running in production?
  • Time to restore service: how long does it take to restore service after an incident (like an unplanned outage or security incident)?
  • Change failure rate: what percentage of changes results in an outage?
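To show how these four metrics could be derived from data most teams already have, here's a rough Python sketch. The Deployment record and its field names are assumptions, not something the report prescribes, and time to restore is simplified here to "from the failing deployment until service was restored."

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    caused_outage: bool = False
    restored_at: Optional[datetime] = None  # set only when caused_outage is True


def four_key_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four metrics over one reporting period."""
    count = len(deployments)
    if count == 0:
        raise ValueError("no deployments in this period")
    failures = [d for d in deployments if d.caused_outage]
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    # Simplification: restore time is measured from the failing deployment.
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": count / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / count,
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times else None
        ),
        "change_failure_rate": len(failures) / count,
    }


# Hypothetical week of deployments for one team.
day = datetime(2019, 1, 7, 9, 0)
history = [
    Deployment(day, day + timedelta(hours=2)),
    Deployment(day, day + timedelta(hours=4), caused_outage=True,
               restored_at=day + timedelta(hours=5)),
    Deployment(day + timedelta(days=1), day + timedelta(days=1, hours=3)),
    Deployment(day + timedelta(days=2), day + timedelta(days=2, hours=3)),
]
metrics = four_key_metrics(history, period_days=7)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

All four numbers come from records the team itself produces, which is what makes them metrics a team can directly influence.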

Image: 2018 State of DevOps Report, https://cloudplatformonline.com/2018-state-of-devops.html

 

These metrics measure the output, like changes to software or infrastructure, as well as the quality of the output. They're generic enough to be broadly applicable, but concrete enough to be of value for many IT teams. Also, these metrics clearly embrace the core values from the CALMS framework. Without good post-mortems (or sprint reviews), how do you bring down the change failure rate or time to restore service? Without automation, how do you increase deployment frequency?

 

Choosing the right support infrastructure for your automation efforts is key to increasing performance, though, and a one-size-fits-all solution will almost certainly be counter-productive.

 

Why The Right Tool Is Vital

Each team is unique. Team members each have their own skills. The product or service they work on is built around a specific technology stack. The maturity of the team and the tech is different for everyone. The circumstances in which they operate their product or service and their dependencies are incomparable.

 

So what part of that mix makes a one-size-fits-all solution fit? You guessed it: none.

 

Add in the fact that successful teams tend to be nimble and quick to react to changing circumstances, and you'll likely conclude that most “big” enterprise solutions are incompatible with the DevOps approach. Every problem needs a specific solution.

 

I'm not saying you need to create a complex and unmanageable toolchain, which would be the other extreme. I’m saying there's a tendency for companies to buy in to big enterprise solutions because they look good on paper, they're an easier sell (as opposed to buying dozens of smaller tools), and nobody ever got fired for buying $insert-big-vendor-name-here.

 

And I'm here to tell you that you need to resist that tendency. Build out your toolchain the way you and your team see fit. Make sure it does exactly what you need it to do, and make sure it doesn't do anything you don't need. Simplify the toolchain.

 

Use free and open-source components that are easier to swap out, so you can change when needed without creating technical debt or being limited by a big solution that won't let you use the software as you want (a major upside of “libre” software, which much open-source software is: you’re free to use it the way you intend, not just the way the original creator intended).

 

Next Up

So there you have it. Build your automation toolchain, infrastructure, software service, or product using the tools you need, and nothing more. Make components easy to swap out when circumstances change. Don't buy into the temptation that any vendor can be your one-size-fits-all solution. Put in the work to create your own chain of tools that work for you.

 

In the next post in this series, I'll dive into an overview and categorization of a DevOps toolchain, so you'll know what to look out for, what tools solve what problems, and more. We'll use the Periodic Table of DevOps tools, look at value streams to identify which tools you need, and look at how to choose for different technology stacks, ecosystems, and popularity to solve specific problems in the value stream.

Starting with DevOps can be hard. Often, it's not entirely clear why you're getting on the DevOps train. Sometimes, it's simply because it's the new trendy thing to do. For some, it's to minimize the friction between the traditional IT department (“Ops”) and developers building custom applications (“Dev”). Hopefully, it will solve some practical issues you and your team may have.

 

In any case, it's worth looking at what DevOps brings to the table for your situation. In this post, I'll help you set the context of the different flavors and aspects of DevOps. “CALMS” is a good framework to use to look at DevOps.

 

CALMS

CALMS neatly summarizes the different aspects of DevOps. It stands for:

  • Culture
  • Automation
  • Lean
  • Measurement
  • Sharing

 

Note how technology and technical tooling are only one part of this mix. This might be a surprise for you, as many focus on just the technological aspects of DevOps. In reality, there are many more aspects to consider.

 

And taking this one step further: getting started with DevOps is about creating and fostering high-performance teams that imagine, develop, deploy, and operate IT systems. This is why Culture, Automation, Lean, Measurement, and Sharing are equal parts of the story.

 

Culture

Arguably, the most important part of creating highly effective teams is the aspect of shared responsibility. Many organizations choose to create multi-disciplinary teams that include specialists from Ops, Dev, and Business. Each team can take full responsibility for the full lifecycle of part of (or an entire) IT system, technical domain, or part of the customer journey. The team members collaborate, experiment, and continuously improve their system. They'll take part in blameless post-mortems or sprint reviews, providing feedback and improving processes and collaboration.

 

Automation

This is the most concrete part of DevOps: tooling and automation. It's not just about automation, though. It's about knowing the flow of information through the process from development to production, also called a value stream, and automating that flow.

 

For infrastructure and Ops, this is also called Infrastructure-as-Code, a methodology of applying software development practices to infrastructure engineering and operational work. The key to infra-as-code is treating your infrastructure as a software project. This means maintaining and managing the state of your infrastructure in version-controlled declarative code and definitions. This code goes through a pipeline of testing and validation before the state is mirrored on production systems.
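To give a feel for what "declarative" means here, the following Python sketch compares a desired state with what's actually running and works out the changes. It's a toy model under stated assumptions: the machine names, the cpus and memory_gb fields, and the single validation rule are invented, and real infrastructure-as-code tools do far more than this.

```python
def validate(desired: dict[str, dict]) -> list[str]:
    """Pipeline check that runs before anything reaches production."""
    errors = []
    for name, spec in desired.items():
        if spec.get("cpus", 0) < 1:
            errors.append(f"{name}: cpus must be at least 1")
    return errors


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare declared state with observed state and list required changes."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "delete": sorted(actual.keys() - desired.keys()),
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
    }


# The desired state would live in version control (hypothetical machines).
desired = {
    "web-1": {"cpus": 2, "memory_gb": 4},
    "web-2": {"cpus": 2, "memory_gb": 4},
    "db-1": {"cpus": 4, "memory_gb": 16},
}
# The actual state is what a scan of production reports.
actual = {
    "web-1": {"cpus": 2, "memory_gb": 4},
    "db-1": {"cpus": 2, "memory_gb": 16},
    "old-batch": {"cpus": 1, "memory_gb": 2},
}

problems = validate(desired)
changes = plan(desired, actual)
print(problems)
print(changes)
```

You describe the end result and let the tooling work out the steps, instead of scripting each step by hand. Reviewing the plan before applying it is the testing and validation stage of the pipeline.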

 

A good way to visualize this is the following flow chart, which can be equally applied to infrastructure engineering and software engineering.

 

The key goal of visualizing these flows is to identify waste, which in IT is manual and reactive labor. Examples are fixing bugs, mitigating production issues, supporting customer incidents, and solving technical debt. This is all a form of re-work that, in an ideal world, could be avoided. This type of work takes engineering time away from the good kind of manual work: creating new MVPs for features, automation, tests, infrastructure configuration, etc.

 

Identifying manual labor that can be simplified and automated creates an opportunity to choose the right tools to remove waste, which we'll dive into in this blog series. In upcoming posts, you'll learn how to choose the right set of DevOps monitoring tools, which isn’t an easy task by any stretch of the imagination.

 

Lean

Lean is a methodology first developed by Toyota to optimize its factories. These days, Lean can be applied to manufacturing, software development, construction, and many other disciplines. In IT, Lean is valuable to visualize and map out the value stream, a single flow of work within your organization that benefits a customer. An example would be the manufacturing of a piece of code from ideation to when it's in the hands of the customer by way of a production release. It's imperative to identify and visualize your value stream, with all its quirks, unnecessary approval gates, and process steps. With this, you'll be able to remove waste and toil from this process and create flow. These are all important aspects of creating high-performing teams. If your processes contain a lot of waste, complexity, or variation, chances are the team won't be as successful.

 

Measurements

How do you measure performance and success? The DevOps mindset heavily leans on measuring performance and progress. While it doesn't prescribe specific metrics to use, there are a couple of common KPIs many teams go by. For IT teams, there are four crucial metrics to measure the team's performance, inside-out:

  1. Deployment Frequency: how often does the team deploy code?
  2. Lead time for changes: how long does it take to go from code commit to code successfully running in production?
  3. Time to restore service: how long does it take to restore service after an incident (like an unplanned outage or security incident)?
  4. Change failure rate: what percentage of changes results in an outage?

 

In addition, there are some telling metrics to measure success from the outside-in:

  1. Customer satisfaction rate (NPS)
  2. Employee satisfaction (happiness index)
  3. Employee productivity
  4. Profitability of the service

 

Continuously improving the way teams work and collaborate and minimizing waste, variation, and complexity will result in measurable improvements in these key metrics.

 

Sharing

To create high-performing teams, team members need to understand each other while still contributing their expertise. This creates the tension between “knowing a lot about few things” and “knowing a little about a lot of things.” This is known as the T-shaped knowledge problem. To balance between the two, high-performing teams are known to spend a decent amount of time on sharing knowledge and exchanging experiences. This can take shape in many ways, like peer review, pair programming, knowledge sessions, communities of expertise, meetups, training, and internal champions that further their field with coworkers.

 

Next Up

With this contextual overview, we've learned DevOps is much more than just technology and tooling; grasping these concepts is vital for creating high-performance teams. But choosing the right approach for automation is no joke, either. There's so much to choose from, ranging from small tools that excel at one task but are harder to integrate into your toolchain to “OK” enterprise solutions that do many tasks but come pre-integrated. In the next post in this getting started with DevOps series, we'll look at why choosing a one-size-fits-all solution won't work. 
