Debriefing DevOps Days (Ohio, 2016)


I wanted to share some of the things I heard and saw during the incredible two days I spent with 300+ attendees at DevOps Days Ohio.

First, I have to admit that after more than a year of attending DevOpsDays around the country, I'm still working on my own definition of what DevOps is, and how it compares and contrasts with more traditional operations practices. But this event helped gel a number of things for me.

What I realized, with the help of this article (which came out while I was at the conference), is that my lack of clarity is okay, because sometimes the DevOps community itself is unclear on what it means.

One of the ongoing points of confusion for me is the use of words I think I know, but in a context that tells me they mean something else. Case in point: configuration management. In my world, that means network device configurations, specifically backing them up, comparing, auditing, and rolling them out. But then I hear a pronouncement that, "Config management is code," and, "If you are working on configs, you are a developer now." And most confusingly, "To do config management right, you need to be on Git."

If this has ever struck you as strange, then you (and I) need to recognize that to the DevOps community, the server (and specifically the virtualized server) is king, and the config management they're talking about is the scripted creation of a new server in on-premises or cloud-based environments.
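To make that distinction concrete, here is a minimal sketch of what "config management as code" means in the DevOps sense: a declarative, repeatable description of a server's desired state that a tool converges toward, rather than a backup of an existing device config. All names here are invented for illustration; real tools like Puppet or Ansible do this with far more sophistication.

```python
# Hypothetical sketch of declarative, idempotent config management.
# The desired state is data; the code's job is to compute what must
# change to make the real host match it.

desired_state = {
    "packages": {"nginx", "ntp"},
    "services": {"nginx": "running"},
}

def converge(current_packages, desired):
    """Return the packages to install so the host matches desired state.

    Idempotent: running it again after convergence returns an empty list.
    """
    missing = desired["packages"] - current_packages
    return sorted(missing)

# A host with only ntp installed still needs nginx:
print(converge({"ntp"}, desired_state))        # ['nginx']
# A converged host needs nothing:
print(converge({"nginx", "ntp"}, desired_state))  # []
```

The key property is idempotence: describing the end state (not the steps) means the same definition can rebuild a server from scratch or verify an existing one, which is exactly why "configs" in this world live in Git like any other code.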

This led to some hilarious interactions for me, including a side conversation where I was talking about on-call emergencies and the other person said, "I don't know why on-call is even a thing any more. I mean, if a system is having a problem, you should just delete it and rebuild it from code, right? Humans don't need to be involved at all."

To which I replied, "Interesting idea, but to my knowledge it's very difficult to delete and re-build a router with a bad WIC using nothing but code."

The reply? "Oh, well, yeah, there's that."

The point of this is not that DevOps-focused IT pros are somehow clueless about the realities of the network, but that their focus is so intensely trained on optimizing the top end of the OSI model that we monitoring experts need to allow for that and adjust our dialogue accordingly.

I was honestly blown away to learn how far DevOps culture has made inroads, even into traditionally risk-averse environments such as banking. I worked at a bank between 2006 and 2009, right in the middle of the home mortgage crisis, and I could never have imagined something like DevOps taking hold. But we heard from folks at Key Bank who spoke openly about the concerns, challenges, and ultimately the successes that their shift to DevOps has garnered them, and I saw the value that cloud, hybrid IT, micro-services, and agile development hold for businesses that are willing to consider them within the context of their industry and implement them rationally and thoughtfully.

I was also heartened to hear that monitoring isn't being overlooked. One speaker stated flat out that having monitoring in place is table stakes for rolling out micro-services. This shows an appreciation for the skills we monitoring engineers bring to the table, and presages a potential new avenue for people who simply have monitoring as a bullet item on their to-do list to make the leap into a sub-specialization.

There is a lot of work to do, in the form of education, for monitoring specialists and enthusiasts. In one-on-one conversations, as well as in OpenSpace discussions, I found experienced DevOps folks conflating monitoring with alerting; complaining about alerts as noise, while demonstrating a lack of awareness that alerts could be tuned, de-duplicated, or made more sophisticated, and therefore more meaningful; and overlooking the solutions of the past simply because they believed new technology was somehow materially different. Case in point, I asked why monitoring containers was any harder or even different from monitoring LPARs on AIX, and got nervous chuckles from the younger folks, and appreciative belly laughs from some of the old timers in the room.
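As one illustration of the "alerts can be tuned" point above, here is a minimal de-duplication sketch: repeated alerts for the same host and check within a suppression window are collapsed into a single page. This is an invented example, not any particular product's behavior, but it shows how easily "noise" can be reduced in principle.

```python
# Hedged sketch of alert de-duplication (all names invented).
# Alerts are (timestamp_seconds, host, check) tuples; repeats of the
# same (host, check) inside the suppression window are dropped.

SUPPRESS_SECONDS = 300  # collapse repeats within a 5-minute window

def dedupe(alerts):
    """Return only the alerts worth paging on, oldest first."""
    last_seen = {}
    paged = []
    for ts, host, check in sorted(alerts):
        key = (host, check)
        if key not in last_seen or ts - last_seen[key] >= SUPPRESS_SECONDS:
            paged.append((ts, host, check))
        last_seen[key] = ts  # even suppressed alerts extend the window
    return paged

alerts = [(0, "web1", "cpu"), (60, "web1", "cpu"), (400, "web1", "cpu"),
          (10, "db1", "disk")]
print(dedupe(alerts))
# Pages at t=0 and t=400 for web1/cpu (t=60 is suppressed), plus db1/disk.
```

Note the design choice on the commented line: refreshing the window on suppressed alerts means a continuously flapping check stays quiet, which may or may not be what you want; tuning decisions like that are exactly the sophistication the conversations above were missing.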

However, I came to the realization that DevOps does represent a radical departure for monitoring engineers in its "Cattle, not Pets" mentality. When an entire server can be rebuilt in the blink of an eye, the best response to a poorly behaving service is truly not to fix the issue. That attitude alone may take time to absorb for those of us mired in biases rooted in the old days of bare-metal hardware and servers we named after the Brady Bunch or Hobbit dwarves.
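For those of us still thinking in pets, the cattle approach can be sketched in a few lines. This is a hypothetical illustration (every name here is invented): an unhealthy instance is not repaired in place but replaced from the same definition that built it.

```python
# Hedged sketch of "cattle, not pets": replace, don't repair.
import uuid

def build(image):
    """Build a fresh instance from a known-good image definition."""
    return {"id": str(uuid.uuid4()), "image": image, "state": "running"}

def remediate(instance, healthy, build_from_image):
    """Return the instance if healthy; otherwise rebuild it from its image."""
    if healthy(instance):
        return instance
    return build_from_image(instance["image"])

bad = {"id": "abc", "image": "web-v42", "state": "degraded"}
new = remediate(bad, lambda i: i["state"] == "running", build)
print(new["image"])  # same image definition, brand-new instance
```

Even in this toy version, something (the `healthy` check) still has to notice the failure before the rebuild can happen, which is exactly why monitoring remains table stakes.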

Overall, I am excited for the insights that are finally gelling in my mind, and look forward to learning more and becoming a more fluent member of the DevOps community, especially during my upcoming talk at DevOpsDays Tel Aviv!

One final thing: I gave an Ignite talk at this conference and found the format (five minutes, 20 slides that auto-advance every 15 seconds), to be both exhilarating and terrifying. I'm looking forward to my next chance to give one.

  • Lol apparently those things that you turn on old analog synths are a banned word hehe.

  • I'm from Ohio too adatole​ so I'm sure it's nice to be close to home, right?  I actually live in Scottsdale, AZ now but spent many long winters in the suburbs of Cincinnati, and at school in Columbus.  Geesh, during school we went all up and down the east coast and all through the midwest on the weekends... During the late 90's it was the underground electronic music scene that got me out... lots of analog and lots of knobs!  DevOps is a BIG change from the old days of making a paper template to remember the knob locations on a classic analog synth.

  • And we do live vicariously through you on these postings....

  • I can't disagree - this is truly my dream job. And if you folks can enjoy some of that with me vicariously, the joy just becomes greater when we all share in it!

  • "When an entire server can be rebuilt in the blink of an eye, the best response to a poorly behaving service is truly not to fix the issue."

    --which is great, until you keep having the same issue over and over and now you've replicated it to every server in your structure/organization. 

    --You also have to know when a server/service is having an issue so some action can be taken

    ----You doesn't imply a person here, perhaps the you is the automation you've built to tear down and rebuild. Something needs to know when things have gone wrong.

    These thoughts in no way say that this isn't a viable or even a good solution, but it's not just an "off-the-cuff" answer to solving problems in an environment.
