Of all the tricks IT pros pick up over their careers, monitoring tricks may cover the most ground, which makes them the hardest to nail down in a single blog post. After all, how many techniques from one discipline, such as networking, carry over unchanged to everything else monitoring touches? And therein lies the actual secret: the tricks collected by monitoring experts have little to do with a specific technology or silo and much more to do with the philosophy and discipline of monitoring itself. So I’d like to present you with five simple but important hacks (techniques, concepts, or approaches) to help improve your monitoring game.
The old saying about a hammer and a nail is as true here as it is anywhere else in IT. If the only technique you know is ping, then you probably think “monitoring” is nothing more than uptime data presented in a Cacti graph. The same goes for SNMP, WMI, and even API data returned as JSON.
Almost all “monitoring” is performed using a handful of techniques that haven’t changed much over the years. What HAS changed is the efficiency, speed, and sophistication of monitoring solutions in collecting and presenting data.
So, before you do anything else, make sure you’re familiar with this handful of techniques. That way, you can be sure you’re always selecting the right tool for the job.
This Monitoring 101 eBook is a good place to start.
Knowing the basics is like having a warm coat on a freezing day. But being able to clearly and concisely explain those basics to others is like building a bonfire.
Monitoring isn’t just some checkbox item you do to get an audit off your back—nor is it a black box service people ask for before walking away with a vague assurance you’re going to do SOMETHING. The teams who request monitoring want to know what you’re doing, how you’re doing it, and how it’ll make their lives better.
If you provide a jargon-filled response heavy on buzzwords and light on details, you’ll find yourself asked to justify your work at higher and higher levels of the organization until the inevitable decision is made: you (if not monitoring overall) aren’t needed because monitoring clearly provides no measurable value to the business.
This technique is about more than providing “Fisher-Price Presentations” (bright colors, small words, simple ideas, and lots of pictures). It’s about embodying a quote often attributed to Albert Einstein:
“If you can’t explain it simply, you don’t understand it well enough.”
This eBook offers an example of explaining complex technical topics simply.
Over the years, I’ve had the chance to set up monitoring solutions for a variety of companies, targeted at a range of situations. A pattern of questions emerged among project stakeholders, departmental colleagues, and the consumers of the data my monitoring solutions produced. I began to think of these questions as “The Four Questions (of Monitoring),” and over time, I’ve realized successful monitoring implementations are those capable of answering them. I also noticed that the implementations that struggled often did so because they weren’t designed in a way that let operators answer these same questions.
You know what else I’ve noticed? Most people who work in monitoring didn’t start out with it as a career goal. Most of us are—by and large—network engineers, SysAdmins, storage architects, or even infosec professionals. But during our workdays, we find ourselves wanting (or more likely needing) to know about the health of our environment. So we create or adopt tools to help us do this.
When other people notice what we’ve done, they ask us to do the same for them. This is the moment we make the leap from watching systems we care about to monitoring systems other people care about. This is when we become monitoring engineers. And “The Four Questions” aren’t far behind.
The Four Questions of Monitoring are as follows:
- Why did I get an alert?
The person isn’t asking, “Why did this alert trigger at this time?” They’re asking why they got the alert at all.
- Why didn’t I get an alert?
Something happened, and the owner of the system felt it should’ve triggered an alert, but they didn’t receive one.
- What’s being monitored on my system?
They want to know what reports and data can be pulled for their system (and in what form) so they can look at trends, performance, and forensic information after a failure.
- What will alert on my system?
They’d like to be able to predict the conditions under which they’ll receive an alert for this system.
But wait! There’s a fifth Beatle—I mean question.
- What do you monitor “standard”?
What metrics and data are typically collected for systems like this?
The answers to these questions take up more space than this humble blog post has to offer. But the takeaway I want you to hold onto right now is this: you need to anticipate and even look forward to these moments, because they mean the person you’re talking to is interested rather than apathetic. From there, know these questions are coming your way tomorrow, and let this inform your choices in building the solution today.
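One way to let the questions shape what you build is to keep alert definitions as data instead of burying them in scripts. Then the same definitions can answer “What will alert on my system?” ahead of time and “Why did I get an alert?” after the fact. This is a minimal, hypothetical Python sketch. The `AlertRule` class, its fields, and the sample rules are invented for illustration and don’t reflect any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    """A hypothetical alert definition, stored as data so it can be queried."""
    name: str
    metric: str        # metric the rule watches, e.g. "cpu_percent"
    threshold: float   # the rule fires when the metric exceeds this value
    applies_to: str    # system type the rule covers, e.g. "windows_server"
    notify: str        # team or person who receives the alert


# Illustrative rules only; real thresholds depend on your environment.
RULES: List[AlertRule] = [
    AlertRule("High CPU", "cpu_percent", 90.0, "windows_server", "server-team"),
    AlertRule("Disk nearly full", "disk_used_percent", 85.0, "windows_server", "server-team"),
    AlertRule("Interface errors", "if_errors_per_min", 10.0, "router", "network-team"),
]


def rules_for(system_type: str) -> List[AlertRule]:
    """Answer "What will alert on my system?" before anything fires."""
    return [r for r in RULES if r.applies_to == system_type]


def explain(rule: AlertRule, system: str, value: float) -> str:
    """Answer "Why did I get an alert?" with the rule, the value, and the recipient."""
    return (f"{system}: '{rule.name}' fired because {rule.metric}={value} "
            f"exceeded {rule.threshold}; routed to {rule.notify}")


def evaluate(system: str, system_type: str, metrics: Dict[str, float]) -> List[str]:
    """Return an explanation for every rule whose metric is present and over threshold."""
    return [explain(r, system, metrics[r.metric])
            for r in rules_for(system_type)
            if r.metric in metrics and metrics[r.metric] > r.threshold]


if __name__ == "__main__":
    print([r.name for r in rules_for("windows_server")])
    print(evaluate("web01", "windows_server", {"cpu_percent": 97.5, "disk_used_percent": 40.0}))
```

A single `RULES` list drives both the lookup and the evaluation, so the answer to “What will alert on my system?” can’t drift away from what actually alerts. It also gives you a starting point for “Why didn’t I get an alert?” If `rules_for` returns nothing for a system type, or a metric never shows up in the collected data, nothing can fire.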
This video from THWACKcamp 2019 goes into greater detail about The Four (OK, Five) Questions.
In the movie “Spider-Man: Homecoming,” there’s a moment I don’t think IT folks pay enough attention to:
Tony Stark: Okay, it’s not working out. I’m gonna need the suit back.
Peter Parker: You don’t understand! This is all I have! I’m nothing without this suit!
Tony Stark: If you’re nothing without this suit, then you shouldn’t have it, okay?
Some IT folks frame their work based on tools rather than skills. They become “the resident XYZ app expert,” and their credibility is built on the foundation of how well they know XYZ app. If the company replaces the app, IT pros who choose this path find themselves either looking for work at other companies that use XYZ app or under pressure to figure out how to carry what they knew about the old tool over to the new one.
The technique, in this case, is to always make sure your skills are greater than your tool set. Understand how your tools accomplish the work they do. To this end, I’d remind monitoring engineers of the following:
- Ping is still useful
- Traceroute was never incredible to begin with
- The netstat command is a secret weapon
- Understanding packet captures (e.g., with Wireshark) makes you a wizard in the eyes of most IT folks
Some might argue I just named a whole set of alternate (and more basic) tools rather than skills. But having the level of knowledge needed to fully use these (often command-line) utilities shows you’ve mastered the underlying concepts and can get the same job done in almost any tool.
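Here is one example of knowing what a tool does under the hood. Many “is the service up?” checks amount to trying to open a TCP connection and timing how long it takes. The minimal sketch below uses only Python’s standard `socket` module. It is a TCP connect check, which is a different technique from ping: ping sends ICMP echo requests and usually needs raw-socket privileges. The function name `tcp_check` and the 2-second default timeout are illustrative choices.

```python
import socket
import time
from typing import Optional, Tuple


def tcp_check(host: str, port: int, timeout: float = 2.0) -> Tuple[bool, Optional[float]]:
    """Try to open a TCP connection; return (reachable, connect time in ms)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000
            return True, elapsed_ms
    except OSError:
        # Connection refused, timeout, unreachable network, and DNS failures
        # are all OSError subclasses; treat every one of them as "down"
        return False, None
```

For example, `tcp_check("db01.example.internal", 5432)` would report whether a listener on that hypothetical host accepts connections on PostgreSQL’s default port. Note that a successful handshake only proves the port is open, not that the service behind it is healthy. That’s exactly the kind of distinction knowing your tools helps you explain.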
I’ve saved what may be the most important technique for last. Certainly, it’s the one I have to remind people about the most. Understand (and be able to explain) what monitoring is and is NOT. If you aren’t clear on this, you, your team, and your company will end up trying to fix monitoring when the actual problem is unreasonable expectations. Here it is, clear and simple:
Monitoring is nothing more (and nothing less) than the regular collection of data from a set of targets.
That’s it. That’s monitoring. Everything else (alerting, reporting, automation, and the rest) is a happy by-product of having monitoring in the first place. How does this work itself out in the real world?
Issuing commands through an interface and reading the output could be troubleshooting, investigating, or even researching. But don’t ask people to do it and call it “monitoring.” Don’t fool yourself into believing you have effective monitoring if this is what it looks like. No, not even if you use “tail.” No, not even if you have lots of screens in the NOC.
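By contrast, the “regular collection of data from a set of targets” in the definition above is work a scheduler does, not a person. Below is a minimal Python sketch of that idea. It assumes a `collect` callable that stands in for whatever technique actually fetches a value (ping, SNMP, WMI, or an API call). The `poll` and `Sample` names and the in-memory storage are hypothetical and were chosen for illustration, not taken from any product.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    target: str
    timestamp: float
    value: float


def poll(targets: List[str],
         collect: Callable[[str], float],
         interval_s: float,
         cycles: int) -> Dict[str, List[Sample]]:
    """Collect one value per target every interval_s seconds for a number of cycles."""
    history: Dict[str, List[Sample]] = {t: [] for t in targets}
    for cycle in range(cycles):
        for target in targets:
            try:
                value = collect(target)
            except Exception:
                # A failed collection is data too: record NaN instead of stopping the loop
                value = float("nan")
            history[target].append(Sample(target, time.time(), value))
        if cycle < cycles - 1:
            time.sleep(interval_s)  # wait for the next collection cycle
    return history
```

Alerting and reporting then become functions that read `history` rather than something a person has to watch, which is the sense in which they’re by-products of monitoring. A production poller would also persist samples and handle slow targets concurrently. This sketch polls targets one at a time and keeps everything in memory.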
Which brings me to another point:
Paying people to stare at screens, waiting for a message to appear or a light to turn red, isn’t “monitoring,” and you’ll never have enough money to hire enough eyeballs to do it anyway. Monitoring must be able to work when you’ve run out of eyeballs. “Swivel chair integration” (putting one thing on each screen and swinging your chair from side to side to see it all) isn’t a solution either. Stop. Just stop.
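The alternative to more eyeballs is letting software watch the collected data. This minimal Python sketch shows one common approach and assumes samples arrive as a plain list of numbers. An alert fires only when a value stays above a threshold for several consecutive samples, so a single spike doesn’t page anyone. The function name `sustained_breaches` and the three-sample default are illustrative choices, not a recommendation from any particular tool.

```python
from typing import Iterable, List


def sustained_breaches(values: Iterable[float], threshold: float,
                       consecutive: int = 3) -> List[int]:
    """Return the indexes at which `consecutive` samples in a row exceeded threshold.

    Each breach is reported once, when the run first reaches the required
    length; the value must drop back to or below the threshold to re-arm it.
    """
    alerts: List[int] = []
    run = 0
    for i, value in enumerate(values):
        if value > threshold:
            run += 1
            if run == consecutive:
                alerts.append(i)
        else:
            run = 0  # the streak is broken; start counting again
    return alerts


if __name__ == "__main__":
    cpu = [50, 95, 96, 97, 98, 40, 91, 92, 93]
    print(sustained_breaches(cpu, threshold=90))  # → [3, 8]
```

Requiring a sustained breach trades a little detection speed for far fewer false alarms. That trade-off is worth being able to explain when someone asks why they did (or didn’t) get an alert.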
This brings me to my final point: though you (personally) need to be better than your tools, your team, organization, and business aren’t going to be much better at monitoring than your tools allow. The right monitoring tool is a force multiplier, not just for monitoring activities but for everything your business does or should be capable of doing.
Monitoring, along with everything that goes with it (alerting, automated responses, reporting, and the democratization of performance and operational information), is the secret sauce for many businesses. It sits at the heart of an organization’s ability to know what’s working, what isn’t, and how changes affect the bottom line.
And that’s a technique everyone should know.