As 2015 ends, businesses are busy closing deals, evaluating project success, and planning for the New Year. For IT professionals, this transitional period is crucial for laying the foundation for success in the upcoming year. This year, I resolve to keep IT stupid simple. It’s guaranteed to KISS away all IT issues.


A Walk in the Clouds, Containers, and Loosely Coupled Services

It’s easy to get lost in the myriad of new technologies and the associated vendor FUD (fear, uncertainty, and doubt) that fills the IT landscape. It certainly doesn’t make an IT professional’s job any easier when it’s hard to discern fact from fiction, especially when a problem can be solved in so many different ways.

Ultimately, what’s the most efficient and effective method to troubleshoot and remediate problems?

Keep IT Stupid Simple

So let’s start with the obvious: keeping IT stupid simple. This means that if you don’t understand the ins and outs of your solution stack, then it shouldn’t be your solution. When an application or system slows down, breaks, or fails (and it will), your job is on the line to quickly find the root cause and resolve the issue. Keeping IT stupid simple makes troubleshooting and remediation much easier.

The USE Method

For performance bottlenecks, a great and simple framework to follow is the USE Method by Netflix’s Brendan Gregg.

USE stands for utilization, saturation, and errors. Each aspect is defined below:

  1. Utilization – the average time that a resource was busy servicing work
  2. Saturation – the degree to which a resource has extra work it cannot service, which is often queued
  3. Errors – the count of error events
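To make the three definitions concrete, here is a minimal Python sketch that derives them from a series of periodic samples of a single resource. The `Sample` record (a busy flag, a queue length, and an error count per sample) is an assumed format chosen for illustration, not what any particular monitoring tool exports. Utilization is approximated as the fraction of samples in which the resource was busy, and saturation as the average queue length.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One observation of a resource (hypothetical sampling format)."""
    busy: bool         # was the resource servicing work when sampled?
    queue_length: int  # work waiting that the resource could not service yet
    errors: int        # error events seen since the previous sample


@dataclass
class UseMetrics:
    utilization: float  # fraction of samples in which the resource was busy
    saturation: float   # average queue length over the interval
    errors: int         # total error events over the interval


def use_metrics(samples: Sequence[Sample]) -> UseMetrics:
    """Compute utilization, saturation, and errors for one interval."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    return UseMetrics(
        utilization=sum(s.busy for s in samples) / n,  # True counts as 1
        saturation=sum(s.queue_length for s in samples) / n,
        errors=sum(s.errors for s in samples),
    )


# Example with made-up samples: busy 3 of 4 times, some queuing, one error.
print(use_metrics([
    Sample(True, 0, 0),
    Sample(True, 3, 1),
    Sample(False, 0, 0),
    Sample(True, 1, 0),
]))
```

Real tools sample at fixed intervals and may report utilization as a percentage instead of a fraction; the arithmetic is the same.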

Think of it as a checklist to keep things simple when troubleshooting and remediating issues. For every key resource, check the utilization, saturation, and errors. These aspects are interconnected and provide different clues for identifying bottlenecks.
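The checklist idea can be sketched as a loop over key resources that records which of the three signals looks suspicious for each one. The `ResourceReading` type, the `use_checklist` function, and both thresholds are hypothetical values for illustration; sensible limits depend on the resource and the environment.

```python
from dataclasses import dataclass


@dataclass
class ResourceReading:
    name: str
    utilization: float  # 0.0-1.0 over the interval
    saturation: float   # e.g. average run-queue or I/O-queue length
    errors: int         # error events in the interval


# Hypothetical thresholds; tune them for your own environment.
UTILIZATION_LIMIT = 0.90
SATURATION_LIMIT = 1.0


def use_checklist(readings: list[ResourceReading]) -> dict[str, list[str]]:
    """Return, for every resource, the USE clues that deserve a closer look."""
    findings: dict[str, list[str]] = {}
    for r in readings:
        clues = []
        if r.utilization >= UTILIZATION_LIMIT:
            clues.append(f"high utilization ({r.utilization:.0%})")
        if r.saturation > SATURATION_LIMIT:
            clues.append(f"saturated (queue {r.saturation:.1f})")
        if r.errors > 0:
            clues.append(f"{r.errors} error(s)")
        findings[r.name] = clues  # an empty list means nothing stood out
    return findings


# Made-up readings for three resources.
readings = [
    ResourceReading("cpu", 0.95, 2.5, 0),
    ResourceReading("memory", 0.60, 0.0, 0),
    ResourceReading("disk", 0.40, 0.2, 3),
]
for name, clues in use_checklist(readings).items():
    print(name, clues or "ok")
```

The point of checking every resource, rather than stopping at the first hot one, is that the disk errors above would be missed if the CPU alone were investigated.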

The complete picture

Utilization covers what’s happening over time, but depending on the sampling rate and the length of an incident, it may not provide the complete picture. Saturation provides insight into overload conditions that may not show up in utilization metrics alone, for the reasons mentioned above. Errors give clues about operations that have failed and may have led to retries. Combining all three provides a clear view of what is happening during a performance bottleneck.
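The sampling caveat is easy to demonstrate. This sketch uses made-up per-second readings for one minute: the CPU sits at 40% for 50 seconds, then is pegged for a 10-second burst while the run queue grows and a few timeouts occur. The one-minute utilization average alone looks moderate; the saturation and error signals are what reveal the burst.

```python
def summarize(cpu_percent, run_queue, timeouts):
    """Summarize one window of per-second samples with all three USE signals."""
    return {
        "avg_cpu": sum(cpu_percent) / len(cpu_percent),  # utilization as a plain average
        "peak_cpu": max(cpu_percent),
        "peak_queue": max(run_queue),                     # saturation
        "errors": sum(timeouts),                          # errors
    }


# Hypothetical per-second samples for one minute: steady at 40% CPU,
# then a 10-second burst where the CPU is pegged and work queues up.
cpu_percent = [40.0] * 50 + [100.0] * 10
run_queue = [0] * 50 + [4, 8, 12, 15, 18, 20, 21, 22, 22, 23]
timeouts = [0] * 50 + [0, 0, 0, 1, 0, 2, 1, 3, 2, 4]

summary = summarize(cpu_percent, run_queue, timeouts)
print(f"60 s average CPU: {summary['avg_cpu']:.0f}%")  # looks moderate
print(f"peak 1 s CPU:     {summary['peak_cpu']:.0f}%")
print(f"peak run queue:   {summary['peak_queue']}")
print(f"timeouts:         {summary['errors']}")
```

If a tool kept only one-minute averages, the 50% CPU figure is all you would see.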

A proper monitoring tool will collect, aggregate, and visualize utilization while alerting on saturation conditions and logging and correlating errors.
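As a rough illustration of alerting on saturation conditions, here is one common rule shape: fire only when a saturation metric such as queue length stays above a threshold for several consecutive samples, so a single spike does not page anyone. The function name, the threshold, and the `sustain` window are illustrative assumptions, not the behavior of any particular monitoring product.

```python
from typing import Iterable, Iterator


def saturation_alerts(
    queue_lengths: Iterable[float],
    threshold: float,
    sustain: int,
) -> Iterator[int]:
    """Yield the sample index at which each alert fires.

    An alert fires once the queue length has stayed above `threshold`
    for `sustain` consecutive samples, and re-arms only after the queue
    drops back to or below the threshold.
    """
    if sustain < 1:
        raise ValueError("sustain must be >= 1")
    run = 0  # length of the current streak of over-threshold samples
    for i, q in enumerate(queue_lengths):
        if q > threshold:
            run += 1
            if run == sustain:  # fire once per streak, not on every sample
                yield i
        else:
            run = 0


# Two sustained episodes in made-up data, each firing on its third sample.
print(list(saturation_alerts([0, 3, 5, 6, 1, 4, 5, 6, 7], threshold=2, sustain=3)))
```

Requiring a sustained condition trades a little detection delay for far fewer noisy alerts.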

A virtualization example of the USE Method

Let’s walk through a simple graphical virtualization example that applies the USE Method to a single resource metric.

  • Utilization – First, let’s examine the CPU utilization of host bas-esx-02.lab.tex, which shows 98% utilization.


Figure 1

  • Saturation – Next, let’s dig in to check whether an alert was triggered that indicates a potential saturation condition on the host’s CPU.


Figure 2

  • Errors – Finally, let’s see whether this saturation event had any bearing on the host’s availability. It appears that the host server had availability issues during a time window 2 days and 8 hours before these screen captures were taken.


Figure 3
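The same three steps can be written down as a small script, with values transcribed from the walkthrough: 98% CPU utilization on bas-esx-02.lab.tex, a CPU saturation event, and an availability window 2 days and 8 hours before the screen captures. `HostUseSnapshot` and `use_walkthrough` are hypothetical names; in practice these values would come from your monitoring tool rather than being typed in by hand.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class HostUseSnapshot:
    host: str
    cpu_utilization: float                # percent
    cpu_saturation_alert: bool            # did a CPU saturation alert fire?
    availability_events: list[timedelta]  # how long ago each outage window occurred


def use_walkthrough(s: HostUseSnapshot, busy_pct: float = 90.0) -> list[str]:
    """Apply the U, S, and E checks in order and record what each one found."""
    findings = []
    if s.cpu_utilization >= busy_pct:  # hypothetical "busy" threshold
        findings.append(f"U: CPU at {s.cpu_utilization:.0f}% on {s.host}")
    if s.cpu_saturation_alert:
        findings.append("S: CPU saturation alert triggered")
    for ago in s.availability_events:
        findings.append(f"E: availability issue {ago} ago")
    return findings


snapshot = HostUseSnapshot(
    host="bas-esx-02.lab.tex",
    cpu_utilization=98.0,
    cpu_saturation_alert=True,
    availability_events=[timedelta(days=2, hours=8)],
)
for line in use_walkthrough(snapshot):
    print(line)
```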

Paired with a proper virtualization monitoring tool, the USE Method epitomizes the keep-IT-stupid-simple principle as a means to troubleshoot potential bottlenecks.


What is your resolution?

So in the upcoming year, what do you resolve to do to complete your data center picture, whether your resources reside on premises or in the cloud? And what tools will you use, or do you still need, to get it done right? Please chime in below in the comments section.

And join the 2016 IT Resolutions Contest!