The One Where We Abstract a Thing

Level 11

There are many books, websites, and probably self-help videos devoted to teaching or explaining the art of troubleshooting. Most are specific to an industry, and further to a problem domain within that industry. Within each problem domain, within each industry, within each methodology, there are tools of the trade designed to help you solve whatever problem is vexing you at that moment. The specificity of all of this, however, can be abstracted out of these insular, domain-specific modalities to effect a greater understanding of the role of troubleshooting in general.

It goes without saying that you cannot find something that you do not know you are looking for, and yet this is what a lot of neophyte engineers instinctively try. “The phones are down” may seem like the problem you need to fix, but counterintuitively that is only a symptom of the real problem. The real problem, the one causing the phones to be down, lies elsewhere. While you run around trying to figure out what’s up with the phones, what you should be asking is, “Why are the phones ‘down’?” and moving on from there. For example, are all the phones down? Some? Are there other symptoms? And what has changed recently, if anything? Once you’ve worked through some of this, which may only take seconds or minutes for a seasoned engineer, you’re better prepared to move on to the next steps.

Analyzing the problem(s), or problem statements, will help you form some hypotheses as to where the problem is likely to lie. Now, how can you begin testing your ideas to see if you are on the right track? Well, in the IT world that we all live in (I know, I said abstracted…), you’re going to need information. Information gathering can be a manual process, and in many cases must be, but having good tools at your disposal can certainly help the process along, especially when you are shooting in the dark, so to speak. Again, if you don’t know what you don’t know, an automated and impartial tool can help.

Tool impartiality is often overlooked in the discovery phase of troubleshooting any problem. Plumbers have scopes to look inside pipes that they cannot see; electricians have multimeters to help them test connectivity, resistance, etc.; and you as an IT professional have tools like PerfStack. A tool like this happily gathers information from all of your systems, jumping to no conclusions, and can call out abnormalities in the steady state of a system. Where many engineers skip straight to the “trying to fix anything they suspect is the problem” phase, PerfStack simply presents what it sees in an impartial and authoritative manner. From its dashboards, an engineer can begin their search from a position of knowledge. Combine that with the wisdom that comes from experience, and you have a very strong team.
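To make "calling out abnormalities in the steady state" a little more concrete, here is a minimal sketch in Python. It is not how PerfStack works internally; it only illustrates the general idea of comparing new readings against a known-good baseline without any opinion about the cause. The function name, the latency numbers, and the three-standard-deviation threshold are all assumptions made up for this example.

```python
from statistics import mean, stdev

def flag_abnormal(baseline, recent, threshold=3.0):
    """Return (index, value) pairs from `recent` that sit more than
    `threshold` standard deviations away from the baseline mean.

    `baseline` needs at least two samples, since stdev() requires that.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return [(i, v) for i, v in enumerate(recent) if v != mu]
    return [(i, v) for i, v in enumerate(recent)
            if abs(v - mu) / sigma > threshold]

# Hypothetical round-trip latency samples in milliseconds.
baseline_latency_ms = [20, 22, 19, 21, 20, 23, 21, 20]
recent_latency_ms = [21, 22, 95, 20]

print(flag_abnormal(baseline_latency_ms, recent_latency_ms))  # prints [(2, 95)]
```

The point of the sketch is the impartiality: the check reports what deviates from the steady state and leaves the "why" to the engineer.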

Mean time to innocence (MTTI) is a somewhat tongue-in-cheek metric in IT shops these days, referring to the amount of time it takes an engineer to prove that the domain for which they have responsibility is not, in fact, the cause of whatever problem is being investigated. In order to quantify an assessment of innocence you need information, documentation that the problem is not yours, even if you cannot say with any certainty who does own the problem. To do this, you need a tool that can generate impersonal, authoritative proof you can stand on, and which other engineers will respect. This is certainly helped if a system-wide tool, trusted by all parties, is a major contributor to this documentation.
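Tongue-in-cheek or not, MTTI can be computed like any other mean-time metric. The sketch below assumes a hypothetical record format, one (opened_at, cleared_at) pair per incident, where cleared_at is the moment your team documented that its domain was not the cause, or None if that never happened. The function name and the sample timestamps are invented for illustration.

```python
from datetime import datetime, timedelta

def mean_time_to_innocence(incidents):
    """Average of (cleared_at - opened_at) over incidents where the team
    was cleared. Returns None when no incident has a cleared_at time."""
    durations = [cleared - opened
                 for opened, cleared in incidents
                 if cleared is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)

# Hypothetical incident log for one team.
incidents = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 30)),     # cleared in 30 min
    (datetime(2024, 1, 15, 14, 0), datetime(2024, 1, 15, 15, 30)),  # cleared in 90 min
    (datetime(2024, 1, 22, 11, 0), None),                           # it really was us
]

print(mean_time_to_innocence(incidents))  # prints 1:00:00
```

Incidents where the team turned out to own the problem are left out of the average, which seems to fit the spirit of the metric.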

A tool like PerfStack will certainly help in getting buy-in from the pointy-haired bosses as to what needs to happen to fix whatever needs fixing. Most organizations have a change control process (though likely an amended one during any kind of outage), and documentation is always a part of that. And all of this stuff, this paper trail from beginning to end, flows together nicely right into the final package that many organizations require for a post-mortem. Engineers and management can get through an after-the-fact incident meeting much more quickly, and with likely consensus, with a clean and robust set of documents.

At the end of the day, troubleshooting is an art no matter what you do, where you do it, or in what industry you live. The methodologies are largely the same at a macro level, as is the need for quality tools. Can a great engineer find the root cause of a problem without a comprehensive tool like PerfStack? Sure. A cobbled-together band of point tools has always been a part of the engineer’s toolkit and likely always will be, at least until our new sentient robotic overlords obviate the need for that. But a full-scale, system-wide solution like PerfStack should also be a part of any well-stocked engineering team’s process. After all, it can help find those things you do not yet know you are looking for.

13 Comments
vinay.by
Level 16

Nice article

tallyrich
Level 15

Good article. I like the old phrase "you are only as good as your tools," but when it comes to troubleshooting, the tools make it easier to find what you are looking for, while the underlying skills (and I like your word, Art) are more intuitive than tools.

shuckyshark
Level 13

Love MTTI !!!!!!!

ecklerwr1
Level 19

I've never heard of MTTI... PerfStack, on the other hand, we know!!!

zero_cool
Level 10

PerfStack is amazing!

tallyrich
Level 15

Would that be:

MTTI - Career Training School - Rhode Island and Massachusetts www.mtti.edu

Or:

Massage Therapy Training Institute: NM MTTI mtti.org

I'm thinking I could love some Massage Therapy about now.

tinmann0715
Level 16

Some of the intangibles that come into play are:

  • Experience - this cannot be weighted heavily enough. "The phones are down!" An experienced phone admin knows where to look first during the troubleshooting process.
  • Head count - "The phones are down!" With head count, one person can look at the switch(es), another at the phone controller, another into SolarWinds, and another can organize the troubleshooting.

rschroeder
Level 21

MTTI is good, but is only a crutch if you don't know where to send the problem ticket for resolution. PerfStack and other Orion tools make tossing that hot potato to others MUCH more accurate and efficient.

The single-pane-of-glass / no-silo environment makes it easier to get the right team looking at the problem quickly, rather than forwarding the problem to your best-guess next support team and then hoping they can either fix it or figure out who to send it to next.

What might be a problem handed to you . . .

pastedImage_0.png

Shouldn't be randomly passed to others on a "best guess" basis . . .

pastedImage_1.png

Lest the customer's needs not be met in a timely manner.  In which case that potato could come back to you . . .

pastedImage_2.png

mtgilmore1
Level 13

Love it - PerfStack

designerfx
Level 16

Do they teach data massaging?

pattic
Level 9

Pretty deep for 4 in the morning.

shuckyshark
Level 13

yes, I could definitely use some MTTI right about now.

byrona
Level 21

When it has become standard fare for our industry to accept and joke about the passing of blame from one discipline group to another, I can't help but think that the model is wrong; I suggested another approach in my comment on a discussion HERE.  Would it make more sense to have a non-biased team with a member of each discipline (networking, systems, storage, etc.) where the team's primary responsibility is troubleshooting?  We all acknowledge that troubleshooting is a skill all on its own, and we also acknowledge the problem with the current model where blame gets passed from one team to another, so why not change the equation by changing how we structure our teams?

About the Author
Life-long and professional Network, VMware, and Unix Geek; Whiskey Taster; Brain Hacker; Student of Everything. Cancer Survivor. Armchair theoretical physicist.