The Art of Troubleshooting

I wrote about visibility via monitoring being the first step in successful IT change management. As an IT pro’s career progresses, they will encounter many breaks and failures in their IT infrastructure. The only guarantee in IT is that something will break, and IT pros have to be able to fix it ASAP. Experience and a solid process framework, coupled with visibility, are key to successfully troubleshooting IT issues.

Troubleshooting is a skill that consists of two parts: root-cause analysis and taking corrective measures. In the past, troubleshooting would include:

  1. Reading the fabulous manual (RTFM)
  2. Working with wonderful vendors… post sales
  3. Patching together a keep-the-lights-on solution with putty and duct tape
  4. Or rebooting all systems and leaving it in the hands of FM – fate & magic

Fast forward to today, and troubleshooting is all about collaboration: someone has probably already run into this issue and has blogged about it or shared the knowledge on an IT community website like thwack. So troubleshooting becomes as simple as Googling it or Binging it – winner, winner, chicken dinner.

But what if you are the first to encounter a problem? Then you’ll need a framework to troubleshoot issues. If you don’t have one, here’s a template framework that you can leverage. Within that framework, root-cause analysis begins with what is happening (a real-time dashboard) and what has happened (logs). Once the problem is identified and the cause and effect are understood, corrective measures can be determined, tested, verified as viable fixes, and deployed into production. Troubleshooting success is measured by the efficiency and effectiveness of the resolution.
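To make the "what has happened (logs)" step a bit more concrete, here is a minimal Python sketch that pulls the warnings and errors logged in the minutes leading up to an incident. Everything in it is an assumption for illustration: the log format (`YYYY-MM-DD HH:MM:SS LEVEL message`), the function names `parse_line` and `events_before`, the 15-minute window, and the sample entries. Your own logs will almost certainly look different, so treat it as a starting point rather than a finished tool.

```python
from datetime import datetime, timedelta

# Assumed (hypothetical) log format: "YYYY-MM-DD HH:MM:SS LEVEL message"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_line(line):
    """Split one log line into (timestamp, level, message); None if malformed."""
    parts = line.strip().split(" ", 3)
    if len(parts) < 4:
        return None
    try:
        stamp = datetime.strptime(parts[0] + " " + parts[1], TIMESTAMP_FORMAT)
    except ValueError:
        return None
    return stamp, parts[2], parts[3]


def events_before(lines, incident_time, window_minutes=15,
                  levels=("ERROR", "WARN")):
    """Return entries at the given levels logged in the window before the incident."""
    start = incident_time - timedelta(minutes=window_minutes)
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not match the assumed format
        stamp, level, message = parsed
        if start <= stamp <= incident_time and level in levels:
            hits.append((stamp, level, message))
    return sorted(hits)  # oldest first, so the sequence of events reads in order


# Made-up sample data, purely for illustration.
SAMPLE_LOG = [
    "2024-01-01 09:40:00 ERROR disk latency high",
    "2024-01-01 09:50:00 INFO backup started",
    "2024-01-01 09:55:00 WARN queue depth rising",
    "2024-01-01 09:58:30 ERROR database connection refused",
    "not a log line",
    "2024-01-01 10:05:00 ERROR logged after the incident",
]

for stamp, level, message in events_before(SAMPLE_LOG, datetime(2024, 1, 1, 10, 0, 0)):
    print(stamp, level, message)
```

With the sample data, only the 09:55 warning and the 09:58 error fall inside the window. That gives you a short, ordered list of candidates to test one at a time instead of the whole log.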

In closing, troubleshooting is a constantly evolving skill for an IT pro. When you think you’ve mastered your environment, new technology always intervenes. So learn the art of troubleshooting like your career depends on it.


Let me know what you think in the comment section below. Also feel free to share your troubleshooting process or tips below.

  • Thanks for chiming in and sharing with the Community! Excellent points on the scientific method (one thing at a time) and experience. It is a constant cycle for IT admins.

  • The scientific method is always your friend. It also helps to keep track of each detail: make only one change at a time, document that change, and then repeat. After some experience, you will learn to identify the "variables" that can go wrong with a situation and apply one-off analysis to grow your experience and your skills. Don't be afraid to be involved with all kinds of troubleshooting. The greater the quantity of issues, the greater the quantity of experience gained. It's a never-ending cycle.
