Troubleshooting efficiency and effectiveness are core to uncovering the root cause of incidents and bad events in any data center environment. In my previous post about the troubleshooting radius and the IT seagull, troubleshooting efficacy is the key performance indicator in fixing it fast. But troubleshooting is an avenue that IT pros dare not to walk too often for fear of being blamed for being incompetent or incorrect.

 

We still need to be right a lot more than we are wrong. Our profession does not give quarters when things go wrong. The blame game anyone? When I joined IT operations many a years ago, one of my first mentors gave me some sage advice from his own IT journey. It’s similar to the three envelope CEO story that many IT pros have heard before.

  1. When you run into your first major (if you can’t solve it, you’ll be fired) problem, open the first envelope. The first envelope’s message is easy – blame your predecessor.
  2. When you run into the second major problem, open the second envelope. Its message is simply – reorganize i.e. change something whether it’s your role or your team.
  3. When you run into the third major problem, open the third envelope. Its message is to prepare three envelopes for your successor because you’re changing company willingly or unwillingly.  

 

A lifetime of troubleshooting comes with its ups and downs. Looking back, it has provided many an opportunity to change my career trajectory. For instance, troubleshooting the lack of performance boost from a technology invented by the number one global software vendor almost cost me my job; but it also re-defined me as a professional. I learned to stand up for myself professionally. As Agent Carter states, "Compromise where you can. And where you can’t, don’t. Even if everyone is telling you that something wrong is something right, even if the whole world is telling you to move. It is your duty to plant yourself like a tree, look them in the eye and say, no. You move." And I was right.

 

It’s interesting to look back, examine the events and associated time-series data to see how close to the root cause signal I got before being mired in the noise or vice-versa. The root cause of troubleshooting this IT career is one that I’m addicted to, whether it’s the change and the opportunity or all the gains through all the pains.

 

Share your career stories and how troubleshooting mishap or gold brought you shame or fame below in the comment section.