Stop And Smell The Documentation

Level 11

A recent issue with a network reminded me of the importance of documentation. I was helping a friend find out why destinations in the core of the network were unable to ping other locations. We took the time to solve some routing neighbor issues but couldn't figure out why none of the core could get out to the Internet. We were both confused and working through any issues in the network. After a bit more troubleshooting with his team, it turned out to be a firewall issue. In the process of helping the network team, someone had added a rule to the firewall that blocked the core from getting out. A lot of brainpower was wasted because this engineer was trying to help.

We reinforce the idea that documentation is imperative. As-built documentation is delivered when a solution is put together. Operational docs are delivered when the solution is ready to be turned up. We have backup plans, disaster plans, upgrade procedures, migration guidelines, and even a plan to take equipment out of service when it reaches the end of life. But all of these plans, while important, are the result of an entire process. What isn't captured is the process itself. This becomes very important when you are troubleshooting a problem.

When I worked on a national helpdesk doing basic system support, we used the CAR method of documentation:

  • C - Cause: What do we think caused the problem? Often, this was one of the last things filled in because we didn't want to taint our troubleshooting method with wild guesses. Often I would put in "doesn't work" or "broken".
  • A - Actions: This was the bulk of the documentation. What did you do to try and fix the problem? I'll expand on this in a minute, but Actions was the most critical part of the documentation, and it almost never captured what was needed.
  • R - Resolution: Did you fix the problem? If not, why? How was the customer left? If you were part of a multiple call process, Resolution needed to reflect where you were in the process so the next support technician could pick up where you left off.
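
As a rough illustration, a CAR entry could be modeled as a small record in which every action is stored with its outcome, including the ones that did not help. This is only a sketch: the class names, field names, and sample ticket text are hypothetical and do not come from any real helpdesk tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Action:
    """One troubleshooting step, recorded whether or not it helped."""
    description: str
    outcome: str  # for example "no change" or "found blocking rule"
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass
class CarRecord:
    """Hypothetical Cause / Actions / Resolution ticket entry."""
    cause: str = "unknown"  # often filled in last
    actions: list[Action] = field(default_factory=list)
    resolution: str = ""

    def log(self, description: str, outcome: str) -> None:
        # Append every attempt, not only the ones that worked.
        self.actions.append(Action(description, outcome))

    def summary(self) -> str:
        lines = [f"Cause: {self.cause}"]
        for number, action in enumerate(self.actions, 1):
            lines.append(
                f"Action {number}: {action.description} -> {action.outcome}"
            )
        lines.append(f"Resolution: {self.resolution or 'open'}")
        return "\n".join(lines)


# Example usage with made-up ticket contents.
ticket = CarRecord()
ticket.log("Restarted routing process on core router", "no change")
ticket.log("Checked firewall rules for outbound traffic", "found blocking rule")
ticket.cause = "Firewall rule blocked outbound traffic from the core"
ticket.resolution = "Removed the rule; core can reach the Internet"
print(ticket.summary())
```

Keeping the outcome next to each action means the next technician can see what was already ruled out, not just what finally worked.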

Cause and Resolution are easy and usually just one or two line entries. What broke and how did you fix it? Actions, on the other hand, was usually a spartan region of half sentences and jumbled thoughts. And this is where most of the troubleshooting problems occurred.

When you're trying to fix a problem, it's tempting to only write down the things that worked. Why record things that failed to fix the problem? If you try something and it doesn't affect the issue, just move on to the next attempt. However, in the real world, failing to record those attempts can be just as detrimental as the original issue.

Writing In The Real World

Let's take the above example. We were concentrating on the network and all the issues there. No one thought to look at the firewall until we found out the issue was with outbound traffic. No one mentioned there was a new rule in the firewall that directly affected traffic flow. No one wrote down that they put the rule in the firewall just for this issue, which made it less apparent how long the rule had been there.

Best practice says to document your network. Common practice says to write down as much as you can think of. But common sense practice is that you should write down everything you've done during troubleshooting. If you swap a cable or change a port description, it should be documented. If you tweak a little setting in the corner of the network or you delete a file, it should be noted somewhere. Write down everything and decide what to keep later.

The reason for writing it all down is that troubleshooting is rarely a clean process. Fixing one problem will often uncover other issues. Without knowing the path we took to get from problem to solution, we can't know which of these issues were introduced by our own meddling and which issues were already there. Without an audit trail, we could end up chasing our own tails for hours, not knowing that the little setting we tweaked five minutes in caused the big issue three hours later.
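
One way to picture the value of an audit trail: if every change is logged with a time and an author, you can ask which changes happened before a new symptom first appeared. The sketch below assumes a simple list of (time, who, what) tuples. The function name, the people, and the sample entries are all invented for illustration.

```python
from datetime import datetime


def changes_before(trail, symptom_time):
    """Return logged changes made at or before symptom_time, newest first."""
    earlier = [entry for entry in trail if entry[0] <= symptom_time]
    return sorted(earlier, key=lambda entry: entry[0], reverse=True)


# Hypothetical audit trail: (when, who, what was changed).
trail = [
    (datetime(2024, 1, 1, 9, 5), "alice", "tweaked routing timer on core switch"),
    (datetime(2024, 1, 1, 9, 40), "bob", "added outbound firewall rule for core subnet"),
    (datetime(2024, 1, 1, 12, 30), "alice", "swapped uplink cable"),
]

# Suppose the new symptom was first noticed at noon.
suspects = changes_before(trail, datetime(2024, 1, 1, 12, 0))
for when, who, what in suspects:
    print(when.isoformat(), who, what)
```

The most recent change before the symptom is listed first, which is usually the first place to look. A change that was never written down cannot show up in a list like this at all.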

It doesn't matter if you write down your steps on a tablet or jot them on the back of a receipt for lunch. What is crucial is that all that information makes it to a central location by the end of the process. You need great tools, like the ones from Solarwinds, to help you make sense of all these changes and correlate them into solutions to problems. And for those of us who, like me, often forget to write down every little step, Solarwinds makes log capture programs that can let the device report every command that was entered. It's another way to make sure someone is writing it all down.

How do you document? What are your favorite methods? Have you ever caused a problem in the process of fixing another one? Let me know in the comments!

27 Comments
MVP

I do agree on the need to document what you are doing during troubleshooting, as you may miss something when you try to go back and review the entire incident. This is especially true at 3 AM. You may not be as awake as you think you are...

Documentation is the key.

Whether it is comments in code or a knowledge base in SharePoint, it helps you avoid reinventing the wheel and skipping over the easy, obvious fix.

MVP

I'm with the other guys on this. Documentation makes all our lives a lot easier.

One thinks of documentation/instructions as something done for others so they can follow your logic but when you've been doing this for a couple years (20+) one realizes that it's just as important to document for oneself. Who knows if you'll remember what you were thinking years ago when you put something together right quick on a Monday morning after a rough weekend.

I use good ol' Notepad++ (other methods of making notes are available).

I kick off a new text file as soon as I've got to site, and jot down every step I take in getting to the objective of the day. I then file it away under the relevant client, putting the date in the filename so it's easy to find it in the future.

I do create other, more impressive documentation at key points in a project, when the platform is built, when the software has been deployed, when the environment is ready to accept the pilot nodes etc, but I find simple text files to be key brain dumps.

Level 14

Documentation is the best way to capture new stuff on the fly and make you look like a hero when you need to do something quickly when your boss shows up... Plus in certain businesses auditors love it!

I refer to the documentation I create as "Dick and Jane": basically, anyone with a pulse can follow the steps. Pictures, arrows, screenshots, bolding, italicizing... whatever it takes.

It's even helped me a few times months later on some obscure task or issue....

I'll go out on a limb and tentatively mention that I think everyone missed the one item that would have caught and prevented this, or revealed it before it became a problem:  Change Management.

It's all about sharing the information for every item that might cause a call to the Help Desk, or might cause cycles to be wasted troubleshooting.  If every firewall rule created or modified or deleted is submitted for approval by the Change Control Team, and if the Change Control meeting is attended by members of every IS Team (and those members have to be attentive, not doing other work wirelessly), then the board would have asked these questions:

  • What change are you making to the network or firewall?
  • Why are you making this change?
  • What benefits does it provide, and what drawbacks will accompany it?
  • Did you share this plan with all your team members before bringing it to Change Management for Review and Approval?
  • What are the technical steps for accomplishing the work?
  • What are the steps for backing out of the change to recover original functionality in case of unexpected results?
  • Will there be any outage to anyone?
  • How long will the outage last?
  • What individuals will be affected?
  • What message should be sent to them?
  • How long in advance should they be notified?
MVP

The biggest problem most documentation efforts have is that they concentrate on the document, not the process used to create it.

It's like security, documentation isn't something you tack onto the end - for it to be useful it must be part of the process to deliver the system or service you are providing.

When it is an integral part of the deliverable, then it can be relied upon by other systems and integrated into other processes, which in turn makes it more helpful and more likely to be maintained.

The alternative is that a doc is written to cover current state and then rots in the corner while the world around it constantly changes.

MVP

I missed your post before posting my reply!

This is exactly what I mean. Without documentation updates being part of the change management process, it becomes a static document.

MVP

I was thinking the same thing. Change management needs to be part of the documentation process.

MVP

Documentation can be very helpful at times. Other times it can be a waste of time. If in the above example they started looking through documentation, would they have found the answer was a rule in the firewall? Maybe, maybe not.

One other issue is that at my workplace, security is a different team. So I would never see any documentation on what changes they are making. Nor am I part of their change management process.

I think documentation for doing certain tasks is vital. But some network documentation can be a waste of time if it's not consistently updated. When troubleshooting, I may look at a network diagram (how things are connected to each other). But to go any deeper, I would take a physical look rather than trawling through documentation.

Level 12

Yes, documentation can be used at any time. It is a simple way to share information with other users.

MVP

Documentation only works if

1) it is written down.

2) it is published where others can get to it when needed.

3) it is 3 AM friendly.

4) others know it exists.

5) others can get to it.

6) people actually refer to it.

7) it is kept up to date.

Absolutely, Jfrazier! Documentation is about as much use as a chocolate teapot if it isn't kept up to date and therefore "relevant at 3am".

Level 14

Yep.  Documentation.  Unfortunately, we don't always document.  It takes a great deal of time to document as you are troubleshooting.  Especially if it is a critical issue.  We always seem to document what we remember after the fact.  And when you have documentation on something, it always has to be modified as things always change.  The best places have people whose sole purpose is to create documentation. 

Level 17

Documentation saves Jobs along with numerous hours of lost time and increased stress. Less stress means less hair falling out. More hair means not blinding that driver on the interstate with your Cue Ball Head. Averting a major Interstate pile up. So in conclusion, documentation saves Lives!

The key to documentation is keeping it simple, concise, and direct. No opinions are needed in any documentation. And if there is any process that has Production Killing Possibilities - Mark it Up in BIG Plain Bold, and possibly RED letters. I've been a part of teams where uncertainty, or talking with a Salesman while installing something precious on their laptop, was too distracting for the tech/engineer, and the wrong option got checked. So during an app install, the app started reaching up to the SQL server and defaulting all the templates to the app default instead of retaining the organization's customized templates that fit the process of Sales to Production and Delivery. The documentation that I created in the aftermath of that contractor's bluff could have been 2 pages, but it was 3, with the full middle page dedicated to selecting the correct option during installation.

MVP

I agree it is a pain to document while you are troubleshooting, but when it comes to recalling steps later or putting what you did into your RCA (root cause analysis) report, you'll be grateful. Sometimes that becomes a procedure, and it sucks to be following a procedure when a small, crucial step has been left out. It also helps you back out something you did during troubleshooting, as you have a roadmap of what you did, commands entered, etc.

Level 20

IDK about you guys, but I hate documentation as much as I need it. People usually talk a good game about this, but in my experience they don't always practice what they preach. I agree with goodzhere: good companies either budget time for documentation or, like he said, have people whose entire job is documentation. We were talking about helpdesk ticketing the other day. Take a full-blown BMC Remedy implementation: it has modules for everything, including change management and knowledge base stuff, and you can associate all of these things together. Personally I can't stand it, but I do some of it. For the really bad cases, like the one cahunt mentioned, I've taken the time to write knowledge base articles. Generally, though, I still have lots of plain text files on Unix/Linux, and Excel, Word, and Visio drawings on Windows, that I use for a lot of it. In many places it's going from putting out one fire to the next, which begets the "I'll document that maybe later"...

Ahh documentation! The bane of a dynamic IT operation's existence. I refer back to the old saying, "As soon as you write it down it becomes obsolete."

   My attempt to mitigate documentation needs is to automate, streamline, consolidate, simplify, and centralize... everywhere, with everything that I can possibly imagine. Hardware, software, network, maintenance contracts, vendors, etc. Less is better when it comes to documentation.

Level 12

Agree completely with all of these points!

Level 12

The necessary Network EVIL!  Necessary being the key word here....

Level 14

In the Incident Handling world, if you are moving too fast to take good notes, then you are moving too fast. 

I learned my troubleshooting techniques as an Electronics Technician for the US Navy.  We were taught the six step troubleshooting technique.

1. Symptom recognition

2. Symptom elaboration

3. Listing of probable faulty functions

4. Localizing the faulty function

5. Localizing trouble to a faulty circuit/component

6. Failure analysis

We would document every step, every voltage or signal check whether good or bad.  When the problem was fixed, the final step was failure analysis.  What worked and why.  What didn't and why.

On the flip side, when underway you couldn't always get the exact parts you needed.  Sometimes you would need to "just make it work" until the correct parts arrived.  These "fixes" were referred to as underway field changes.  Sometimes they got documented and sometimes not.  It was always fun to find one that wasn't documented while troubleshooting a new problem.

We've assumed that documentation will be useful, but we've missed how important it is to learn how to write USEFUL INFORMATION.

Poor grammar, incomplete thoughts, missing punctuation--these things (and more) may derail the benefits of documentation.

Do not merely document; take a Business Writing or Technical Writing class and LEARN how to communicate your content correctly and effectively.

You'll thank yourself, and so will your team.

Level 14

Exactly.  Big part of my daily job.

Level 21

Great article, networkingnerd! It almost feels like there is an untapped market here. On P1 incidents, most if not all of my team ends up working to resolve the issue, which makes tracking the changes being made even more difficult because so many people are involved. It seems like this would be a good function of a group collaboration tool or ticketing system: some system that would allow multiple people on a team to note the changes they are making, so that the other people on the team can see that information in real time, and when the issue is resolved the entire transcript is turned into documentation. Anyway, this is just me thinking out loud on the matter. I would love to hear feedback.

MVP

Same for me, going back to when I was a teenager doing phone tech support around the turn of the century. As soon as a call started, I was typing everything they said into Notepad. Originally it was so I wouldn't have to ask them to repeat themselves later on, but eventually I started saving the notes and adding more useful metadata as I wrote, so I could review my ticket times and such and independently assess my performance. Documenting things as you go is major.

Level 15

I did not have the habit of documenting procedures; that was the culture at the small companies where I worked. But after I started at my current company, I saw how important it is to have a manual of good practices and procedures. It helps during incidents and speeds up the adaptation of newcomers to the company.

Level 15

small company*

About the Author
A nerd that happens to live and breathe networking of all kinds. Also known to dip into voice, security, wireless, and servers from time to time. Warning - snark abounds.