Saturday, October 08, 2011

Learn to be a better troubleshooter



The very best technical talents often have massive troubleshooting chops. But troubleshooting isn't inherently a technical skill; it's a set of tools to achieve clear thinking and knowledge. 


This is science!


For your consideration:  the clearest expositions of technical troubleshooting strategies and tactics since Sun Tzu did it for war.


In no particular order, here are links and a few choice excerpts.


ESR, the ninja-slicing, recursive-software-naming, Free Software advocate who is a key figure in the culture of open-source, wrote one of the foundational documents of hackerdom. As of this writing, it's at version 3.7, last updated December 2010. The beauty of it is, in telling you how to ask questions the smart way, it also teaches you troubleshooting.  




How To Ask Questions The Smart Way
Eric Steven Raymond
  • Be precise and informative about your problem
  • Describe the symptoms of your problem or bug carefully and clearly.
  • Describe the environment in which it occurs (machine, OS, application, whatever). Provide your vendor's distribution and release level (e.g.: â€œFedora Core 7”“Slackware 9.1”, etc.).
  • Describe the research you did to try and understand the problem before you asked the question.
  • Describe the diagnostic steps you took to try and pin down the problem yourself before you asked the question.
  • Describe any possibly relevant recent changes in your computer or software configuration.
  • If at all possible, provide a way to reproduce the problem in a controlled environment.
Do the best you can to anticipate the questions a hacker will ask, and answer them in advance in your request for help. 
Giving hackers the ability to reproduce the problem in a controlled environment is especially important if you are reporting something you think is a bug in code. When you do this, your odds of getting a useful answer and the speed with which you are likely to get that answer both improve tremendously.


ESR also says: 


Simon Tatham has written an excellent essay entitled How to Report Bugs Effectively. I strongly recommend that you read it.

I agree! Check it out:

  • The first aim of a bug report is to let the programmer see the failure with their own eyes. If you can't be with them to make it fail in front of them, give them detailed instructions so that they can make it fail for themselves.
  • In case the first aim doesn't succeed, and the programmer can't see it failing themselves, the second aim of a bug report is to describe what went wrong. Describe everything in detail. State what you saw, and also state what you expected to see. Write down the error messages, especially if they have numbers in.
  • When your computer does something unexpected, freeze. Do nothing until you're calm, and don't do anything that you think might be dangerous.
  • By all means try to diagnose the fault yourself if you think you can, but if you do, you should still report the symptoms as well.
  • Be ready to provide extra information if the programmer needs it. If they didn't need it, they wouldn't be asking for it. They aren't being deliberately awkward. Have version numbers at your fingertips, because they will probably be needed.
  • Write clearly. Say what you mean, and make sure it can't be misinterpreted.
  • Above all, be precise. Programmers like precision.

But the first place I send people when I want them to understand what I'd like to get as a good bug report is Joel Spolsky's story of Jane, the very, very good software tester. 

It's pretty easy to remember the rule for a good bug report. Every good bug report needs exactly three things.
  1. Steps to reproduce,
  2. What you expected to see, and
  3. What you saw instead.
Seems easy, right? Maybe not. As a programmer, people regularly assign me bugs where they left out one piece or another.
If you don't tell me how to repro the bug, I probably will have no idea what you are talking about. "The program crashed and left a smelly turd-like object on the desk." That's nice, honey. I can't do anything about it unless you tell me what you were doing.

If you don't specify what you expected to see, I may not understand why this is a bug. The splash screen has blood on it. So what? I cut my fingers when I was coding it. What did you expect? Ah, you say that the spec required no blood! Now I understand why you consider this a bug.
Part three. What you saw instead. If you don't tell me this, I don't know what the bug is. That one is kind of obvious.

Check out this book: Are Your Lights On: How to find out what the problem really is, by Gause and Weinberg. They wrote the book on requirements, too. 



Here's some random guy's 2 minute video review of it: 



KB555375 might be Microsoft's best KB article of all time - but by all means, if you know a better one, say so in the comments.


Microsoft Support Knowledgebase Article ID: 555375 - Last Review: July 22, 2005 - Revision: 1.0
How to ask a question 
Author: Daniel Petri MVP
Good examples of questions will include information from most of the following categories: - What are you trying to do?- Why are you trying to do it?- What did you try already, why, and what was the result of your actions?- What was the exact error message that you received?- How long have you been experiencing this problem?- Have you searched the relevant forum/newsgroup archives?- Have you searched for any tools or KB articles or any other resources?- Have you recently installed or uninstalled any software or hardware?- What changes were made to the system between the time everything last worked and when you noticed the problem? Don't let us assume, tell us right at the beginning.
In fact, if you know of ANY other top-notch sources of troubleshooting wisdom, put a link in the comments!


There's one I'm trying to find that I had as a mousepad - it was about 10 troubleshooting tips - one of them was something like "Problems don't just go away on their own. If you haven't fixed the problem, the problem isn't fixed." Anybody know what that's from?


(I'll try to fix the formatting on this post later, ok?)

2 comments:

  1. I have a co-worker who calls me up and asks things like "do you know why Microsoft Word stopped working?" or better yet, "do you know why I'm having a weird problem with Microsoft Word?" The only answer I can give is "I don't know yet".

    ReplyDelete
  2. The maxim is "If you didn't fix it, it ain't fixed" and the source is David Agans' essential book Debugging.

    ReplyDelete