I’ve always been pretty good at debugging. Until a couple of years ago I’d never thought much about why I find it easy, but once I realised that I didn’t know why I was good at something, I had to know. So I dedicated some time to analysing my own internal, instinctive thought process, and from what I’ve observed it can be reduced to this:
- Identify assumptions relevant to the issue
- Pick the assumption which, if false, would most likely cause the issue
- Run an experiment that can prove that, in the current context, the assumption is either true (and move to the next assumption) or false (problem probably solved)
If you’ve spent any amount of time developing software, you’ll know that at the discovery of almost every bug the first thing you think is: “but that shouldn’t be possible”. Obviously it is possible, because it’s happening, and the knowledge you have that leads you to believe it’s not possible is really just wrong – it’s an assumption about the way things work when they don’t work that way. And so it seems straight foward that the key to finding the cause of many bugs is often to challenge your own perception of what’s possible.
Step 1 is usually the hardest part. It requires lots of practice to become proficient at recognising the assumptions that most people knowledgeable on the matter (including yourself) believe to be true, but which have not been explicitly tested. This is especially true because often the assumption may have proved true in many other circumstances. If someone else asks you to help them solve a problem, sometimes asking questions like, “Tell me why you think this shouldn’t be happening,” will help them to verbalise the assumptions that are the source of their impediment.
I had some great success with this method earlier in the year when we had a fairly worring bug appear in production. After reviewing the facts for a couple of minutes, I thought that the most likely assumption to cause the problem, if it proved to be false, was this: MySQL only returns each row once when you query on a single table. Of course it does, doesn’t it? Developers around the world have seen it exhibiting that exact behaviour for almost 20 years. But I decided that we shouldn’t trust MySQL, I ran an experiment, and I proved there was a critical bug in a GA version of MySQL. Now if only they’d get around to fixing it.
So that’s my method. It works pretty well for me, but I’m sure it isn’t the only way to efficiently narrow in on confusing defects. Do you have your own debugging secret sauce?
Image credit: 124/365 Tick by Micah Taylor