In the process of developing applications, bugs will inevitably be introduced (“I code, therefore I create bugs” ~ Descartes, probably). Bugs can be introduced for a variety of reasons, such as logical errors, misunderstanding of requirements, lack of tests, tight deadlines, or something as simple as having an off day as a fallible human. However, knowing that bugs are inevitable, we can arm ourselves with tools to quickly identify and address software bugs before they are released into the wild.
Many people assume that they already know everything that there is to know about finding and fixing bugs, or that debugging can be an afterthought since they believe that feature development should take priority. The reality is that it is all too common to debug in a haphazard, ineffective way. Haphazard approaches can lead to frustration and wasted time, and can actually prevent you from working on new features for your users.
In this post, we will break down what debugging is as well as one approach that you can employ while you are developing software applications. Additionally, we will add more tools to your debugging toolbox that you may not be aware of. We will also review some interesting bug case studies, noting how the authors employed some debugging techniques in their approaches. This post assumes some familiarity with programming concepts.
What Is Debugging?
If we accept that creating software inevitably leads to the creation of bugs, then we must accept that some of the time taken in software development must be allocated to maintaining and debugging existing code.
Debugging is the process of locating, identifying, and fixing bugs. Even though a test might reveal the presence of a bug, it will not tell us what the exact error is or how the code needs to be fixed. Oftentimes, developers will approach debugging in a sub-optimal way: randomly opening files in the codebase in the hopes of finding out where the issue is coming from; changing lines of code in a seemingly random manner and restarting servers in the hopes that their changes have fixed the issue; or worse yet, being paralyzed into thinking that the code should not be touched further for fear of causing other unintended consequences.
Why Is Debugging Important?
With great software comes great responsibility. It is a disservice to our users to ship products with glaring, obtrusive bugs, as these bugs can lead to unexpected results. Depending on the industry that you are operating in, software bugs can lead to financial losses, reputational damage, loss of trust from your users, or personal injury, or they can live in infamy for causing a security vulnerability.
The following famous bugs demonstrate the importance of squashing bugs before they are shipped out to users:
- NASA Mars Polar Lander: the lander was most likely destroyed because its flight software mistook vibrations caused by the deployment of the stowed legs for evidence that the vehicle had landed, and shut off the engines 40 meters from the Martian surface. The lander had traveled approximately 79.5 million miles up to that point, only to crash and fail at the end. This resulted in financial damages of $175 million and a failed mission.
- Therac-25: a computer-controlled radiation therapy machine that, due to several software bugs, incorrectly administered massive overdoses of radiation, resulting in the deaths of several patients. It is a quintessential case study of the potentially fatal dangers of engineers’ overconfidence.
- Knight Capital Group: Knight Capital’s systems incorrectly executed trade orders due to a repurposed software flag that triggered defective code. This resulted in a loss of $440 million.
- “MissingNo” Pokémon: a glitch Pokémon species present in Pokémon Red and Blue, which can be encountered by performing a particular sequence of seemingly unrelated actions. Capturing this Pokémon may corrupt the game’s data.
A Debugging Approach
Below, I will break down my personal debugging approach. It leans on principles that I learned in other engineering disciplines, years of observation while pairing with other developers, and resources on improving troubleshooting skills.
It is important to note that this post assumes that the bug in question was not already prevented or caught by error handling, testing, linting, static type checking, proper code formatting, or other assistive tooling.
I like to follow the scientific method when I am debugging:
1. Make an observation
2. Gather information
3. Make a hypothesis
4. Test my hypothesis
5. Analyze whether my test is working
6. Repeat until the bug is fixed
For step 1, making an observation, I first check to see if the buggy behavior still exists, or if it is actually a bug at all — bug reports might be incorrect or users might be reporting an issue inaccurately. At times, I will pick up a bug ticket but when I go to verify the buggy behavior, the bug has already been fixed by other work. It’s important to verify that the behavior still exists before spending time on an unnecessary fix. If the buggy behavior is still present, I verify that it also occurs in my local development environment. This helps eliminate any subtle issues that could be present only in production due to differences in environment configuration.
I find that the bulk of the work should be done in step 2: gathering information. This is where it is useful to know what tools you have at your disposal, including:
- VS Code Debugger (or the Chrome Debugger, or the debugger in the editor of your choice)
- Chrome DevTools
- React DevTools
- Redux DevTools
- Service logs, terminal logs
- Your peers
- Rubber ducks
As developers, we are lucky that there are a variety of tools to help us with data collection. Debuggers, logs, and additional dev tools should all be utilized to track down as much information as possible about the bug. They can really help narrow down the scope of the problem. I would implore anybody reading this post to explore these tools in more depth, as updates are constantly being made to improve the developer experience. There is an abundance of resources available regarding any dev tool that you may be using. These tools will help you throughout the entirety of your career, so learning them well is a very good investment of your time!
For step 3, making a hypothesis, it is important to eliminate any assumptions that you are making about how the program is operating. Coming up with a hypothesis on the root cause of the bug should involve clear thinking. Random guesses here will not be helpful so be sure to take a step back and use all of the information that you gathered in the previous step to make an informed decision. Really tricky bugs, especially those lacking error messages, might need to be tackled with informed trial and error.
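One way to surface hidden assumptions is to write them down as explicit runtime checks, so that a wrong assumption fails loudly instead of silently producing a bad result. The sketch below is only an illustration: `CartItem`, `assertAssumption`, and `cartTotal` are hypothetical names and do not come from any particular codebase.

```typescript
// Hypothetical shape of the data we believe the function receives.
interface CartItem {
  name: string;
  price: number;
  quantity: number;
}

// Throws with a descriptive message when an assumption does not hold.
function assertAssumption(condition: boolean, message: string): void {
  if (!condition) {
    throw new Error(`Assumption violated: ${message}`);
  }
}

function cartTotal(items: CartItem[]): number {
  let total = 0;
  for (const item of items) {
    // Each check documents something we were silently taking for granted.
    assertAssumption(
      Number.isFinite(item.price),
      `${item.name} has a finite numeric price`
    );
    assertAssumption(
      Number.isInteger(item.quantity) && item.quantity > 0,
      `${item.name} has a positive whole quantity`
    );
    total += item.price * item.quantity;
  }
  return total;
}
```

If one of these checks throws while you reproduce the bug, you have found an assumption that does not match reality, which is usually a strong lead for your hypothesis.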
After coming up with your hypothesis, you can move on to step 4, testing your hypothesis, by narrowing in on the part of the codebase that you suspect contains the bug. You can then add debugger statements before the suspected areas of code to step through how variables and functions are operating, comment out or modify specific sections of the code, or create unit or integration tests that account for the buggy behavior and re-run your test suite.
I believe that step 5, analyzing whether your test is working, is second in importance only to step 2 (gathering information). After you have modified sections of the codebase, you should examine your changes. A careful analysis can help you determine if this bug is isolated to this section of the codebase, or if the buggy behavior might be present in similar areas in other parts of the codebase. If the buggy behavior is still present despite your changes, you need to cycle back up to step 1 and repeat this process.
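As a rough sketch of what a test that accounts for the buggy behavior might look like, imagine a made-up bug report saying that the last page of results goes missing whenever the item count is not a multiple of the page size. The hypothesis is that the page count was rounded down. Everything here (`pageCount`, `expectEqual`, `runPageCountRegressionTests`) is hypothetical, and the tiny check helper merely stands in for whatever test runner your project already uses.

```typescript
// Hypothetical function under suspicion. The imagined buggy version used
// Math.floor, which dropped the final partial page.
function pageCount(totalItems: number, pageSize: number): number {
  if (pageSize <= 0) {
    throw new RangeError("pageSize must be positive");
  }
  return Math.ceil(totalItems / pageSize);
}

// A tiny framework-free check that throws on a mismatch.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(
      `${label}: expected ${String(expected)}, got ${String(actual)}`
    );
  }
}

function runPageCountRegressionTests(): void {
  // Reproduces the reported bug: 21 items at 10 per page needs 3 pages.
  expectEqual(pageCount(21, 10), 3, "partial last page");
  // Guards the behavior that already worked before the fix.
  expectEqual(pageCount(20, 10), 2, "exact multiple");
  expectEqual(pageCount(0, 10), 0, "empty list");
}

runPageCountRegressionTests();
```

Writing the failing case first and watching it fail confirms the hypothesis; keeping it in the suite afterwards guards against the bug returning. Placing a `debugger;` statement at the top of `pageCount` would also pause execution there whenever a debugger is attached.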
After performing these steps on a variety of bugs, you will automatically start to cycle through these steps when you encounter any new issues. This can help with bugs that are reported, but it can also help with catching bugs before they are ever committed into the codebase.
The great thing about this approach is that it can also be used while pair programming. Having multiple sets of eyes on a problem and cycling through these steps can help you find the issue more quickly. I have found that the “time to ask to pair program” threshold is different for everyone, but it is important to leverage your teammates and their knowledge. You can ask yourself: who last worked on this feature? Who knows a lot about it and may be able to provide more insight?
Additional Tips & Tricks
In addition to the approach that I like to use, I also like to keep the following things in mind:
- Check easy and fast stuff first, even if it seems unlikely. For example: Are my servers running? Am I checking the correct file? Does the TypeScript server need a quick restart? Did I save my changes?
- Depending on the type of bug, do I have the relevant screens up on my monitors? Is Chrome DevTools open? Are my terminals and back-end logs open? Are the relevant dev tools open (e.g., React DevTools, Redux DevTools)?
- Is this a rendering issue (React/CSS) or a logic issue (JS/TS)? Both?
- Is this a front-end issue or a back-end issue? Both?
- How long has this bug been present? I often like to use git bisect to detect which commit introduced the bug.
- Make sure to read the entire error message rather than only its first line, keeping in mind that logs often contain lots of noise.
- Error messages are clues but not gospel, and they can be misleading. For example, the error message might be incorrect, it may not have been updated properly by a previous developer, or the error could be coming from further upstream. Error messages can be red herrings, so weigh them against the other information that you have gathered.
- Have other people experienced this issue? Check Stack Overflow and GitHub issues using key search terms.
- Are there any special notes in any of the relevant documentation that may have been overlooked?
- Use the “fold”/“unfold” features of your editor to minimize the amount of code noise that your brain must process while you are skimming through files.
- Take a walk. Sometimes stepping away and coming back with a fresh set of eyes leads to new insights. In the same vein, those “eureka!” moments could come when you are away from your machine.
- Make use of “rubber duck” debugging — explain the problem to an inanimate object (or a willing coworker) step-by-step. As you present the information, you might spot the error or places where your thinking could be reevaluated.
- Have you tried asking ChatGPT? Although still in its nascent stages, it can serve as a quick primer for certain questions. (Disclaimer: make sure to verify its results are correct and accurate.)
- Did you try turning it off and on again? In the extremely rare event that you are experiencing non-deterministic behavior, it might not be you but the universe: The Universe is Hostile to Computers.
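The `git bisect` tip in the list above rests on a binary search over commit history. The sketch below is not Git's implementation. It only illustrates the idea under some simplifying assumptions: the history is linear and ordered oldest to newest, the first commit is known to be good, the last is known to be bad, and once the bug appears it stays in every later commit. The names `findFirstBadCommit`, `isBad`, and the commit ids are all made up.

```typescript
// Returns the index of the first commit for which isBad returns true.
function findFirstBadCommit(
  commits: readonly string[],
  isBad: (commit: string) => boolean
): number {
  if (commits.length < 2) {
    throw new Error("need at least one good and one bad commit");
  }
  let good = 0; // highest index known to be good
  let bad = commits.length - 1; // lowest index known to be bad
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (isBad(commits[mid])) {
      bad = mid; // the bug was introduced at or before mid
    } else {
      good = mid; // the bug was introduced after mid
    }
  }
  return bad;
}

// Usage with made-up commit ids; we pretend the bug first appears in "d4".
const commitLog = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"];
const firstBadIndex = findFirstBadCommit(
  commitLog,
  (commit) => commitLog.indexOf(commit) >= 3
);
console.log(commitLog[firstBadIndex]); // prints "d4"
```

In this example only three of the eight commits have to be checked, which is why bisecting pays off quickly on long histories. In practice, `isBad` is you (or a script) checking out the commit and seeing whether the bug reproduces.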
Crash Bandicoot: A Case Study
In a post about a memory card error in Crash Bandicoot 1, the author describes how they narrowed down the problem:
About the only thing you can do when you run out of ideas debugging is divide and conquer: keep removing more and more of the errant program’s code until you’re left with something relatively small that still exhibits the problem. You keep carving parts away until the only stuff left is where the bug is.
I returned repeatedly to the test program, trying to detect some pattern to the errors that occurred when the timer was set to 1kHz. Eventually, I noticed that the errors happened when someone was playing with the PS1 controller. Since I would rarely do this myself — why would I play with the controller when testing the load/save code? — I hadn’t noticed it. But one day one of the artists was waiting for me to finish testing — I’m sure I was cursing at the time — and he was nervously fiddling with the controller. It failed. “Wait, what? Hey, do that again!”
In conclusion, if you take anything away from this post, let it be this: debugging is a skill that can be improved. Through the use of a methodical and structured approach, you can improve how quickly you detect and address bugs. Doing so will lead to a more pleasant development experience and more robust applications for your users.