The Difference Between Fixing a Bug and Understanding a Bug

by Arif Ikhsanudin, Backend Developer

The Fix That Created the Next Bug

A null pointer exception was appearing intermittently in the order processing service. The stack trace pointed to a line that accessed order.getCustomer().getAddress(). The fix: add a null check before the access.

// Before
String city = order.getCustomer().getAddress().getCity();

// After
String city = order.getCustomer() != null && order.getCustomer().getAddress() != null
    ? order.getCustomer().getAddress().getCity()
    : null;

The NPE stopped appearing. The bug was not fixed — it was silenced. The underlying question — why would an order exist without a customer? — was never asked. Three months later, a data integrity issue was discovered: a batch import job was creating orders without linking them to customers. The silenced NPE had been the only signal. Now there were thousands of orphaned orders in production and no easy way to reconcile them.

This is the difference between fixing a bug and understanding a bug.
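
One way to keep the signal without crashing on a bare NPE is to fail with a message that names the offending record. The sketch below is only an illustration: the Order, Customer, and Address classes are hypothetical stand-ins that mirror the getters used in the snippet above, and OrderCity.cityOf is an invented helper, not part of the original service.

```java
// Hypothetical minimal domain classes, mirroring the getters used earlier.
final class Address {
    private final String city;
    Address(String city) { this.city = city; }
    String getCity() { return city; }
}

final class Customer {
    private final Address address;
    Customer(Address address) { this.address = address; }
    Address getAddress() { return address; }
}

final class Order {
    private final long id;
    private final Customer customer; // may be null: this is the broken assumption
    Order(long id, Customer customer) { this.id = id; this.customer = customer; }
    long getId() { return id; }
    Customer getCustomer() { return customer; }
}

final class OrderCity {
    private OrderCity() {}

    // Instead of returning null, fail with a message that names the order,
    // so the missing customer still shows up in logs and alerts.
    static String cityOf(Order order) {
        Customer customer = order.getCustomer();
        if (customer == null) {
            throw new IllegalStateException(
                "Order " + order.getId() + " has no customer; every order is expected to have one");
        }
        Address address = customer.getAddress();
        if (address == null) {
            throw new IllegalStateException(
                "Customer of order " + order.getId() + " has no address");
        }
        return address.getCity();
    }
}
```

The request still fails, but the failure now says which order violated which assumption, which is the information the investigation needs.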

Why the Quick Fix Wins

There's significant pressure in most engineering environments to close tickets and move on. A bug fix that resolves the user-visible symptom closes the ticket. A bug investigation that questions a system assumption requires time, may reveal larger problems, and doesn't produce immediate visible output.

The incentive structure points toward the quick fix. The right engineering behavior often points the other way.

The Five-Why Approach to Bugs

Toyota's Five Whys methodology — ask "why" until you reach the root cause — applies directly to software debugging. The discipline is to not stop at the proximate cause.

Using the example above:

  1. Why did the NPE occur? order.getCustomer() returned null.
  2. Why was the customer null? Orders can be created without a customer reference.
  3. Why can orders be created without a customer? The data model doesn't enforce the relationship at the database level — it's nullable.
  4. Why is it nullable? The column was made nullable for a batch import feature that imported historical orders without customer data.
  5. Why wasn't this addressed after the import? It was intended to be temporary and the follow-up ticket was never picked up.

The root cause is a data integrity constraint that was intentionally relaxed and never restored. The correct fix is to migrate the orphaned records so that every order references a customer, then restore the non-null constraint in the database so the relationship is enforced again. The NPE was only a symptom of a data model decision.

The Categories of Root Cause

Most bugs trace to one of a few root cause categories:

Missing validation: Input that should have been rejected was accepted and propagated to a state where it caused a failure later. The fix is validation at the entry point, not defensive checks throughout the system.
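
As a rough sketch of validation at the entry point, assume a hypothetical CreateOrderRequest that carries raw input (for example, one row of a batch import). The class names, fields, and error messages here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical inbound request holding raw, unvalidated input.
final class CreateOrderRequest {
    final Long customerId;  // nullable on purpose: nothing has been checked yet
    final Integer quantity;

    CreateOrderRequest(Long customerId, Integer quantity) {
        this.customerId = customerId;
        this.quantity = quantity;
    }
}

final class OrderRequestValidator {
    private OrderRequestValidator() {}

    // Returns every problem found; an empty list means the request is acceptable.
    static List<String> validate(CreateOrderRequest request) {
        List<String> errors = new ArrayList<>();
        if (request.customerId == null) {
            errors.add("customerId is required");
        }
        if (request.quantity == null || request.quantity <= 0) {
            errors.add("quantity must be a positive integer");
        }
        return errors;
    }
}
```

A caller would reject the request when the returned list is not empty, so code further down can rely on the fields being present.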

Violated invariant: The system assumes a property (every order has a customer, every session has a user, every transaction has an amount) that can be violated under certain conditions. The fix is either enforcing the invariant where it should hold, or redesigning the logic that depends on it.
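
A minimal sketch of enforcing an invariant where it should hold, using the "every transaction has an amount" example. The Transaction class is hypothetical, and a real model would likely carry more fields and rules.

```java
import java.math.BigDecimal;
import java.util.Objects;

// Hypothetical domain class: a Transaction cannot exist without an amount.
final class Transaction {
    private final BigDecimal amount;

    Transaction(BigDecimal amount) {
        // Reject the invalid state at construction time.
        this.amount = Objects.requireNonNull(amount, "amount must not be null");
    }

    // Callers can use the result without a null check.
    BigDecimal getAmount() {
        return amount;
    }
}
```

This still throws a NullPointerException, but it throws at the place where the invalid object is created, which points at the code responsible rather than at whichever reader happens to touch the field first.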

Race condition: Two operations that individually are correct produce incorrect state when they execute concurrently. Null checks don't fix race conditions. Proper synchronization or atomic operations do.
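
To illustrate the atomic-operation option, here is a small sketch of a check-then-act sequence made safe with a compare-and-set loop. StockCounter and its capacity rule are assumptions made up for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter that must never hand out more units than its capacity.
final class StockCounter {
    private final AtomicInteger reserved = new AtomicInteger();
    private final int capacity;

    StockCounter(int capacity) {
        this.capacity = capacity;
    }

    // Reserves one unit if capacity allows. The capacity check and the
    // increment are committed together by compareAndSet, so two threads
    // cannot both claim the last unit.
    boolean tryReserve() {
        while (true) {
            int current = reserved.get();
            if (current >= capacity) {
                return false;
            }
            if (reserved.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread changed the value in between; retry with the fresh value.
        }
    }

    int reservedCount() {
        return reserved.get();
    }
}
```

A plain `if (reserved < capacity) reserved++` on an int field would pass single-threaded tests and still oversell under concurrent load, which is why this category of bug needs a different kind of fix.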

Incorrect assumption about external behavior: The service assumes a third-party API returns data in a specific format or within a specific time. The fix is either validating the assumption at the integration boundary or designing for its violation.
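
A sketch of validating the format assumption at the integration boundary. The "amount" field, the ShippingQuoteParser class, and the exception type are all hypothetical; a real integration would validate whatever its actual contract specifies, and would handle timeouts separately.

```java
import java.math.BigDecimal;
import java.util.Map;

// Thrown when a third-party payload does not match what we assumed about it.
final class UnexpectedResponseException extends RuntimeException {
    UnexpectedResponseException(String message) {
        super(message);
    }
}

final class ShippingQuoteParser {
    private ShippingQuoteParser() {}

    // Assumes the (hypothetical) carrier API sends a non-negative decimal
    // in an "amount" field. Anything else is rejected here, at the boundary,
    // instead of leaking into the rest of the service.
    static BigDecimal parseAmount(Map<String, String> payload) {
        String raw = payload.get("amount");
        if (raw == null || raw.trim().isEmpty()) {
            throw new UnexpectedResponseException("missing 'amount' field");
        }
        BigDecimal amount;
        try {
            amount = new BigDecimal(raw.trim());
        } catch (NumberFormatException e) {
            throw new UnexpectedResponseException("non-numeric amount: " + raw);
        }
        if (amount.signum() < 0) {
            throw new UnexpectedResponseException("negative amount: " + raw);
        }
        return amount;
    }
}
```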

Missing edge case: A case that was not tested or considered during implementation. The fix includes both the case handling and the test that would have caught it.
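
As a small illustration of pairing the case handling with its test, consider averaging order totals when there are no orders. OrderStats and the check class are invented for this sketch, and the checks use plain Java rather than a test framework only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.OptionalDouble;

final class OrderStats {
    private OrderStats() {}

    // The edge case: with no orders there is no average, so return empty
    // rather than dividing by zero and handing callers a NaN.
    static OptionalDouble averageTotal(List<Double> totals) {
        if (totals.isEmpty()) {
            return OptionalDouble.empty();
        }
        double sum = 0;
        for (double total : totals) {
            sum += total;
        }
        return OptionalDouble.of(sum / totals.size());
    }
}

final class OrderStatsCheck {
    public static void main(String[] args) {
        // The check that would have caught the bug: the empty input.
        if (OrderStats.averageTotal(Collections.<Double>emptyList()).isPresent()) {
            throw new AssertionError("empty input should have no average");
        }
        // The ordinary case still works.
        double average = OrderStats.averageTotal(Arrays.asList(10.0, 20.0)).getAsDouble();
        if (average != 15.0) {
            throw new AssertionError("expected 15.0 but got " + average);
        }
    }
}
```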

What Understanding a Bug Produces

An investigation that reaches root cause produces:

  • A fix that addresses the cause rather than the symptom
  • A test that would have caught the bug (and will catch regressions)
  • Possibly a design improvement that prevents the class of bugs
  • Documentation of the finding — at minimum, a commit message that explains why the fix was made, not just what it does

This takes longer than a patch. It also avoids the pattern where the same bug appears in slightly different forms repeatedly because the underlying cause was never addressed.

The Practical Takeaway

For your next non-trivial bug fix, before writing any code, write down why the bug occurred at the most fundamental level you can reach with available information. If the answer is "the code was wrong," go one level deeper: why was the code wrong? Missing validation? Wrong assumption? Missing test? Let that answer guide both the fix and the test you write to prevent recurrence. Then check: is there anywhere else in the codebase making the same wrong assumption?
