Good System Design Starts With Understanding the Problem, Not the Solution

by Arif Ikhsanudin, Backend Developer

You Are Already Thinking About Microservices

The feature request lands in your backlog: "Build a notification system." Within an hour, someone has proposed Kafka, a fanout service, per-channel workers, and a preference engine. The whiteboard is full. Everyone is excited. Nobody has asked what a notification actually is in this product, who receives it, how fast it needs to arrive, or what happens if it is late.

This is the default failure mode of experienced engineers. Pattern recognition kicks in before problem understanding does. You have built notification systems before. You know what the solution looks like. So you skip the part where you figure out whether this is actually the same problem.

It usually is not.

The Questions That Change the Design

The notification system problem has at least four meaningfully different shapes:

  1. Transactional alerts (password reset, payment confirmation) — delivery within seconds, high reliability, low volume
  2. Marketing campaigns — bulk delivery, timing flexibility, opt-out compliance required, high volume
  3. Real-time collaboration events (someone edited your document) — sub-second delivery, best-effort acceptable, very high volume
  4. Regulatory notifications (account suspended, legal hold) — guaranteed delivery, audit trail required, low volume

Each of these shapes calls for a different architecture. Shape 1 is well served by a transactional email provider such as SendGrid, called synchronously over HTTP from your application. Shape 2 is where a queue-backed fanout pipeline fits, because bulk delivery with flexible timing tolerates queuing delay. Shape 3 might be WebSockets with a Redis pub/sub backbone. Shape 4 needs durable storage with delivery confirmation and immutable audit logs.

If you jump to "Kafka fanout service" before asking which shape you have, you will probably build something that handles shape 2 well and shapes 1 and 4 poorly — and you will not find out until you are debugging a missing password reset email in production.
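One way to keep the four shapes from blurring together is to write them down as data. The sketch below is illustrative only: `Shape`, `SHAPES`, and `matching_shapes` are hypothetical names, the field values paraphrase the list above, and the candidate for shape 2 is inferred from the remark that a Kafka fanout service suits it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shape:
    name: str
    latency: str    # how quickly a notification must arrive
    delivery: str   # what delivery promise the product needs
    volume: str     # rough volume class
    candidate: str  # a starting point for discussion, not a prescription


# Values paraphrase the four shapes listed in the article.
SHAPES = {
    1: Shape("transactional alerts", "seconds", "high reliability", "low",
             "transactional email provider called synchronously over HTTP"),
    2: Shape("marketing campaigns", "flexible", "bulk, opt-out compliant", "high",
             "queue-backed fanout pipeline (inferred, see lead-in)"),
    3: Shape("real-time collaboration events", "sub-second", "best-effort", "very high",
             "WebSockets with a Redis pub/sub backbone"),
    4: Shape("regulatory notifications", "unspecified", "guaranteed, audited", "low",
             "durable storage, delivery confirmation, immutable audit log"),
}


def matching_shapes(**answers: str) -> List[int]:
    """Return the shape numbers whose fields equal every supplied answer."""
    return [
        number
        for number, shape in SHAPES.items()
        if all(getattr(shape, name) == value for name, value in answers.items())
    ]
```

Calling `matching_shapes(volume="low")` returns `[1, 4]`, which is the point: "low volume" alone does not tell you whether you are building a password reset mailer or an audited compliance channel. You need the delivery answer as well before an architecture is worth drawing.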

How to Actually Understand the Problem

Three questions that force clarity before any architecture is proposed:

What does failure look like, and how bad is it? A notification that arrives 30 seconds late during a flash sale is a business problem. A notification that arrives 30 seconds late for a two-factor authentication code is a user experience problem. A notification that never arrives confirming a wire transfer is a compliance problem. These failures differ in severity, and the design should not treat them identically.

What is the expected volume and growth curve? Not "we want to scale to millions of users" — that is a goal, not a constraint. What is the actual current volume, what is the projected volume in 6 months, and is there a predictable spike pattern (end-of-month billing, market open/close)?

Who owns the downstream complexity? Notification systems touch email providers, SMS gateways, push notification services, in-app rendering, and user preference storage. Each of those integrations has rate limits, delivery semantics, and failure modes. Does your design need to own all of that, or can it delegate to a managed service that handles provider failover and compliance?
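The volume question becomes much less abstract once daily counts are converted into per-second rates. The following is a minimal sketch using the 500/day and 5,000/day figures from the example specification in this article. The function names are hypothetical, and the burst scenario (half of a day's traffic arriving within one hour) is an assumption chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(messages_per_day: float) -> float:
    """Average send rate if traffic were spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY


def peak_rate_per_second(messages_per_day: float,
                         share_in_window: float,
                         window_seconds: float) -> float:
    """Send rate if `share_in_window` of the day's traffic lands in one burst."""
    return messages_per_day * share_in_window / window_seconds


today = average_rate_per_second(500)            # about 0.0058 messages/second
in_six_months = average_rate_per_second(5_000)  # about 0.058 messages/second

# Assumed spike: half of the projected daily volume arrives within one hour.
spike = peak_rate_per_second(5_000, share_in_window=0.5, window_seconds=3600)
# about 0.69 messages/second
```

Even under the assumed spike, the projected load stays below one message per second. A number like that is a constraint you can design against, which "millions of users" is not.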

# Problem definition before any architecture:

Notification types:     transactional only (for now)
Volume:                 ~500/day today, projected 5,000/day in 6 months
Latency requirement:    < 5 seconds for password reset, < 30s for receipts
Delivery guarantee:     at-least-once; duplicates handled by idempotency key
Channels:               email only; SMS is Q3
Failure tolerance:      retry up to 3x over 10 minutes, then alert on-call
Compliance:             GDPR unsubscribe required; no marketing content

That specification rules out half the "enterprise notification platform" solutions immediately. It also tells you that a simple queue with a dead-letter channel and an email provider integration is probably sufficient — and adding Kafka now is engineering for a future you have not validated.
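To make "a simple queue with a dead-letter channel" concrete, here is a minimal in-memory sketch of the retry and deduplication rules from the specification. Everything in it is an assumption for illustration: the `Email` and `Mailer` names, the injected `send` and `alert_on_call` callables, and the retry delays of 1, 3, and 6 minutes (one possible way to spread three retries over ten minutes). A real system would use a durable queue and a persistent key store rather than Python lists and sets.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

# Assumed schedule: wait 1, 3, then 6 minutes, so 3 retries span 10 minutes.
RETRY_DELAYS_S = (60, 180, 360)


@dataclass
class Email:
    idempotency_key: str
    to: str
    subject: str


@dataclass
class Mailer:
    send: Callable[[Email], None]           # provider call; raises on failure
    alert_on_call: Callable[[Email], None]  # invoked once retries are exhausted
    sent_keys: Set[str] = field(default_factory=set)
    dead_letters: List[Email] = field(default_factory=list)
    _queue: List[Tuple[float, int, int, Email]] = field(default_factory=list, init=False)
    _seq: int = field(default=0, init=False)

    def enqueue(self, email: Email, due: float = 0.0, attempt: int = 0) -> None:
        # The sequence number breaks ties so Email objects are never compared.
        self._seq += 1
        heapq.heappush(self._queue, (due, self._seq, attempt, email))

    def run_until(self, now: float) -> None:
        """Process every queued email whose due time is at or before `now`."""
        while self._queue and self._queue[0][0] <= now:
            due, _, attempt, email = heapq.heappop(self._queue)
            if email.idempotency_key in self.sent_keys:
                continue  # duplicate of a message that was already delivered
            try:
                self.send(email)
            except Exception:
                if attempt < len(RETRY_DELAYS_S):
                    self.enqueue(email, due + RETRY_DELAYS_S[attempt], attempt + 1)
                else:
                    self.dead_letters.append(email)  # dead-letter channel
                    self.alert_on_call(email)
            else:
                self.sent_keys.add(email.idempotency_key)
```

With a provider that always fails, one enqueued email is attempted four times (the initial send plus three retries at 60, 240, and 600 seconds), then lands in `dead_letters` and triggers `alert_on_call`. The whole thing fits in about forty lines, which is a fair measure of how small the specified problem actually is.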

The Cost of Skipping This Step

Skipping problem definition does not save time. It relocates the time to the worst possible moment: after the system is built, when changing the architecture means reworking production code under pressure.

The patterns are predictable. Teams that jump to solutions end up with:

  • Overly complex systems with operational overhead that exceeds the problem's actual requirements
  • Systems that handle the imagined use case well and the real use case poorly
  • Architecture decisions made for scale that will never materialize, creating maintenance burden for years

A system designed for a problem that was never properly defined does not have a technical debt problem. It has a requirements debt problem, and refactoring the code does not pay that off.

Start With a Written Problem Statement

Before any architecture discussion, write down the problem in concrete terms. Not user stories — constraints. Volume, latency, consistency requirements, failure tolerance, regulatory surface area, team operational capability. One page maximum.

If you cannot write that page, you are not ready to design the system. If writing it surfaces disagreement about what the system needs to do, you have just saved yourself from building the wrong thing. That disagreement is the design work. Do it before the code, not during.
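If a template helps, the constraints listed above can be captured in a small structure that reports what is still unanswered. This is only a sketch: `ProblemStatement` and its methods are hypothetical names, and the fields simply mirror the constraint categories named in this section.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProblemStatement:
    volume: Optional[str] = None
    latency: Optional[str] = None
    consistency: Optional[str] = None
    failure_tolerance: Optional[str] = None
    regulatory_surface: Optional[str] = None
    operational_capability: Optional[str] = None

    def unanswered(self) -> List[str]:
        """Names of constraints that are missing or blank."""
        return [
            f.name
            for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]

    def ready_for_design(self) -> bool:
        """True only when every constraint has a written answer."""
        return not self.unanswered()


draft = ProblemStatement(
    volume="~500/day today, projected 5,000/day in 6 months",
    latency="< 5 seconds for password reset, < 30s for receipts",
)
```

Here `draft.unanswered()` returns `['consistency', 'failure_tolerance', 'regulatory_surface', 'operational_capability']`, so `draft.ready_for_design()` is `False`. The check is trivial on purpose. Its value is in forcing someone to notice the empty fields before the whiteboard fills up.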

