System Design Is Not About Drawing Pretty Diagrams

by Arif Ikhsanudin, Backend Developer

The Diagram Is a Lie

You have been in that meeting. Someone opens a whiteboard tool, draws boxes connected by arrows, labels them "API Gateway," "Service A," "Service B," "Database," and calls it a design. Everyone nods. The diagram looks clean. Nothing in it tells you what happens when Service B goes down, what the write throughput ceiling is, or whether you have chosen the right consistency model for the problem you are actually solving.

System design is a series of decisions under uncertainty. The diagram is just a way to communicate some of those decisions — after you have made them. Treating diagram production as the design process itself is how teams end up with architectures that are aesthetically coherent and operationally broken.

What Design Actually Is

A system design is a set of answers to hard questions. Before any box gets drawn:

  • What is the read-to-write ratio, and does it change under load?
  • What is the acceptable data loss window — seconds, minutes, zero?
  • What is the latency requirement at p99, not just average?
  • Which failure modes are tolerable and which are not?
  • What does the team have operational experience running?

None of these show up in the diagram. A box labeled "Cache Layer" does not tell you whether you chose Redis with read replicas, a local in-process cache, or a CDN edge cache — and those three choices have completely different operational characteristics, invalidation behaviors, and failure modes.

The decisions are the design. The diagram annotates the decisions for people who weren't in the room.
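
The p99 question in the list above is easy to gloss over, so here is a minimal sketch of why it differs from the average. The latency numbers are invented for illustration, and `percentile` is a hypothetical helper using the nearest-rank method, not a function from any particular monitoring tool.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [20] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.1f} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

With this made-up distribution the mean looks healthy while the p99 shows two-second requests. A requirement written against the average would never surface that tail.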

Where Real Design Happens

Real design happens when you sit with constraints. Consider a system handling financial transactions with these requirements: 10,000 writes per second, strict ordering per account, five-nines availability, auditable history.
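
To make those requirements concrete, the arithmetic behind two of them can be sketched as follows. It assumes a 365-day year and a constant write rate, which real traffic will not have. The function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability):
    """Minutes per year the system may be unavailable at the given availability target."""
    return (1 - availability) * MINUTES_PER_YEAR


writes_per_second = 10_000   # from the stated requirement
availability = 0.99999       # "five nines"

writes_per_day = writes_per_second * SECONDS_PER_DAY
budget_minutes = downtime_budget_minutes(availability)

print(f"{writes_per_day:,} writes per day at a constant rate")
print(f"{budget_minutes:.2f} minutes of downtime allowed per year")
```

Roughly 864 million writes a day and a little over five minutes of downtime a year are the kind of numbers that rule out whole classes of implementation before any box is drawn.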

A diagram might show: Client → API → Queue → Worker → DB. That diagram is compatible with dozens of different implementations. The design decisions narrow it:

  • The queue is Kafka, with account ID as the partition key, so ordering is guaranteed per account without a global bottleneck
  • The database must support serializable isolation; otherwise the worker uses optimistic locking with retry logic
  • The worker must be idempotent because the consumer gets at-least-once delivery: Kafka's exactly-once semantics cover transactions within Kafka, not writes to an external database
  • The audit log is an append-only event store, not a mutable record with updated_at timestamps
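
# Kafka: default partition count for new topics (broker-level setting)
num.partitions=128
# Producers set the message key to account_id, so every event for one
# account lands in the same partition and stays ordered, while the
# 128 partitions allow parallel consumption across accounts

-- Worker idempotency check: a redelivered message with the same id is a no-op
INSERT INTO transactions (id, account_id, amount, created_at)
VALUES ($1, $2, $3, $4)
ON CONFLICT (id) DO NOTHING;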
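
As a rough illustration of why keying by account ID preserves per-account order, the sketch below routes events to partitions by hashing the key. It is an assumption-laden stand-in: Kafka's default Java partitioner uses a murmur2 hash, while this uses SHA-256 for simplicity, and `partition_for` and the sample events are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 128


def partition_for(account_id, num_partitions=NUM_PARTITIONS):
    """Map an account ID to a partition deterministically (SHA-256 stand-in, not Kafka's murmur2)."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical events arriving interleaved across accounts.
events = [
    ("acct-1", "debit 10"),
    ("acct-2", "credit 5"),
    ("acct-1", "credit 3"),
    ("acct-1", "debit 7"),
]

# Each partition is an append-only list, so order within it is arrival order.
partitions = defaultdict(list)
for account_id, payload in events:
    partitions[partition_for(account_id)].append((account_id, payload))

acct1_events = [p for a, p in partitions[partition_for("acct-1")] if a == "acct-1"]
print(acct1_events)
```

Because the same key always maps to the same partition, one consumer sees all of an account's events in order, while other accounts are processed in parallel on other partitions.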

None of that is visible in the pretty diagram. All of it matters for whether the system works.
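
The idempotency requirement can be sketched end to end with an in-memory SQLite database standing in for the production store. This is an illustration under assumptions, not the system's actual worker: `apply_transaction` and the table layout are hypothetical, amounts are integer cents, and the `ON CONFLICT` clause needs SQLite 3.24 or newer.

```python
import sqlite3


def apply_transaction(conn, txn_id, account_id, amount_cents, created_at):
    """Record a transaction once. Returns True if it was new, False if it was a duplicate."""
    cur = conn.execute(
        "INSERT INTO transactions (id, account_id, amount, created_at) "
        "VALUES (?, ?, ?, ?) ON CONFLICT (id) DO NOTHING",
        (txn_id, account_id, amount_cents, created_at),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id TEXT PRIMARY KEY, account_id TEXT, amount INTEGER, created_at TEXT)"
)

# The same message delivered twice, as at-least-once delivery allows.
first = apply_transaction(conn, "txn-42", "acct-1", 10_000, "2024-01-01T00:00:00Z")
second = apply_transaction(conn, "txn-42", "acct-1", 10_000, "2024-01-01T00:00:00Z")

total = conn.execute("SELECT SUM(amount) FROM transactions").fetchone()[0]
print(first, second, total)
```

The redelivered message changes nothing, so the account is credited once. The transaction ID has to come from the producer; an ID generated inside the worker would differ on every delivery and defeat the check.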

The Diagram Test

Here is a useful heuristic: if you could replace the label on a box with a different technology and nothing else in the diagram would have to change, that box is not designed. It is wished for.

"Cache" is not a design decision. "Redis 7 with a 60-second TTL on product catalog reads, write-through on mutations, no caching on user-specific data" is a design decision. The former fits in a box. The latter fits in a document that explains the tradeoff you made and why you made it.
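
To show how much behavior that one sentence pins down, here is a minimal in-process sketch of the same policy: a TTL on reads and write-through on mutations. It is not Redis. `CatalogCache`, the dictionary standing in for the database, and the injectable clock are all assumptions made so the example can run on its own.

```python
import time


class CatalogCache:
    """In-process stand-in for the cache: TTL on reads, write-through on mutations."""

    def __init__(self, store, ttl_seconds=60.0, clock=time.monotonic):
        self._store = store      # dict standing in for the database
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the example can control time
        self._entries = {}       # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]      # fresh hit
        value = self._store[key] # miss or expired: read the source of truth
        self._entries[key] = (value, now + self._ttl)
        return value

    def put(self, key, value):
        self._store[key] = value # write-through: the store is updated first
        self._entries[key] = (value, self._clock() + self._ttl)


now = [0.0]
db = {"sku-1": "Blue mug"}
cache = CatalogCache(db, ttl_seconds=60.0, clock=lambda: now[0])

first = cache.get("sku-1")
db["sku-1"] = "Red mug"          # a change that bypasses the cache
stale = cache.get("sku-1")       # still served from cache inside the TTL
now[0] = 61.0
fresh = cache.get("sku-1")       # TTL expired, so the store is read again
cache.put("sku-1", "Green mug")
after_write = cache.get("sku-1")
print(first, stale, fresh, after_write, sep=" | ")
```

The sketch makes one tradeoff explicit: any write that bypasses the cache can be served stale for up to the TTL. User-specific data never goes through this class, which is the "no caching" half of the decision.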

The same applies to every component. "Message Queue" is not a decision. The choice between SQS with FIFO queues versus Kafka versus RabbitMQ with quorum queues involves latency characteristics, durability guarantees, consumer group semantics, and ordering behavior — all of which depend on your specific workload.

What Good Design Documentation Looks Like

The most useful design artifact is not the diagram. It is the Architecture Decision Record (ADR) — a short document capturing the context, the decision, the alternatives considered, and the consequences including the downsides.

# ADR-012: Use Kafka for order event streaming

## Context
Order events must be processed by three downstream consumers
(inventory, billing, analytics) with different throughput requirements.
We need replay capability for backfill and at-least-once delivery.

## Decision
Kafka with consumer groups per downstream service.

## Alternatives Considered
- SQS fan-out via SNS: no replay capability, higher cost at volume
- RabbitMQ: adequate for current load but no built-in replay

## Consequences
- Operational complexity: we need to run the brokers plus ZooKeeper or KRaft controllers
- Consumers must be idempotent: Kafka does not provide exactly-once processing
  without transactions enabled, and transactions add latency we cannot absorb
- Retention period set to 7 days; backfill beyond that requires S3 offload

That document is worth more than any diagram. It tells the next engineer why the system is the way it is and what you gave up to get there.
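
The two properties the ADR leans on, independent consumer groups and replay, can be illustrated with a toy append-only log. This is a conceptual sketch only: `EventLog` is a hypothetical class, not the Kafka client API, and it ignores partitions, retention, and offset commits.

```python
class EventLog:
    """Append-only log with an independent read offset per consumer group."""

    def __init__(self):
        self._events = []
        self._offsets = {}   # group name -> next offset to read

    def append(self, event):
        self._events.append(event)

    def poll(self, group, max_events=10):
        """Return the next batch for this group and advance only this group's offset."""
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind or fast-forward a group, which is what makes replay possible."""
        self._offsets[group] = offset


log = EventLog()
for order_id in ("order-1", "order-2", "order-3"):
    log.append(order_id)

inventory_batch = log.poll("inventory")              # reads everything
billing_batch = log.poll("billing", max_events=2)    # slower consumer, own offset
analytics_first = log.poll("analytics")

log.seek("analytics", 0)                             # backfill: start over
analytics_replay = log.poll("analytics")
print(inventory_batch, billing_batch, analytics_replay)
```

Each group reads at its own pace, and rewinding one group does not disturb the others. A queue that deletes messages on acknowledgement cannot offer that, which is the gap the rejected alternatives in the ADR share.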

The Practical Shift

Stop leading design sessions by asking "what does the architecture look like." Start by asking "what are the hardest constraints we are working within." Write those down first. Let the diagram emerge from the answers, not the other way around.

Draw the diagram last. Write the decisions first. That is the sequence that produces systems that survive contact with production.
