Stateless vs Stateful: The Decision That Affects Everything Downstream

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Decision That Gets Made Implicitly

Most teams don't consciously choose between stateless and stateful service design. They make dozens of small decisions — store user session in memory, cache the user profile object on first load, accumulate request metrics in a local counter — and the result is a stateful service that nobody explicitly chose to build. When they try to scale it, the statefulness fights them.

Understanding this distinction before you build is worth the ten minutes it takes.

What Statefulness Actually Means

A stateless service instance treats every request as independent. It does not store information about prior requests. Any instance in a pool can handle any request correctly. Adding instances increases capacity roughly linearly, until a shared dependency such as the database becomes the bottleneck.

A stateful service instance has memory of prior requests, clients, or connections. The correct handling of a request may depend on what happened to this specific instance before. Not all instances are interchangeable.

The distinction is about the instance, not the system. A stateless service can absolutely persist data — it writes to and reads from a database, a cache, or object storage. The key property is that the state lives outside the instance. The instance itself is ephemeral and interchangeable.
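The difference can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: ExternalStore is a hypothetical in-process stand-in for a database or cache, and the class and variable names are invented for the example.

```python
class ExternalStore:
    """Stand-in for a shared store such as a database or cache (hypothetical)."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def incr(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


class StatefulInstance:
    """Keeps the visit count in its own memory, so instances diverge."""

    def __init__(self) -> None:
        self._visits: dict[str, int] = {}

    def handle(self, user_id: str) -> int:
        self._visits[user_id] = self._visits.get(user_id, 0) + 1
        return self._visits[user_id]


class StatelessInstance:
    """Holds only a reference to the shared store, never the count itself."""

    def __init__(self, store: ExternalStore) -> None:
        self._store = store

    def handle(self, user_id: str) -> int:
        return self._store.incr(user_id)


# Three requests for the same user, alternating between two instances.
a, b = StatefulInstance(), StatefulInstance()
stateful_results = [a.handle("u1"), b.handle("u1"), a.handle("u1")]

shared = ExternalStore()
c, d = StatelessInstance(shared), StatelessInstance(shared)
stateless_results = [c.handle("u1"), d.handle("u1"), c.handle("u1")]
```

The stateful pair returns 1, 1, 2 because each instance only knows about the requests it saw. The stateless pair returns 1, 2, 3 regardless of which instance served each request.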

Why Statelessness Is the Default for HTTP Services

Stateless HTTP services have a simple scaling story: put a load balancer in front of N identical instances. When load increases, add instances. When load decreases, remove them. Any instance handles any request. This is horizontal scaling in its purest form.

Stateful HTTP services complicate this immediately:

  • Load balancer configuration: You need sticky sessions to ensure a client always reaches the same instance. This distributes load unevenly and makes that one instance a single point of failure for every client pinned to it.
  • Instance replacement: When you deploy a new version, a rolling deploy terminates the old instances. In-memory state in those instances is lost. Sessions break. Operations in progress may fail.
  • Debugging: Two instances of the same service may respond differently to the same request if their in-memory state differs. Reproducing production issues requires knowing which instance the request hit.
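A small simulation makes the first two costs concrete. It is a sketch under stated assumptions: the instance names, the hash-based routing function, and the client IDs are all hypothetical, and real load balancers implement stickiness in other ways (cookies, source IP).

```python
import hashlib


def pick_instance(client_id: str, instances: list[str]) -> str:
    """Sticky routing: hash the client ID onto the current instance list."""
    digest = hashlib.sha256(client_id.encode()).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


# In-memory session tables, one per instance (hypothetical names).
sessions: dict[str, dict[str, str]] = {"app-1": {}, "app-2": {}, "app-3": {}}
instances = list(sessions)

clients = [f"client-{n}" for n in range(30)]
for client in clients:
    sessions[pick_instance(client, instances)][client] = "logged-in"

# Per-instance load depends on how the client IDs happen to hash.
load = {name: len(table) for name, table in sessions.items()}

# A rolling deploy replaces app-2: its memory disappears with the process.
lost = set(sessions.pop("app-2"))
sessions["app-2b"] = {}

survivors = sum(len(table) for table in sessions.values())
```

Every client in lost has to log in again, even though the replacement instance is healthy. With sessions in an external store, the same deploy would lose nothing.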

None of these are unsolvable. They're costs. The question is whether the benefit of in-instance state is worth those costs.

Where Statefulness Is Justified

Some workloads genuinely require stateful instances:

WebSocket connections: A persistent connection between a client and a server is inherently stateful. The connection lives on one instance. If that instance goes down, the connection is lost. This is managed with reconnection logic on the client side and a message broker for cross-instance fan-out (when you need to broadcast to all clients, regardless of which instance holds the connection).
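The fan-out pattern can be sketched without a real network. In this illustration, Broker is a hypothetical in-process stand-in for a message broker with publish/subscribe, and each connection is modelled as a plain list acting as the client's inbox. All names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """In-process stand-in for a message broker with pub/sub (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in self._subscribers[channel]:
            callback(message)


class GatewayInstance:
    """Owns a set of live connections, modelled here as per-client inboxes."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.connections: dict[str, list[str]] = {}
        broker.subscribe("broadcast", self._deliver_locally)

    def connect(self, client_id: str) -> None:
        self.connections[client_id] = []

    def _deliver_locally(self, message: str) -> None:
        # Each instance relays only to the connections it holds itself.
        for inbox in self.connections.values():
            inbox.append(message)


broker = Broker()
gw1, gw2 = GatewayInstance("gw-1", broker), GatewayInstance("gw-2", broker)
gw1.connect("alice")
gw2.connect("bob")

# The publisher does not need to know which instance holds which connection.
broker.publish("broadcast", "maintenance at 02:00")
```

The connection state stays local to each gateway, while the knowledge of what to broadcast travels through the shared broker.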

Stateful protocol implementations: Some protocols, like FTP data connections or certain streaming protocols, maintain connection-level state that can't be externalized cheaply.

High-performance computing: In some workloads, the cost of serializing and deserializing state to an external store on every operation would dominate the processing time. An ML inference server that loads a 2GB model into memory on startup is stateful, and appropriately so.

Gaming servers and real-time collaboration: Multiple clients share a live session that requires very low-latency coordination, typically a few milliseconds at most. The state needs to be local to the coordinator.

The External State Model

When you externalize state from your instances to a shared store, you need to think carefully about:

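Consistency: A single Redis node applies commands one at a time, so a read reflects the latest write, but it has no redundancy. With replication, which is asynchronous by default, a failover can lose writes that were already acknowledged. Redis Cluster provides horizontal scalability with a partitioned key space, but multi-key operations only work when all keys map to the same hash slot. For session data, the risk of losing recent writes is usually acceptable. For financial transactions, it is not.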

Latency: An external cache round-trip typically adds about 1ms or less in a co-located deployment. Under load, with connection pool contention, this can spike significantly. If you're making ten sequential Redis calls per request, that's roughly 10ms of added latency before any contention. Pipelining or batching the calls cuts the number of round trips.
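The latency arithmetic depends on how many round trips a request makes, not how many keys it reads. The sketch below uses a hypothetical CountingStore that only counts round trips; ASSUMED_RTT_MS is the ~1ms figure from the text treated as an assumption, not a measurement.

```python
from typing import Optional


class CountingStore:
    """Fake key-value store that counts network round trips (hypothetical)."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = data
        self.round_trips = 0

    def get(self, key: str) -> Optional[str]:
        self.round_trips += 1  # one request, one response
        return self._data.get(key)

    def get_many(self, keys: list[str]) -> list[Optional[str]]:
        self.round_trips += 1  # a single request carries all keys
        return [self._data.get(k) for k in keys]


ASSUMED_RTT_MS = 1.0  # assumed co-located round-trip time, not a measurement

keys = [f"k{n}" for n in range(10)]
store = CountingStore({k: k.upper() for k in keys})

# Ten sequential calls: each one waits for the previous response.
one_by_one = [store.get(k) for k in keys]
sequential_ms = store.round_trips * ASSUMED_RTT_MS

# One batched call returning the same values.
store.round_trips = 0
batched = store.get_many(keys)
batched_ms = store.round_trips * ASSUMED_RTT_MS
```

Under these assumptions the sequential version costs about 10ms of waiting and the batched version about 1ms, for identical data. Real stores expose batching in their own ways, so treat get_many as a placeholder for whatever multi-key or pipelined call your client offers.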

Failure modes: When your external state store is unavailable, your instances may be unable to serve requests that require state access. Design the fallback: what does your service do when Redis is down? Fail closed (return errors), fail open (allow requests with reduced functionality), or use local memory as a fallback with staleness tolerance?
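One of these fallbacks, local memory with a staleness tolerance that fails closed once the tolerance is exceeded, might look like the following sketch. FlakyStore, StoreUnavailable, and ProfileReader are hypothetical names invented for the illustration, and the 30-second tolerance is an arbitrary choice.

```python
import time
from typing import Optional


class StoreUnavailable(Exception):
    """Raised by the store client when the external store cannot be reached."""


class FlakyStore:
    """Stand-in for an external store that can be switched off (hypothetical)."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.up = True

    def get(self, key: str) -> Optional[str]:
        if not self.up:
            raise StoreUnavailable(key)
        return self.data.get(key)


class ProfileReader:
    """Reads through to the store, keeping a local copy for use during outages."""

    def __init__(self, store: FlakyStore, max_stale_seconds: float) -> None:
        self._store = store
        self._max_stale = max_stale_seconds
        self._local: dict[str, tuple[float, Optional[str]]] = {}

    def get(self, key: str) -> Optional[str]:
        try:
            value = self._store.get(key)
        except StoreUnavailable:
            cached = self._local.get(key)
            if cached and time.monotonic() - cached[0] <= self._max_stale:
                return cached[1]  # serve stale data within the tolerance
            raise  # fail closed: nothing acceptable to fall back on
        self._local[key] = (time.monotonic(), value)
        return value


store = FlakyStore()
store.data["u1"] = "Ada"
reader = ProfileReader(store, max_stale_seconds=30.0)

first = reader.get("u1")          # normal read, also refreshes the local copy
store.up = False
during_outage = reader.get("u1")  # served from the local copy
```

A key that was never read before the outage still raises StoreUnavailable, which is the fail-closed branch. Whether stale reads are acceptable at all is a per-endpoint decision, so the tolerance belongs in configuration rather than in code.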

The Architecture Decision Record You Should Write

The stateless vs stateful choice deserves documentation. Write down:

  1. What state, if any, will live in this service's instances?
  2. How is that state invalidated or updated when it becomes stale?
  3. What happens to operations in flight when an instance is replaced?
  4. How does this design behave when the instance is scaled to zero and restarted?

If you can answer all four cleanly, your design is well considered. If question three or four produces "the operation is lost," that's a design decision with real user impact. It should be explicit, not implicit.
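For question three, one common answer is to drain: stop accepting new work, then wait a bounded time for running operations to finish before the instance exits. The sketch below is one possible shape for that, using only the standard library. InFlightTracker and its method names are hypothetical, and a timer thread stands in for a real operation.

```python
import threading


class InFlightTracker:
    """Counts running operations and refuses new ones once draining starts."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle = threading.Condition(self._lock)
        self._count = 0
        self._draining = False

    def try_begin(self) -> bool:
        with self._lock:
            if self._draining:
                return False
            self._count += 1
            return True

    def end(self) -> None:
        with self._idle:
            self._count -= 1
            if self._count == 0:
                self._idle.notify_all()

    def drain(self, timeout: float) -> bool:
        """Stop accepting work, then wait for running operations to finish."""
        with self._idle:
            self._draining = True
            return self._idle.wait_for(lambda: self._count == 0, timeout=timeout)


tracker = InFlightTracker()
accepted = tracker.try_begin()                  # an operation starts

worker = threading.Timer(0.05, tracker.end)     # it finishes 50ms later
worker.start()

drained = tracker.drain(timeout=2.0)            # shutdown waits for it
rejected = tracker.try_begin()                  # new work is refused
worker.join()
```

If drain returns False, the timeout expired with operations still running, and those are the operations that will be lost. That outcome is what the decision record should name explicitly.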

The Practical Takeaway

For your next service, default to stateless: no in-memory session, no local cache that isn't backed by an external store, no accumulated state that differs between instances. If you have a requirement that seems to demand statefulness, write it down and verify it can't be met with externalized state before accepting the operational complexity of a stateful design.
