Vertical Scaling vs Horizontal Scaling: When to Use Which

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Default Answer Is Wrong

Ask most engineers which scaling strategy is correct and they will say horizontal scaling. It is the modern answer. It is what cloud-native architecture evangelists advocate. It is what Kubernetes is built around. The answer is also frequently wrong for the specific situation at hand.

Horizontal scaling means adding more instances. Vertical scaling means adding more resources to existing instances — more CPU, more RAM, faster storage. Both have appropriate use cases. The choice depends on your workload, your database, and your operational constraints — not on what the industry considers architecturally fashionable.

When Vertical Scaling Is the Right Answer

Vertical scaling is the right answer when your workload cannot be parallelized, or when the cost of parallelizing it exceeds the cost of a larger instance.

Databases are the clearest case. A primary PostgreSQL or MySQL write instance does not benefit from horizontal scaling in the way application servers do. Adding read replicas does not increase write throughput, because all writes still go to the primary. If your primary is saturating CPU or I/O, your options are: move to a larger instance (vertical), reduce write load through application-level batching, or shard the data (which is horizontal but at the data level, not the server level). For most systems below extreme write volume, vertically scaling the primary is faster, cheaper, and operationally simpler than sharding.
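Of the three options, application-level batching is the one that lives entirely in your own code. A minimal sketch of the idea follows, assuming a hypothetical `WriteBatcher` class and a `flush_fn` callback that would issue one multi-row write to the primary; neither name comes from a real library, and a production version would also need a time-based flush and error handling.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (id, payload)


class WriteBatcher:
    """Buffers rows and hands them to flush_fn in groups of batch_size."""

    def __init__(self, flush_fn: Callable[[List[Row]], None], batch_size: int = 100) -> None:
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self.buffer: List[Row] = []

    def add(self, row: Row) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # One call per batch instead of one call per row.
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []


# Stand-in for a real multi-row INSERT: record each batch that would be written.
flushed_batches: List[List[Row]] = []
batcher = WriteBatcher(flushed_batches.append, batch_size=3)
for i in range(7):
    batcher.add((i, f"event-{i}"))
batcher.flush()  # write out the final partial batch
batch_sizes = [len(batch) for batch in flushed_batches]
```

Seven rows reach the primary as three write calls rather than seven, which is the whole point: fewer round trips and fewer transactions for the same data.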

In-memory computation workloads also favor vertical scaling. If your process needs to hold a large dataset in memory — graph traversal, ML inference, complex session aggregation — adding more instances gives you more parallelism but each instance still needs the memory to hold its working set. A larger instance with more RAM solves the problem directly. Multiple smaller instances require the working set to be partitioned, which may or may not be feasible.
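The memory argument is simple arithmetic. The sketch below uses hypothetical sizes (a 48 GB working set, 16 GB and 64 GB instances) and ignores operating system and runtime overhead, so treat the numbers as illustration only.

```python
import math


def fits_unpartitioned(working_set_gb: float, instance_ram_gb: float) -> bool:
    """True if a single instance can hold the whole working set in RAM."""
    return working_set_gb <= instance_ram_gb


def min_partitions(working_set_gb: float, instance_ram_gb: float) -> int:
    """Smallest number of partitions so that each one fits on one instance."""
    return math.ceil(working_set_gb / instance_ram_gb)


working_set_gb = 48  # hypothetical dataset size
small_fits = fits_unpartitioned(working_set_gb, 16)
large_fits = fits_unpartitioned(working_set_gb, 64)
partitions_on_small = min_partitions(working_set_gb, 16)
```

With these numbers, the 64 GB instance holds the dataset as is, while the 16 GB instances only work if the data can be split into at least three partitions. Whether that split is possible depends on the access pattern, not on the instance count.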

# PostgreSQL primary write throughput vs instance size
# Benchmark: pgbench, scale factor 100, 32 concurrent clients, 60s

db.t3.medium  (2 vCPU, 4 GB RAM):   ~1,800 TPS
db.t3.large   (2 vCPU, 8 GB RAM):   ~2,200 TPS  (working set fits in RAM)
db.m5.xlarge  (4 vCPU, 16 GB RAM):  ~4,100 TPS  (more cores + RAM)
db.m5.4xlarge (16 vCPU, 64 GB RAM): ~9,800 TPS

# Source: RDS PostgreSQL on gp3 storage. Benchmark results will vary
# significantly with workload type and query complexity.
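One way to read these figures is to normalize them per vCPU and against the smallest instance. The sketch below does only that arithmetic on the numbers listed above; it measures nothing new, and the variable names are illustrative.

```python
# (instance, vCPUs, approximate TPS) copied from the benchmark listing above.
results = [
    ("db.t3.medium", 2, 1800),
    ("db.t3.large", 2, 2200),
    ("db.m5.xlarge", 4, 4100),
    ("db.m5.4xlarge", 16, 9800),
]

baseline_tps = results[0][2]

# For each instance: TPS per vCPU, and speedup relative to the smallest instance.
summary = [
    (name, tps / vcpus, tps / baseline_tps)
    for name, vcpus, tps in results
]

for name, per_vcpu, speedup in summary:
    print(f"{name:<14} {per_vcpu:>7.1f} TPS/vCPU  {speedup:.2f}x vs db.t3.medium")
```

On these figures, the largest instance has eight times the vCPUs of the smallest but delivers a bit over five times the throughput. Vertical scaling still pays off here, just with diminishing returns per core as the instance grows.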

When Horizontal Scaling Is the Right Answer

Horizontal scaling is the right answer when your workload is stateless and the bottleneck is parallel throughput, not per-instance performance.

HTTP application servers are the canonical case. A stateless API server (one that holds no session state, reads from a shared database, and can handle any request without knowledge of other requests) scales close to linearly with instances behind a load balancer. Doubling instances roughly doubles throughput, up to the point where the database or a downstream service becomes the bottleneck.
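That ceiling can be written as a one-line model. The function below is a deliberately simplified sketch: it assumes every instance contributes the same throughput and that the downstream limit is a hard cap. The figures of 500 requests per second per instance and a 3,000 requests per second database limit are hypothetical.

```python
def cluster_throughput(instances: int, per_instance_rps: int, downstream_cap_rps: int) -> int:
    """Throughput grows with instance count until the downstream cap is reached."""
    return min(instances * per_instance_rps, downstream_cap_rps)


PER_INSTANCE_RPS = 500     # hypothetical capacity of one stateless API server
DOWNSTREAM_CAP_RPS = 3000  # hypothetical limit of the shared database

for n in (1, 2, 4, 8):
    rps = cluster_throughput(n, PER_INSTANCE_RPS, DOWNSTREAM_CAP_RPS)
    print(f"{n} instances -> {rps} requests/s")
```

In this model, going from four to eight instances adds only 1,000 requests per second instead of 2,000, because the database has become the limit. Adding instances past that point buys nothing.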

Queue consumers are another clear case. If you have a queue of work to process and processing each item is independent, adding consumer instances increases throughput proportionally. This is embarrassingly parallel. It is the textbook use case for horizontal scaling.
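A small sketch of the pattern, using threads inside one process as a stand-in for separate consumer instances and a hypothetical `run_consumers` helper. In a real deployment the queue would be an external broker and each consumer its own process or machine; the squaring step is a placeholder for real work.

```python
import queue
import threading
from typing import List


def run_consumers(items: List[int], consumer_count: int) -> List[int]:
    """Process every item exactly once using consumer_count independent workers."""
    work: "queue.Queue[int]" = queue.Queue()
    for item in items:
        work.put(item)

    results: List[int] = []
    results_lock = threading.Lock()

    def consume() -> None:
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this consumer is done
            processed = item * item  # placeholder for independent work
            with results_lock:
                results.append(processed)

    workers = [threading.Thread(target=consume) for _ in range(consumer_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


processed_items = sorted(run_consumers(list(range(10)), consumer_count=3))
```

No consumer needs to know about any other, and the only shared piece is the queue itself. That independence is what lets the consumer count change without touching the processing code.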

The requirement is statelessness. If your application server holds local state — in-memory session data, local file handles, instance-specific caches — horizontal scaling requires either eliminating that state or managing it across instances. Sticky sessions (routing the same user to the same instance) work but create uneven load distribution and complicate failover. The better answer is to move the state out: sessions to Redis, files to object storage.
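To make "move the state out" concrete, here is a minimal sketch in which two application instances share one session store. `InMemorySharedStore` is a stand-in for an external store such as Redis and is not a Redis client; every class and method name here is hypothetical.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    def get(self, session_id: str) -> Optional[Dict[str, str]]: ...
    def put(self, session_id: str, data: Dict[str, str]) -> None: ...


class InMemorySharedStore:
    """Stand-in for an external session store shared by all instances."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def get(self, session_id: str) -> Optional[Dict[str, str]]:
        return self._data.get(session_id)

    def put(self, session_id: str, data: Dict[str, str]) -> None:
        self._data[session_id] = dict(data)


class AppInstance:
    """An application server that keeps no session state of its own."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def login(self, session_id: str, user: str) -> None:
        self.store.put(session_id, {"user": user})

    def whoami(self, session_id: str) -> Optional[str]:
        session = self.store.get(session_id)
        return session["user"] if session else None


shared_store = InMemorySharedStore()
instance_a = AppInstance("a", shared_store)
instance_b = AppInstance("b", shared_store)

instance_a.login("session-1", "eric")
user_seen_by_b = instance_b.whoami("session-1")  # a different instance serves the next request
```

Because the session lives in the shared store, the load balancer is free to send each request to any instance, and losing an instance loses no sessions.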

The Hybrid Reality

Most production systems use both. A common pattern:

  • Application tier: horizontally scaled stateless instances
  • Cache tier: vertically scaled Redis primary with read replicas for redundancy
  • Database tier: vertically scaled primary, horizontally scaled read replicas

The read replicas are horizontal, but the primary is vertical. This is not inconsistent: read replicas take no writes, so from the write path's perspective they are interchangeable and scale horizontally. The primary has to serialize writes, so it scales vertically until vertical limits become the bottleneck.
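The split between one primary and many replicas usually shows up in application code as a routing decision. The sketch below uses a hypothetical `ConnectionRouter` with a deliberately naive rule: statements starting with `SELECT` go to replicas in round-robin order, everything else goes to the primary. It ignores replication lag, transactions, and cases such as `SELECT ... FOR UPDATE`, all of which a real router has to handle.

```python
import itertools
from typing import List


class ConnectionRouter:
    """Routes reads across replicas and sends everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # Round-robin over replicas; None means reads fall back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ConnectionRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM orders"),
    router.route("select count(*) from users"),
    router.route("INSERT INTO orders VALUES (1)"),
    router.route("SELECT 1"),
]
```

Adding a third replica changes only the list passed to the router, while every write still lands on the single primary. That asymmetry is the hybrid pattern in miniature.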

The Decision Criterion

Before choosing, answer two questions:

  1. Is the unit of work parallelizable without shared mutable state?
  2. Does the cost of coordination between parallel units exceed the cost of a larger instance?

If the work is parallelizable and coordination cost is low: horizontal. If the work requires shared state or coordination cost is high: vertical. If the bottleneck is a database primary: vertical first, then evaluate sharding only when vertical limits are hit.
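The same rule, written as a function. This is only a restatement of the paragraph above in code, with a hypothetical name and boolean inputs; deciding whether coordination cost is "high" remains a judgment call the function cannot make for you.

```python
def choose_scaling(
    parallelizable: bool,
    coordination_cost_high: bool,
    is_database_primary: bool = False,
) -> str:
    """Return the scaling direction suggested by the two questions above."""
    if is_database_primary:
        return "vertical first, then evaluate sharding"
    if parallelizable and not coordination_cost_high:
        return "horizontal"
    return "vertical"


stateless_api = choose_scaling(parallelizable=True, coordination_cost_high=False)
in_memory_graph = choose_scaling(parallelizable=False, coordination_cost_high=True)
postgres_primary = choose_scaling(
    parallelizable=False, coordination_cost_high=True, is_database_primary=True
)
```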

The rule that horizontal scaling is always preferable is a piece of cloud-era marketing that got absorbed into engineering orthodoxy. Choose based on the workload, not the orthodoxy.
