Vertical Scaling vs Horizontal Scaling: When to Use Which
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Default Answer Is Wrong
Ask most engineers which scaling strategy is correct and they will say horizontal scaling. It is the modern answer. It is what cloud-native architecture evangelists advocate. It is what Kubernetes is built around. The answer is also frequently wrong for the specific situation at hand.
Horizontal scaling means adding more instances. Vertical scaling means adding more resources to existing instances — more CPU, more RAM, faster storage. Both have appropriate use cases. The choice depends on your workload, your database, and your operational constraints — not on what the industry considers architecturally fashionable.
When Vertical Scaling Is the Right Answer
Vertical scaling is the right answer when your workload cannot be parallelized, or when the cost of parallelizing it exceeds the cost of a larger instance.
Databases are the clearest case. A primary PostgreSQL or MySQL write instance does not benefit from horizontal scaling the way application servers do: adding replicas does not increase write throughput, because every write still goes through the primary. If your primary is saturating CPU or I/O, you have three options:
- Move to a larger instance (vertical).
- Reduce write load through application-level batching.
- Shard the data (horizontal, but at the data level rather than the server level).
For most systems below extreme write volume, vertically scaling the primary is faster, cheaper, and operationally simpler than sharding.
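The batching option deserves a sketch. The idea is simply to turn N single-row writes into N/B bulk operations so the primary sees far fewer round trips for the same write volume. The flush callback here is a stand-in for something like a DB-API cursor's `executemany`; the names and batch size are illustrative, not a prescription.

```python
def batch_writes(rows, flush, batch_size=500):
    """Accumulate individual writes and flush them in bulk, so the
    primary handles batch_size-row operations instead of per-row ones."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            flush(batch)
            batch = []
    if batch:
        flush(batch)  # flush the final partial batch

# Stand-in for cursor.executemany("INSERT ...", batch): record batch sizes.
calls = []
batch_writes(range(1200), flush=lambda b: calls.append(len(b)), batch_size=500)
print(calls)  # [500, 500, 200] — three round trips instead of 1200
```

The win is not in the application code but on the primary: fewer transactions, fewer network round trips, fewer WAL flushes per row written.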
In-memory computation workloads also favor vertical scaling. If your process needs to hold a large dataset in memory — graph traversal, ML inference, complex session aggregation — adding more instances gives you more parallelism but each instance still needs the memory to hold its working set. A larger instance with more RAM solves the problem directly. Multiple smaller instances require the working set to be partitioned, which may or may not be feasible.
# PostgreSQL primary write throughput vs instance size
# Benchmark: pgbench, scale factor 100, 32 concurrent clients, 60s
db.t3.medium (2 vCPU, 4 GB RAM): ~1,800 TPS
db.t3.large (2 vCPU, 8 GB RAM): ~2,200 TPS (working set fits in RAM)
db.m5.xlarge (4 vCPU, 16 GB RAM): ~4,100 TPS (more cores + RAM)
db.m5.4xlarge (16 vCPU, 64 GB RAM): ~9,800 TPS
# Source: RDS PostgreSQL, gp3 storage, benchmark results will vary
# significantly based on workload type and query complexity.
When Horizontal Scaling Is the Right Answer
Horizontal scaling is the right answer when your workload is stateless and the bottleneck is parallel throughput, not per-instance performance.
HTTP application servers are the canonical case. A stateless API server — one that holds no session state, reads from a shared database, and can handle any request without knowledge of other requests — scales linearly with instances behind a load balancer. Doubling instances roughly doubles throughput up to the point where the database or a downstream service becomes the bottleneck.
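That "up to the point where the database becomes the bottleneck" clause is worth making concrete. A rough capacity model, with purely illustrative per-instance and database numbers:

```python
def effective_rps(instances, rps_per_instance=400, db_ceiling_rps=3000):
    """Throughput scales linearly with instance count until a shared
    downstream dependency (here, the database) caps it."""
    return min(instances * rps_per_instance, db_ceiling_rps)

for n in (2, 4, 8, 16):
    print(n, effective_rps(n))
# 2 ->  800
# 4 -> 1600
# 8 -> 3000  (8 * 400 = 3200, capped by the database)
# 16 -> 3000 (more instances buy nothing past the ceiling)
```

Past the ceiling, adding application instances is pure cost; the next move is on the database side, which is where the vertical-scaling discussion above takes over.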
Queue consumers are another clear case. If you have a queue of work to process and processing each item is independent, adding consumer instances increases throughput proportionally. This is embarrassingly parallel. It is the textbook use case for horizontal scaling.
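A minimal sketch of the embarrassingly parallel shape, using a thread pool as the "consumer fleet"; in production the items would come from a real queue (SQS, RabbitMQ, Kafka) rather than a list, and adding consumers means adding instances rather than raising `max_workers`:

```python
from concurrent.futures import ThreadPoolExecutor

def process(item):
    # Each item is independent: no shared mutable state, no ordering
    # requirement, so any consumer can take any item.
    return item * item

items = list(range(100))
with ThreadPoolExecutor(max_workers=8) as pool:  # "add consumers" = raise this
    results = list(pool.map(process, items))

print(len(results))  # 100
```

Because no consumer needs to know what any other consumer is doing, throughput scales with consumer count until the queue itself or a downstream dependency saturates.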
The requirement is statelessness. If your application server holds local state — in-memory session data, local file handles, instance-specific caches — horizontal scaling requires either eliminating that state or managing it across instances. Sticky sessions (routing the same user to the same instance) work but create uneven load distribution and complicate failover. The better answer is to move the state out: sessions to Redis, files to object storage.
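Externalizing session state can be sketched as follows. The dict here stands in for a shared store such as Redis (where `save`/`load` would map to SET/GET with a TTL); the key scheme and class are illustrative assumptions, not a library API:

```python
class SessionStore:
    """Sessions live in a shared backend, not in instance memory,
    so any application instance can serve any request."""

    def __init__(self, backend):
        self.backend = backend  # shared store shared by all instances

    def save(self, session_id, data):
        self.backend[f"session:{session_id}"] = data

    def load(self, session_id):
        return self.backend.get(f"session:{session_id}")

shared = {}  # stand-in for a Redis connection shared across instances
SessionStore(shared).save("u42", {"cart": 3})
# A different instance (different SessionStore object) sees the same session:
print(SessionStore(shared).load("u42"))  # {'cart': 3}
```

Once the state is out, sticky sessions become unnecessary and the load balancer can route freely.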
The Hybrid Reality
Most production systems use both. A common pattern:
- Application tier: horizontally scaled stateless instances
- Cache tier: vertically scaled Redis primary with read replicas for redundancy
- Database tier: vertically scaled primary, horizontally scaled read replicas
The read replicas are horizontal, but the primary is vertical. This is not inconsistent: each replica holds a full copy of the data and serves reads independently, so adding replicas adds read throughput. The primary must serialize all writes through shared mutable state, so it scales vertically until vertical limits become the bottleneck.
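The pattern is usually realized with a router that pins writes to the primary and fans reads out over replicas. A sketch, with connection objects reduced to illustrative string stubs and a deliberately naive SQL check:

```python
import itertools

class Router:
    """Route writes to the single primary; round-robin reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def connection_for(self, sql):
        # Writes must serialize on the primary; reads can fan out.
        if sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
            return self.primary
        return next(self._replicas)

r = Router("primary", ["replica-1", "replica-2"])
print(r.connection_for("SELECT 1"))           # replica-1
print(r.connection_for("INSERT INTO t ..."))  # primary
print(r.connection_for("SELECT 2"))           # replica-2
```

Real routers (e.g. in ORMs or proxies like pgbouncer-adjacent tooling) also have to handle replication lag and read-your-writes consistency, which this sketch ignores.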
The Decision Criterion
Before choosing, answer two questions:
- Is the unit of work parallelizable without shared mutable state?
- Does the cost of coordination between parallel units exceed the cost of a larger instance?
If the work is parallelizable and coordination cost is low: horizontal. If the work requires shared state or coordination cost is high: vertical. If the bottleneck is a database primary: vertical first, then evaluate sharding only when vertical limits are hit.
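The criterion above is simple enough to encode directly. The inputs are judgments you make about your workload, not measurements the function could take for you; the function is a deliberately reductive summary, not a tool:

```python
def scaling_choice(parallelizable, coordination_cost_high, is_db_primary=False):
    """Encode the two-question decision criterion from the text."""
    if is_db_primary:
        return "vertical first; evaluate sharding only at vertical limits"
    if parallelizable and not coordination_cost_high:
        return "horizontal"
    return "vertical"

print(scaling_choice(True, False))                  # horizontal
print(scaling_choice(False, True))                  # vertical
print(scaling_choice(True, False, is_db_primary=True))
```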
The rule that horizontal scaling is always preferable is a piece of cloud-era marketing that got absorbed into engineering orthodoxy. Choose based on the workload, not the orthodoxy.