Sharding Your Database Sounds Exciting Until You Actually Have to Do It
by Eric Hanson, Backend Developer at Clean Systems Consulting
What Sharding Actually Is
Sharding means horizontally partitioning your data across multiple database instances. Instead of one database holding all your data, you have N databases (shards), each holding roughly 1/N of the data. Queries that access data on one shard execute locally. Queries that need data across shards require cross-shard coordination.
The performance benefit is real: write throughput scales with shard count, each shard's dataset is smaller (better index performance, less I/O), and hardware costs can be distributed. The operational cost is also real, and it is large.
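The routing piece is small in code but load-bearing: every read and write must be sent to the correct shard. A minimal hash-routing sketch in Python (the shard count and key format are hypothetical):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical fixed shard count

def shard_for(user_id: str) -> int:
    """Map a shard key to a shard index with a stable hash.
    Python's built-in hash() is salted per process, so a
    deterministic digest is used instead."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Every query touching this user routes to the same shard:
shard = shard_for("user-8421")
```

In practice this function (or its equivalent in a middleware layer) sits in front of every query; changing it later is exactly the rebalancing problem discussed below.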
Most teams that say they "need sharding" actually need read replicas, better query optimization, or a larger instance. Sharding is the right answer for a small class of problems. It is usually the wrong first move.
What You Give Up
Cross-shard queries become expensive or impossible. If your data is sharded by user ID, a query that aggregates data across users requires querying every shard and merging results in the application. A report that took one SQL query now requires N queries plus merge logic. If the merge is complex — sorting, grouping, deduplication — the application code bears that burden.
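To make that cost concrete, here is a sketch of the scatter-gather pattern: each dict stands in for one shard's partial GROUP BY result, and the merge plus the global re-sort, which the database used to do in one query, now live in application code (all names are illustrative):

```python
from collections import Counter

def scatter_gather_counts(shard_results: list[dict]) -> list[tuple]:
    """Merge per-shard partial counts, then re-sort globally.
    This is the work a single 'GROUP BY ... ORDER BY count' did."""
    merged = Counter()
    for partial in shard_results:
        merged.update(partial)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# Each dict stands in for one shard's partial aggregation result:
shards = [{"pro": 10, "free": 40}, {"pro": 25, "free": 5}]
top = scatter_gather_counts(shards)  # [("free", 45), ("pro", 35)]
```

Counting is the easy case because partial counts merge cleanly; medians, percentiles, and DISTINCT aggregations do not decompose this way and need either approximation or a full data pull.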
Transactions across shards require distributed coordination. ACID transactions within a single shard work normally. Transactions that span shards — moving a record from shard A to shard B, or updating records on two shards atomically — require two-phase commit (2PC) or saga patterns. 2PC is slow and failure-prone. Sagas are complex to implement correctly. Most applications simply avoid cross-shard transactions, which constrains the data model.
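A saga for the "move a record between shards" case can be sketched as below. The Shard class is a dict-backed stand-in; a real implementation must also persist saga progress so a crashed coordinator can resume or compensate:

```python
class Shard:
    """Stand-in for one database shard (a dict keyed by record id)."""
    def __init__(self):
        self.rows = {}

    def insert(self, record):
        self.rows[record["id"]] = record

    def delete(self, record_id):
        del self.rows[record_id]

def move_record(record, src, dst):
    """Saga: insert on the destination, then delete on the source.
    If the delete fails, compensate by removing the new copy so
    the system returns to its pre-saga state."""
    dst.insert(record)
    try:
        src.delete(record["id"])
    except KeyError:
        dst.delete(record["id"])  # compensating action for step 1
        raise
```

Note what is lost relative to a single-shard transaction: between the insert and the delete, a reader can observe the record on both shards. Sagas trade atomic visibility for eventual consistency, and every step needs a tested compensation path.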
Joins across shards are gone. SQL joins work within a shard. Cross-shard joins require fetching data from multiple shards and joining in the application layer, which is typically slower and harder to optimize than a database join.
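What that looks like in practice is a hash join written by hand: fetch both sides from their respective shards, index one side by key, and probe with the other. A minimal sketch with illustrative field names:

```python
def app_side_join(users, orders):
    """Application-level hash join: build a lookup on one side,
    probe with the other. This is the work a single-database
    JOIN performed, minus the planner's ability to optimize it."""
    by_id = {u["id"]: u for u in users}
    return [
        {**order, "user_name": by_id[order["user_id"]]["name"]}
        for order in orders
        if order["user_id"] in by_id
    ]

users = [{"id": 1, "name": "Ada"}]            # fetched from the user shard
orders = [{"order_id": 7, "user_id": 1}]      # fetched from the order shard
joined = app_side_join(users, orders)
```

The hidden cost is that both inputs must fit in application memory, and there is no planner to pick a better strategy when one side grows.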
Rebalancing is painful. When you add a new shard, you need to redistribute some data to the new shard. This requires a data migration while the system is live — reading records from existing shards, writing them to the new shard, updating routing, deleting the migrated records. Done wrong, this is a data loss event.
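One reason rebalancing is so invasive: with naive modulo routing, changing the shard count remaps most keys. A small Python experiment (hypothetical key format) makes the point:

```python
import hashlib

def shard_of(key: str, n: int) -> int:
    """Stable hash of a key, reduced modulo the shard count."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

keys = [f"user-{i}" for i in range(10_000)]
moved = sum(shard_of(k, 4) != shard_of(k, 5) for k in keys)
# Going from 4 to 5 shards relocates roughly 4/5 of all keys,
# and every relocated key is part of a live data migration.
```

Consistent hashing or a directory (lookup-table) scheme limits movement to roughly 1/(N+1) of keys when adding a shard, which is why the routing scheme matters long before the first rebalance.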
What "simple" sharding adds to your operational footprint:
- Shard routing logic in the application or a middleware layer
- Shard key design (wrong key = hot shards, uneven distribution)
- Cross-shard query handling (application-level aggregation)
- Distributed transaction handling for any cross-shard writes
- Rebalancing procedures when adding shards
- Per-shard monitoring and alerting
- Schema migrations across all shards simultaneously
- Backup and restore per shard
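The shard key item deserves emphasis: a skewed key concentrates traffic on a few shards and is very hard to change later. One cheap sanity check is to simulate the distribution against a sample of real keys before committing to a key. A sketch (key formats hypothetical):

```python
import hashlib
from collections import Counter

def shard_of(key: str, n: int) -> int:
    """Stable hash of a key, reduced modulo the shard count."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

def skew(sample_keys, n_shards: int) -> float:
    """Ratio of the busiest shard's load to the ideal even share.
    1.0 is perfectly even; a hot shard pushes this well above 1."""
    counts = Counter(shard_of(k, n_shards) for k in sample_keys)
    ideal = len(sample_keys) / n_shards
    return max(counts.values()) / ideal

# One dominant tenant concentrates load on whichever shard it
# hashes to; repeat each key once per row it represents.
hot = skew(["tenant-big"] * 5_000 + [f"tenant-{i}" for i in range(5_000)], 4)
```

The hash spreads distinct keys evenly; skew comes from uneven volume per key, which is why "shard by tenant" goes wrong the moment one tenant dominates.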
When It Is Actually Warranted
Sharding is warranted when you have exhausted these options in order:
- Query optimization and indexing
- Connection pooling (e.g., PgBouncer)
- Read replicas for read offload
- Vertically scaling the primary
- Caching for hot-path reads
- Application-level data archiving (moving old data to cold storage, reducing working set)
If you have done all of these and write throughput or primary storage is still the bottleneck, sharding may be necessary. The typical threshold where sharding becomes worth the cost is sustained write volume beyond what a single well-tuned primary can absorb (often cited in the tens of thousands of writes per second), or a dataset in the tens of terabytes whose working set no longer fits in memory.
Alternatives That Are Often Sufficient
Partitioning within a single database. PostgreSQL's table partitioning (declarative partitioning by range or list) splits a large table into physical partitions within the same database instance. Queries that filter on the partition key only scan the relevant partition. This gives some of the query performance benefit of sharding with none of the cross-shard coordination complexity.
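As a sketch of the declarative form (table and partition names are illustrative), range partitioning by timestamp looks like this; a query filtering on created_at scans only the matching partitions:

```sql
CREATE TABLE events (
    id         bigint,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_q1 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
```

Because all partitions live in one instance, transactions, joins, and aggregations across partitions still work normally.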
Functional decomposition. Split different features into different databases before sharding any single database. An analytics database separate from the transactional database, each sized and scaled for its workload, is often more effective than sharding the combined database.
Managed distributed databases. CockroachDB and Google Spanner handle sharding and replication internally. You use them like a single PostgreSQL-compatible database; they handle shard management. You still pay in consistency trade-offs (CockroachDB uses serializable isolation with some distributed transaction latency; Spanner is expensive) but you offload the operational complexity.
Try everything else first. Sharding is a last resort, not a scaling strategy.