Read Replicas: The Database Scaling Strategy Most Teams Discover Too Late

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Wall You Hit

Your application's read traffic has grown steadily. The primary database CPU sits at 70% during peak hours. p95 read latency is climbing. You are close to the ceiling of what the current instance can handle, and you are starting to feel it. The instinct is to size up the instance. The better move — which you could have made three months ago with less urgency — is to add a read replica.

Most web applications are read-heavy. A typical ratio is 80–90% reads, 10–20% writes. A single primary database handles all of it. By adding a read replica — a continuously synchronized copy of the primary that serves read traffic — you shift the read load off the primary, leaving it to handle writes and critical reads with significantly more headroom.

How Replication Works

PostgreSQL and MySQL both implement streaming replication: the primary writes changes to a write-ahead log (WAL in PostgreSQL, binary log in MySQL), and replicas continuously consume that log, applying changes to maintain a synchronized copy.

The replica is eventually consistent with the primary. Under normal conditions, replication lag is sub-second — typically 10–100ms. Under write pressure or network congestion, lag can grow. A replica with 5 seconds of replication lag serves data that is up to 5 seconds old.

This has important implications for which reads can safely route to replicas:

# Safe to route to replica:
- Product catalog reads (price/inventory updates are infrequent)
- User profile reads (doesn't need to be millisecond-current)
- Analytics and reporting queries
- Historical data access
- Admin dashboard reads

# Must route to primary:
- Reads immediately after a write in the same transaction or session
  (read-your-writes consistency)
- Inventory checks during checkout (stale read could oversell)
- Balance checks before financial operations
- Any read where staleness causes a correctness problem

The Mechanics of Routing

The simplest routing is connection-based: configure your ORM or database driver to use the replica connection for reads and the primary connection for writes. Rails has a built-in connected_to with role-based routing. Django supports multiple database configuration with query routers.

More sophisticated routing handles the read-your-writes case: after a write, tag the session with a "prefer primary" flag for a short window (1–2 seconds), long enough for replication lag to clear. Subsequent reads in that window go to the primary; reads after the window route to the replica.

# Simplified read-your-writes routing:
def get_db_connection(user_session, operation_type):
    if operation_type == "write":
        user_session["last_write"] = time.time()
        return primary_conn

    # Route to primary briefly after a write for read-your-writes
    if user_session.get("last_write") and time.time() - user_session["last_write"] < 2:
        return primary_conn

    return replica_conn

Replicas for More Than Read Scaling

Failover. A replica can be promoted to primary in the event of a primary failure. RDS Multi-AZ and Cloud SQL High Availability use a synchronous standby replica (replication with no lag, lower write throughput) for automatic failover with sub-60-second recovery time.

Analytics and reporting. Long-running analytical queries — the kind that might take 30 seconds and scan millions of rows — will devastate the primary during peak hours. Running them against a replica (possibly a dedicated read replica not used for application reads) isolates the analytical workload from the production read path.

Blue/green database migration. Replicas can be used as the target for a migration — run the migration on a replica, verify it, promote it. Reduces the risk window for major schema changes.

When to Add One

Add a read replica before you are forced to. The moment your primary exceeds 60% CPU during normal operating hours, you have lost the headroom to absorb spikes without a problem. A replica is a managed service add-on in most cloud environments — RDS, Cloud SQL, Azure Database all offer one-click replica provisioning. The cost is roughly the instance cost of the replica.

The mistake is waiting until you are at 90% CPU and in an incident to make the decision. At that point, the replica takes time to provision and sync — minutes to hours depending on data volume — during which you are still degraded. Add it proactively.

Scale Your Backend - Need an Experienced Backend Developer?

We provide backend engineers who join your team as contractors to help build, improve, and scale your backend systems.

We focus on clean backend design, clear documentation, and systems that remain reliable as products grow. Our goal is to strengthen your team and deliver backend systems that are easy to operate and maintain.

We work from our own development environments and support teams across US, EU, and APAC timezones. Our workflow emphasizes documentation and asynchronous collaboration to keep development efficient and focused.

  • Production Backend Experience. Experience building and maintaining backend systems, APIs, and databases used in production.
  • Scalable Architecture. Design backend systems that stay reliable as your product and traffic grow.
  • Contractor Friendly. Flexible engagement for short projects, long-term support, or extra help during releases.
  • Focus on Backend Reliability. Improve API performance, database stability, and overall backend reliability.
  • Documentation-Driven Development. Development guided by clear documentation so teams stay aligned and work efficiently.
  • Domain-Driven Design. Design backend systems around real business processes and product needs.

Tell us about your project

Our offices

  • Copenhagen
    1 Carlsberg Gate
    1260, København, Denmark
  • Magelang
    12 Jalan Bligo
    56485, Magelang, Indonesia

More articles

Contractor or Employee? When Clients Blur the Line

“We treat all contractors like full-time team members here.” It sounds inclusive—until you realize it changes everything about the work.

Read more

Red Flags That Predict Software Project Failure

“It’s probably fine… we just need a bit more time.” That sentence has quietly preceded more failed projects than anyone admits.

Read more

What Mid-Level Developers Get Wrong About System Design

Mid-level developers often bring strong implementation skills into system design conversations and apply the wrong mental model. The gaps are predictable, and fixing them is less about knowledge than about shifting the frame.

Read more

How to Politely Push Back on Unreasonable Demands

Clients can sometimes ask for the impossible—or the unreasonable. Here’s how to set boundaries without burning bridges.

Read more