Breaking Up a Monolith Without Breaking Everything

January 16, 2026

by Arif Ikhsanudin, Backend Developer

Why big-bang migrations fail

Your team has a plan: spend six months extracting five services from the monolith, then cut over. Six months later, three services are "done" but untested under production load, two are still in progress, the monolith has continued accumulating new features throughout, and the cutover date keeps slipping because nobody is confident the new services actually work correctly.

This is the big-bang migration pattern, and it fails consistently. The failure mode is not technical incompetence — it's that you've deferred all the risk to a cutover event that is now six months away and enormous in scope. Big-bang cutovers have no feedback loop: you won't know if the extracted services work correctly under production conditions until you flip the switch, at which point rolling back is painful.

The alternative is incremental extraction with production validation at each step. Small, reversible changes. Continuous production traffic on extracted components before committing to them.

Defining the right first extraction

The first service you extract sets the precedent for how your team works and validates your infrastructure (deployment pipeline, monitoring, service mesh). Choose it carefully.

The ideal first extraction candidate:

Has clear boundaries in the monolith — minimal cross-cutting concerns with other modules
Is not on the critical path for your most important user flows (so a bug doesn't cause a high-severity incident)
Has meaningful traffic volume (so you can validate behavior under real load, not just synthetic tests)
Has a team that is motivated and capable of owning a service

Notification systems, reporting services, and admin tooling are often good first candidates. Payment processing and user authentication are poor ones: the risk is too high, the implicit dependencies too many, and the blast radius too large if something goes wrong.

The parallel running pattern

Before routing any production traffic to the new service, run it in parallel with the monolith: the monolith handles the request authoritatively, and the new service also handles it, with results compared but not returned to users.

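This pattern (called dark launching or shadow testing) gives you production validation without production risk, provided the shadow call has no user-visible side effects. For a write path such as order creation, the new service must write only to its own store and must not send emails, charge cards, or publish events while it is shadowed: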

// In the monolith: parallel call for comparison
// Requires: import java.util.concurrent.CompletableFuture;
public OrderResult createOrder(OrderRequest request) {
    // Primary path: monolith handles the request
    OrderResult monolithResult = monolithOrderService.create(request);

    // Shadow path: new service runs in parallel, result is compared but discarded.
    // runAsync without an executor uses the common ForkJoinPool; pass a dedicated
    // executor as the second argument if shadow calls can block for long.
    CompletableFuture.runAsync(() -> {
        try {
            OrderResult serviceResult = newOrderService.create(request);
            compareResults(monolithResult, serviceResult, request.getOrderId());
        } catch (Exception e) {
            // A shadow failure must never affect the user-facing response
            log.warn("Shadow service error for order {}: {}", request.getOrderId(), e.getMessage());
        }
    });

    return monolithResult;
}

Run this for one to two weeks. Monitor the comparison results for discrepancies. Fix bugs in the new service. When the discrepancy rate drops to near zero and you're confident in the new service's behavior, you're ready to route actual traffic.
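To decide when the discrepancy rate is "near zero", you need a number. Here is a minimal sketch of a counter that compareResults could feed. The class name, the threshold parameters, and the use of equals() for comparison are assumptions for illustration; in practice you would exclude fields that legitimately differ, such as generated IDs and timestamps, before comparing.

```java
import java.util.Objects;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper: counts shadow comparisons and mismatches so the
// discrepancy rate can be exported to whatever metrics system is in use.
final class ShadowComparison {
    private final LongAdder total = new LongAdder();
    private final LongAdder mismatches = new LongAdder();

    /** Records one comparison; returns true when the two results agree. */
    boolean record(Object monolithResult, Object serviceResult) {
        total.increment();
        boolean match = Objects.equals(monolithResult, serviceResult);
        if (!match) {
            mismatches.increment();
        }
        return match;
    }

    /** Fraction of comparisons that disagreed, or 0.0 before any were recorded. */
    double discrepancyRate() {
        long seen = total.sum();
        return seen == 0 ? 0.0 : (double) mismatches.sum() / seen;
    }

    /** True once enough samples exist and the rate is at or below the threshold. */
    boolean readyForTraffic(long minSamples, double maxRate) {
        return total.sum() >= minSamples && discrepancyRate() <= maxRate;
    }
}
```

The minimum sample count matters: a zero discrepancy rate over a handful of requests says little, so a gate like readyForTraffic should only pass after a meaningful volume of real traffic has been compared.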

Incremental traffic migration

Don't cut over 100% of traffic to the new service. Use feature flags or weighted routing to migrate gradually:

1% of traffic to new service, 99% to monolith → validate for 24 hours
10% → validate for 48 hours
50% → validate for one week
100% → keep monolith code paths available for rollback
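If you take the feature-flag route, the percentage steps above can be sketched in application code roughly as follows. RolloutRouter and its methods are hypothetical names, and hashing a stable key (a customer or order ID) into 100 buckets is one common choice, not the only one.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical application-level router: sends a fixed percentage of
// requests to the new service, keyed by a stable identifier.
final class RolloutRouter {
    private volatile int newServicePercent; // 0..100

    RolloutRouter(int newServicePercent) {
        setPercent(newServicePercent);
    }

    /** Changes the rollout percentage at runtime, e.g. from a feature-flag update. */
    void setPercent(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        this.newServicePercent = percent;
    }

    /** Maps a key to a stable bucket in [0, 100). */
    static int bucket(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    /** True when the request identified by key should go to the new service. */
    boolean useNewService(String key) {
        return bucket(key) < newServicePercent;
    }
}
```

Because the bucket depends only on the key, a given customer stays on the same backend between requests, and anyone routed to the new service at 10% is still routed there at 50%. Setting the percentage back to 0 is the rollback.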

Weighted routing at the API gateway (Kong, NGINX, or your service mesh) makes this infrastructure-level rather than application-level:

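# Kong (declarative config): weighted routing between monolith and new service.
# Kong splits traffic by weight across the targets of one upstream.
# Hostnames, ports, and the route path below are placeholders.
_format_version: "3.0"
upstreams:
- name: order-upstream
  targets:
  - target: monolith.internal:8080
    weight: 90
  - target: new-order-service.internal:8080
    weight: 10
services:
- name: order-service-routing
  host: order-upstream
  routes:
  - name: order-routes
    paths:
    - /orders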

Monitor error rates and latency for both targets during the migration. If the new service's error rate exceeds the monolith's, roll back the weight. The rollback should take thirty seconds, not a deployment pipeline run.
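The rollback rule can be made explicit so nobody has to debate it during an incident. The sketch below is one possible shape, with assumptions stated up front: RolloutGuard, the tolerance margin, and the minimum request count are hypothetical, and the error and request counts would come from your own metrics.

```java
// Hypothetical guard: compares error rates for both targets and says
// whether to hold the current weight or roll it back.
final class RolloutGuard {
    enum Decision { HOLD, ROLL_BACK, INSUFFICIENT_DATA }

    private final double tolerance; // allowed excess over the monolith's error rate
    private final long minRequests; // below this, the new service's sample is too small

    RolloutGuard(double tolerance, long minRequests) {
        this.tolerance = tolerance;
        this.minRequests = minRequests;
    }

    static double errorRate(long errors, long requests) {
        return requests == 0 ? 0.0 : (double) errors / requests;
    }

    Decision evaluate(long monolithErrors, long monolithRequests,
                      long newErrors, long newRequests) {
        if (newRequests < minRequests) {
            return Decision.INSUFFICIENT_DATA;
        }
        double monolithRate = errorRate(monolithErrors, monolithRequests);
        double newRate = errorRate(newErrors, newRequests);
        return newRate > monolithRate + tolerance ? Decision.ROLL_BACK : Decision.HOLD;
    }
}
```

The minimum request count is there because at 1% of traffic a single failed request can swing the new service's error rate far more than it would the monolith's.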

Data migration strategy

The hardest part of any service extraction is not the code — it's the data. The new service needs its own database. Getting data out of the monolith's database and into the new service's database, consistently and without downtime, requires a strategy.

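Dual write: during migration, write to both the monolith's database and the new service's database, and keep reading from the monolith. The two writes are not atomic, so a failure between them leaves the stores out of sync; record failed writes to the new store and reconcile them. When you're ready to cut reads over to the new service, the data is already there.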
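A minimal sketch of dual write, using in-memory stores in place of real databases. OrderStore, DualWriteOrderRepository, and the repair queue are hypothetical names; the choice to let a failed write to the new store be queued for repair, instead of failing the user's request, is an assumption that fits the monolith remaining authoritative.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical store abstraction; each side would wrap a real database.
interface OrderStore {
    void save(String orderId, String payload);
    String find(String orderId);
}

final class InMemoryOrderStore implements OrderStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();
    @Override public void save(String orderId, String payload) { rows.put(orderId, payload); }
    @Override public String find(String orderId) { return rows.get(orderId); }
}

final class DualWriteOrderRepository {
    private final OrderStore primary;   // monolith database, authoritative
    private final OrderStore secondary; // new service's database
    private final Queue<String> repairQueue = new ConcurrentLinkedQueue<>();

    DualWriteOrderRepository(OrderStore primary, OrderStore secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    void save(String orderId, String payload) {
        primary.save(orderId, payload); // a failure here propagates to the caller
        try {
            secondary.save(orderId, payload);
        } catch (RuntimeException e) {
            repairQueue.add(orderId); // the stores now differ; remember to fix it
        }
    }

    /** Reads stay on the monolith until cutover. */
    String find(String orderId) {
        return primary.find(orderId);
    }

    int pendingRepairs() {
        return repairQueue.size();
    }

    /** Re-copies queued orders from primary to secondary; returns how many succeeded. */
    int repair() {
        int repaired = 0;
        for (int pending = repairQueue.size(); pending > 0; pending--) {
            String id = repairQueue.poll();
            if (id == null) break;
            try {
                secondary.save(id, primary.find(id));
                repaired++;
            } catch (RuntimeException e) {
                repairQueue.add(id); // still failing; retry on the next run
            }
        }
        return repaired;
    }
}
```

An in-memory repair queue is lost on restart, so a real version would persist it or rely on a periodic comparison between the two stores to find what was missed.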

Event-based sync: the monolith publishes change events to Kafka. The new service consumes those events and maintains its own data store. This works well when the monolith already has event publishing or can be instrumented to add it.
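On the consuming side, events can arrive more than once or out of order, so the new service should apply them idempotently. The sketch below leaves Kafka out entirely and shows only that apply step. OrderChanged, its per-order version number, and OrderProjection are assumptions for illustration (records need Java 16 or later); the monolith would have to include some monotonically increasing version or timestamp in each event for this to work.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical change event; in practice deserialized from a Kafka record.
record OrderChanged(String orderId, long version, String status) {}

// The new service's own view of orders, built from change events.
final class OrderProjection {
    private final Map<String, OrderChanged> latest = new ConcurrentHashMap<>();

    /**
     * Applies the event unless the same or a newer version is already stored.
     * Returns true when the stored state changed.
     */
    boolean apply(OrderChanged event) {
        boolean[] applied = { false };
        latest.compute(event.orderId(), (id, current) -> {
            if (current == null || event.version() > current.version()) {
                applied[0] = true;
                return event;
            }
            return current; // duplicate or stale delivery: ignore
        });
        return applied[0];
    }

    /** Current status for an order, or null if no event has been seen for it. */
    String statusOf(String orderId) {
        OrderChanged event = latest.get(orderId);
        return event == null ? null : event.status();
    }
}
```

With this check in place, replaying a topic from an earlier offset after a consumer bug is safe: old events are ignored and the projection converges on the latest version of each order.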

Database-level replication with transformation: for initial data load, use a one-time migration script to copy and transform existing data into the new service's schema. For ongoing sync during the migration period, use Debezium (change data capture from the monolith's database) to replicate new changes.

Whichever strategy you use: validate data consistency between both stores throughout the migration period. A comparison job that runs hourly and logs discrepancies will catch problems before they reach users.
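The core of such a comparison job is small. Here is a sketch that assumes each store can be read as a map from order ID to some comparable representation, such as a normalized row or a checksum; ConsistencyCheck and the message formats are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical comparison step for the hourly consistency job.
final class ConsistencyCheck {
    /** Returns one line per discrepancy, ordered by ID; empty when the stores agree. */
    static List<String> diff(Map<String, String> monolith, Map<String, String> service) {
        List<String> discrepancies = new ArrayList<>();
        TreeSet<String> ids = new TreeSet<>(monolith.keySet());
        ids.addAll(service.keySet());
        for (String id : ids) {
            if (!service.containsKey(id)) {
                discrepancies.add("missing in new service: " + id);
            } else if (!monolith.containsKey(id)) {
                discrepancies.add("missing in monolith: " + id);
            } else if (!Objects.equals(monolith.get(id), service.get(id))) {
                discrepancies.add("mismatch: " + id);
            }
        }
        return discrepancies;
    }
}
```

For large tables you would compare in ID ranges or over recently modified rows instead of loading everything, and skip rows written in the last few seconds so that in-flight sync does not show up as a false discrepancy. Scheduling it hourly can be as simple as ScheduledExecutorService.scheduleAtFixedRate with a one-hour period.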

Keeping the monolith clean during extraction

While you're extracting services, the monolith continues receiving new features. Without discipline, new code in the monolith creates new dependencies that make future extraction harder.

Enforce the module boundaries being extracted: if you're extracting the Order module, add ArchUnit rules that fail the build when other monolith code adds a new dependency on the Order module (ArchUnit's FreezingArchRule can record existing violations so that only new ones fail). New features that would have gone into Order go into the new service instead. The monolith's Order code is frozen, with no new development, while the migration is in progress.

This requires team coordination but prevents the migration from being a treadmill where each extraction is followed by new entanglement.
