Event-Driven Architecture: The Service Communication Style Worth Understanding

by Eric Hanson, Backend Developer at Clean Systems Consulting

What event-driven architecture actually solves

You have a checkout flow where confirming an order triggers inventory reservation, payment processing, email confirmation, and analytics logging. With synchronous REST calls, you've chained these sequentially, which means your checkout latency is the sum of all of them and a failure in a non-critical step like email confirmation can fail the entire checkout. This is the problem event-driven architecture was designed to solve.

Event-driven architecture (EDA) means services communicate by publishing and subscribing to events rather than by calling each other directly. When an order is confirmed, the Order Service publishes an OrderConfirmed event to a durable message broker — typically Apache Kafka, RabbitMQ with quorum queues, or AWS EventBridge. Every downstream service that needs to react subscribes independently. The Order Service never calls them. They process at their own pace, independently, and if they're down when the event is published, they catch up when they recover.

The core benefit is temporal decoupling: publisher and subscriber don't need to be available simultaneously. This is the property that eliminates the cascading failure problem inherent in synchronous service chains.

The Kafka model in practice

Kafka is the dominant choice for internal service events because of its durability guarantees and replay semantics. Topics are partitioned logs retained for a configurable period (often 7–30 days). Consumer groups track their offset, and if a consumer falls behind or restarts, it reads from where it left off.
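
As a concrete sketch of that setup, here is topic creation with Kafka's AdminClient; the partition count, replication factor, and 7-day retention are illustrative choices, not recommendations:

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
try (AdminClient admin = AdminClient.create(props)) {
    // A partitioned log, retained for 7 days whether or not it has been consumed
    NewTopic topic = new NewTopic("orders.confirmed", 12, (short) 3)
        .configs(Map.of("retention.ms", String.valueOf(Duration.ofDays(7).toMillis())));
    admin.createTopics(List.of(topic)).all().get();
}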

// Producer: Order Service publishes after a successful DB write
@Service
public class OrderService {
    private final OrderRepository orderRepository;
    private final KafkaTemplate<String, OrderConfirmedEvent> kafkaTemplate;

    public OrderService(OrderRepository orderRepository,
                        KafkaTemplate<String, OrderConfirmedEvent> kafkaTemplate) {
        this.orderRepository = orderRepository;
        this.kafkaTemplate = kafkaTemplate;
    }

    public Order confirmOrder(OrderRequest request) {
        Order order = orderRepository.save(Order.from(request));

        // Note: the save and the publish are two separate operations; see
        // the outbox pattern below for closing the gap between them.
        kafkaTemplate.send("orders.confirmed",
            order.getId().toString(),  // partition key: routes the same order to the same partition
            OrderConfirmedEvent.builder()
                .orderId(order.getId())
                .userId(order.getUserId())
                .items(order.getItems())
                .total(order.getTotal())
                .confirmedAt(Instant.now())
                .build());

        return order;
    }
}

// Consumer: Inventory Service reacts independently
@KafkaListener(topics = "orders.confirmed", groupId = "inventory-service")
public void handleOrderConfirmed(OrderConfirmedEvent event) {
    inventoryReservationService.reserve(event.getOrderId(), event.getItems());
}

The Inventory Service processes events in its own consumer group. If you add a new downstream service (say, a fraud detection service), you create a new consumer group subscribed to the same topic. The Order Service is unchanged.
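
For instance, a fraud detection listener needs nothing beyond its own group ID; FraudDetectionService and its score() method are hypothetical stand-ins:

// A new consumer group reads the same topic from its own offsets;
// the Order Service and the existing consumers are untouched
@KafkaListener(topics = "orders.confirmed", groupId = "fraud-detection-service")
public void handleOrderConfirmed(OrderConfirmedEvent event) {
    fraudDetectionService.score(event.getOrderId(), event.getUserId(), event.getTotal());
}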

The consistency problem you cannot ignore

The trade-off is consistency. With synchronous calls, you know immediately if inventory reservation failed. With events, you don't. The Order Service has published the confirmation event and returned a success to the user — but Inventory Service processing might fail ten seconds later.

This is eventual consistency, and it requires designing explicitly for compensating transactions. If Inventory Service can't reserve stock for an order, it publishes an InventoryReservationFailed event. Order Service (or a saga orchestrator) consumes that event and initiates a compensating action: cancel the order, notify the user, refund the payment.

The saga pattern formalizes this:

OrderConfirmed
  → InventoryService: reserve stock
    → success: StockReserved
      → PaymentService: charge card
        → success: PaymentCollected → order fulfilled
        → failure: PaymentFailed → compensate: release inventory
    → failure: InventoryReservationFailed → compensate: cancel order

Each step publishes either a success or failure event. Compensating transactions undo the effects of earlier steps. This is conceptually clean but operationally demanding — you need to handle partial failures, duplicate events, and out-of-order processing.
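
As a sketch of one compensation step from the flow above (the topic name, event class, and service methods are illustrative, not from the earlier example code): the Order Service consumes the failure event and undoes its own work.

// Compensating action: the order is cancelled when inventory
// reservation fails downstream
@KafkaListener(topics = "inventory.reservation-failed", groupId = "order-service")
public void handleReservationFailed(InventoryReservationFailedEvent event) {
    orderService.cancel(event.getOrderId());
    notificationService.orderCancelled(event.getOrderId());
}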

Duplicate events: design for idempotency

In any at-least-once delivery system (such as Kafka with enable.auto.commit=false and offsets committed after processing), events can be delivered more than once if a consumer crashes after processing a record but before committing its offset. Every consumer must be idempotent: processing the same event twice must produce the same result as processing it once.
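
To see that window concretely, here is the shape of a manually committed consumer loop with the plain Kafka client (a sketch; process() is a placeholder for the business logic):

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "inventory-service");
props.put("enable.auto.commit", "false");  // offsets are committed by hand, below
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("orders.confirmed"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            process(record);   // a crash here, after processing...
        }
        consumer.commitSync(); // ...but before this commit, means the batch is redelivered
    }
}

The handler below closes that window on the consumer side: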

@KafkaListener(topics = "orders.confirmed", groupId = "inventory-service")
@Transactional
public void handleOrderConfirmed(OrderConfirmedEvent event) {
    // Idempotency check: skip if already processed
    if (processedEventRepository.existsByEventId(event.getEventId())) {
        return;
    }
    
    inventoryReservationService.reserve(event.getOrderId(), event.getItems());
    processedEventRepository.save(ProcessedEvent.of(event.getEventId()));
}

The processed_events table (with the event ID as a unique key) prevents double-processing. The transactional boundary ensures the inventory write and the idempotency record are committed atomically.
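
A minimal sketch of that table, in the same PostgreSQL dialect as the outbox example below (column names are illustrative):

CREATE TABLE processed_events (
    event_id     UUID PRIMARY KEY,  -- the unique key that blocks a second insert of the same event
    processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);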

The outbox pattern: avoiding dual-write failures

A common failure mode: the service writes to its database and then publishes to Kafka. If the process dies between those two operations, the event is lost. The DB state changed but no downstream service knows.

The outbox pattern solves this by writing the event to an outbox table in the same DB transaction as the domain write:

BEGIN;
  INSERT INTO orders (...) VALUES (...);
  INSERT INTO outbox (event_id, event_type, payload, created_at)
    VALUES (gen_random_uuid(), 'order.confirmed', '...', NOW());
COMMIT;

A separate relay process (Debezium via CDC, or a polling loop) reads from the outbox table and publishes to Kafka, then marks events as published. The relay can retry safely: a retry may publish a duplicate, but consumers are idempotent anyway.
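
A minimal polling-loop sketch, assuming a published_at column on the outbox table and a hypothetical topicFor() mapping from event type to topic name (Debezium replaces all of this with change data capture):

@Component
public class OutboxRelay {
    private final JdbcTemplate jdbc;
    private final KafkaTemplate<String, String> kafka;

    public OutboxRelay(JdbcTemplate jdbc, KafkaTemplate<String, String> kafka) {
        this.jdbc = jdbc;
        this.kafka = kafka;
    }

    @Scheduled(fixedDelay = 500)
    @Transactional
    public void relay() {
        // SKIP LOCKED lets several relay instances poll without contending
        List<Map<String, Object>> rows = jdbc.queryForList(
            "SELECT event_id, event_type, payload FROM outbox " +
            "WHERE published_at IS NULL ORDER BY created_at LIMIT 100 FOR UPDATE SKIP LOCKED");
        for (Map<String, Object> row : rows) {
            try {
                // Block for the broker ack, so a row is marked published
                // only after the send actually succeeded
                kafka.send(topicFor((String) row.get("event_type")),
                           row.get("event_id").toString(),
                           (String) row.get("payload")).get();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // unpublished rows are retried on the next tick
            } catch (ExecutionException e) {
                return; // likewise; a duplicate publish on retry is safe, consumers are idempotent
            }
            jdbc.update("UPDATE outbox SET published_at = NOW() WHERE event_id = ?",
                        row.get("event_id"));
        }
    }
}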

What you're trading

EDA trades synchronous complexity (cascading failures, latency amplification) for asynchronous complexity (eventual consistency, idempotency requirements, saga design, consumer lag monitoring). Neither is free.

If you adopt EDA, invest immediately in consumer lag monitoring (Kafka's consumer lag metric via JMX or Prometheus), dead letter queues (DLQs) with alerting for failed events, and distributed tracing with correlation IDs propagated through events. Without those, debugging a failed saga becomes archaeology.
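
For the DLQ piece specifically, Spring Kafka (2.8+) ships the building blocks; a minimal sketch, with the retry count and back-off as illustrative values:

@Bean
public DefaultErrorHandler kafkaErrorHandler(KafkaTemplate<Object, Object> template) {
    // Retry a failing record three times, one second apart, then publish
    // it to <original-topic>.DLT instead of blocking the partition
    DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(template);
    return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3L));
}

Records that land in a .DLT topic are both your alerting signal and your replay source once the bug is fixed.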

