Logging Across Microservices Is Useless If You Can't Connect the Dots

by Eric Hanson, Backend Developer at Clean Systems Consulting

Why individually good logs fail you at the system level

Each of your services logs well. Order Service uses structured JSON, logs request IDs, logs errors with stack traces. Inventory Service does the same. Payment Service too. Then a checkout fails at 14:47:23, and you search Order Service logs for that timestamp. You find an error. You want to know what Inventory Service was doing when Order Service got that error. You search Inventory Service logs for 14:47:23. You find several requests. You can't tell which one corresponds to the Order Service error you're investigating.

The logs are individually correct and collectively useless for cross-service debugging. The missing ingredient is correlation: a shared ID that travels with a request through every service it touches, so you can retrieve the complete story of that request from any log aggregation system.

Structured logging as the baseline

Before correlation IDs matter, you need structured logs. Logs emitted as plain text strings are not queryable in a useful way. You can grep for error messages, but you can't filter by user_id and status_code simultaneously, or aggregate error rates by endpoint.

Structured logging means emitting JSON (or another structured format) so log aggregation systems can index and query individual fields:

{
  "timestamp": "2026-04-25T14:47:23.412Z",
  "level": "ERROR",
  "service": "order-service",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "user_id": "user-8821",
  "order_id": "order-99142",
  "message": "Inventory reservation failed",
  "error": "InventoryServiceException: item sku-774 out of stock",
  "duration_ms": 187
}

In Java, Logback with logstash-logback-encoder or Log4j2 with JsonTemplateLayout produces this format with minimal configuration. In Go, zap or zerolog emit structured JSON by default.
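For Logback, a logback-spring.xml along these lines is enough; a minimal sketch, assuming the logstash-logback-encoder dependency is on the classpath (the service name is whatever you choose per service, and MDC entries such as trace_id are included automatically):

<!-- logback-spring.xml: every log line becomes one JSON document -->
<configuration>
  <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
    <encoder class="net.logstash.logback.encoder.LogstashEncoder">
      <!-- static fields stamped onto every line from this service -->
      <customFields>{"service":"order-service"}</customFields>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON"/>
  </root>
</configuration>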

The fields that matter: timestamp (ISO 8601, always UTC), level, service, trace_id, message, and any domain-specific IDs relevant to the operation (user_id, order_id, payment_id).

Correlation ID propagation

A correlation ID (called a trace ID once you adopt distributed tracing) is a unique identifier generated at the edge of your system: at the API gateway, or at the first service that handles an external request. It is propagated via an HTTP header through every service call downstream:

// Incoming request: extract or generate correlation ID
import java.io.IOException;
import java.util.Optional;
import java.util.UUID;

import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import jakarta.servlet.FilterChain;       // javax.servlet on pre-Boot-3 stacks
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

@Component
public class CorrelationIdFilter extends OncePerRequestFilter {
    public static final String CORRELATION_ID_HEADER = "X-Correlation-Id";

    @Override
    protected void doFilterInternal(HttpServletRequest request,
                                    HttpServletResponse response,
                                    FilterChain chain) throws IOException, ServletException {
        // Reuse the caller's ID if one arrived; otherwise this service is the edge
        String correlationId = Optional
            .ofNullable(request.getHeader(CORRELATION_ID_HEADER))
            .filter(id -> !id.isBlank())
            .orElseGet(() -> UUID.randomUUID().toString());

        // Store in MDC so all logs in this thread include it automatically
        MDC.put("trace_id", correlationId);
        // Echo it back so callers can quote the ID when reporting problems
        response.setHeader(CORRELATION_ID_HEADER, correlationId);

        try {
            chain.doFilter(request, response);
        } finally {
            // Always clear: servlet threads are pooled and reused across requests
            MDC.remove("trace_id");
        }
    }
}

// Outgoing service call: forward correlation ID downstream (a Feign interceptor;
// registering it as a bean applies it to every Spring Cloud OpenFeign client)
import org.slf4j.MDC;
import org.springframework.stereotype.Component;

import feign.RequestInterceptor;
import feign.RequestTemplate;

@Component
public class CorrelationIdInterceptor implements RequestInterceptor {
    @Override
    public void apply(RequestTemplate template) {
        String correlationId = MDC.get("trace_id");
        if (correlationId != null) {
            template.header(CorrelationIdFilter.CORRELATION_ID_HEADER, correlationId);
        }
    }
}

With this in place, every log line from every service that handles a given request includes the same trace_id. Finding all logs for a specific request becomes a single query:

# Elasticsearch/OpenSearch query
{ "query": { "term": { "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736" } } }

What to actually log

More logs are not always better. Log files that contain every method entry and exit, every variable value, and every SQL query are expensive to store, slow to query, and obscure the signals you actually need.

Log at the right level for the right information:

INFO: request received (with method, path, correlation ID), request completed (with status code, duration), significant business events (order placed, payment processed, user registered).

WARN: degraded behavior that is handled (fallback used, retry succeeded, rate limit approaching), configuration that might be wrong, deprecated API versions being called.

ERROR: operation failed in a way that requires attention, unhandled exceptions, dependency failures that triggered circuit breakers.

DEBUG: not in production by default. Enable per-service via dynamic log level adjustment (Spring Boot Actuator's /loggers endpoint) when actively debugging a specific issue.
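The dynamic adjustment is two HTTP calls against a running instance; a sketch, assuming the loggers endpoint is exposed and com.example.checkout is the logger you need (hostname and logger name are placeholders):

# Raise one logger to DEBUG without a restart or redeploy
curl -X POST http://order-service:8080/actuator/loggers/com.example.checkout \
  -H 'Content-Type: application/json' \
  -d '{"configuredLevel": "DEBUG"}'

# Revert when done; null falls back to the configured default level
curl -X POST http://order-service:8080/actuator/loggers/com.example.checkout \
  -H 'Content-Type: application/json' \
  -d '{"configuredLevel": null}'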

Log aggregation and the ELK/Grafana stack

Individual service logs are useless if they are not centrally aggregated and queryable. Fluent Bit (a lightweight log forwarder, deployed as a DaemonSet on Kubernetes) collects logs from all pods and forwards them to Elasticsearch or OpenSearch. Kibana (or OpenSearch Dashboards) provides the query interface.
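The Fluent Bit side is a small config; a sketch, assuming the standard Kubernetes log paths and an in-cluster Elasticsearch at elasticsearch.logging.svc (parser setup for your container runtime is omitted):

# fluent-bit.conf: tail pod logs, attach k8s metadata, ship to Elasticsearch
[INPUT]
    Name    tail
    Path    /var/log/containers/*.log
    Tag     kube.*

[FILTER]
    Name    kubernetes
    Match   kube.*

[OUTPUT]
    Name    es
    Match   *
    Host    elasticsearch.logging.svc
    Port    9200
    Logstash_Format On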

The alternative: Grafana Loki (designed for Kubernetes, stores logs with label-based indexing rather than full-text indexing). Loki is cheaper to operate than Elasticsearch for pure log storage, and integrates naturally with Grafana alongside metrics and traces.
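The trace lookup from earlier translates directly to LogQL: select by label first, then parse each JSON line and filter on the field; a sketch, assuming your collector sets a namespace label:

# LogQL: every line for one request, across all services in the namespace
{namespace="production"} | json | trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"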

Whichever stack you choose: establish a log retention policy before you need it. 30 days of raw logs from a moderately trafficked system can be several terabytes. Hot storage (fast query) for 7 days, warm storage (slower query) for 30 days, and archive for compliance requirements is a common tiering.
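On Elasticsearch, that tiering is expressed as an ILM policy; a sketch, assuming hot/warm node tiers and that the policy is attached to your log indices via an index template (names and thresholds are placeholders):

PUT _ilm/policy/app-logs
{
  "policy": {
    "phases": {
      "hot": {
        "actions": { "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" } }
      },
      "warm": {
        "min_age": "7d",
        "actions": { "allocate": { "number_of_replicas": 1 } }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}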

The correlation ID strategy is infrastructure work. Do it once, enforce it in your service template (the base configuration every new service starts from), and every new service gets it automatically. The alternative is retrofitting it into every service after you've had the debugging incident that makes you realize you needed it.
