Logs Are Useless If Nobody Reads Them
by Eric Hanson, Backend Developer at Clean Systems Consulting
The Log That Didn't Help
Your service starts returning 500 errors. You open the logs. There are thousands of lines. Some say INFO: Processing request. Some say DEBUG: Entering method. One buried line says ERROR: NullPointerException at OrderService.java:142. No context. No correlation ID. No indication of which request, which user, which input triggered the error.
You have logs. You don't have observability.
This is the most common logging failure mode: logs that record that something happened without recording enough to understand what happened or reproduce it.
The Three Purposes Logs Should Serve
Logs serve three distinct purposes, and useful logs serve all three simultaneously:
Audit trail: What happened, when, and to what? A log that records that an order was processed, with the order ID, user ID, and result, provides the audit trail that lets you answer "did this specific transaction go through?"
Debugging context: When something goes wrong, what was the state of the system? The request parameters, the intermediate state, the external calls made and their responses — all the information needed to reconstruct what happened and why.
Operational health signals: Is the system behaving normally? Successful request counts, error rates by type, processing durations — patterns that allow automated alerting and trend detection.
Logs that serve only one purpose waste the infrastructure cost of the other two.
The Structural Failures
Unstructured strings: log.info("Processing order " + orderId + " for user " + userId) produces a human-readable string, but also an unsearchable, unqueryable one. Structured logging — outputting JSON with named fields — lets tools like Elasticsearch, Splunk, Loki, or CloudWatch Logs Insights query by specific fields.
// Unstructured: easy to read, but a flat string that log tools cannot query
log.info("Processing order {} for user {}", orderId, userId);

// Structured: queryable, filterable, dashboardable
// (kv here assumes logstash-logback-encoder's StructuredArguments.kv)
log.info("order_processing",
        kv("order_id", orderId),
        kv("user_id", userId),
        kv("amount", amount),
        kv("currency", currency));
Missing correlation IDs: In a distributed system — or even in a single service handling multiple concurrent requests — logs without a request correlation ID are nearly impossible to trace. Every log entry for a given request should carry the same correlation ID so that all logs for that request can be retrieved as a group.
Correlation IDs should be generated at the entry point (API gateway or first service), propagated through all service calls (in HTTP headers: X-Correlation-ID or traceparent for W3C Trace Context), and included automatically in every log entry via a logging context (MDC in Java's SLF4J, contextvars in Python).
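The same pattern can be sketched with Python's standard library: a contextvars variable holds the current request's ID, and a logging filter copies it onto every record, so no call site has to pass it explicitly. The names (start_request, CorrelationFilter, the "orders" logger) are illustrative, not a specific framework's API.

```python
import contextvars
import logging
import uuid

# Holds the correlation ID for the current request context.
correlation_id = contextvars.ContextVar("correlation_id", default="-")

class CorrelationFilter(logging.Filter):
    """Copy the current correlation ID onto every log record so the
    formatter can emit it without each call site passing it along."""
    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True

def start_request(inbound_header=None):
    # Entry point: reuse a propagated X-Correlation-ID, or generate one.
    correlation_id.set(inbound_header or str(uuid.uuid4()))

logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(levelname)s correlation_id=%(correlation_id)s %(message)s"))
logger.addHandler(handler)
logger.addFilter(CorrelationFilter())
logger.setLevel(logging.INFO)

start_request("req-8472")
logger.info("order_received")  # every entry in this context carries req-8472
```

In a web framework, start_request would run in request middleware; contextvars keeps the ID isolated per task even under concurrent async requests.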
Wrong log levels: DEBUG statements that are always enabled in production flood the log store with noise and hide signal. INFO statements on every method entry are rarely useful. The signal-to-noise ratio determines whether logs are usable during an incident.
A useful heuristic for log levels:
ERROR: Something failed that requires attention. A human should look at this.
WARN: Something unexpected happened, but the system recovered. Worth monitoring for trends.
INFO: A significant business event occurred (order created, payment processed, user authenticated). Not every function call.
DEBUG: Diagnostic information that is only useful when actively debugging a specific problem. Off by default in production.
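The heuristic maps directly onto standard level thresholds. A minimal Python sketch (logger name and event names are illustrative):

```python
import logging

log = logging.getLogger("orders")
log.setLevel(logging.INFO)  # production default: DEBUG stays off until needed

log.error("payment_failed")       # a human should look at this
log.warning("provider_retried")   # recovered, but watch the trend
log.info("order_created")         # significant business event
log.debug("entering_validate")    # suppressed at the INFO threshold
```

The point of the threshold is that flipping one setting to DEBUG during an active investigation reveals the diagnostic detail without it flooding the log store the rest of the time.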
Logging without context at the failure point: The most useful log entry is the one that occurs closest to where the failure happens, with the maximum available context. A generic ERROR: Internal server error is less useful than ERROR: Payment validation failed — card declined by provider | order_id=8472 | user_id=1234 | provider_response=insufficient_funds.
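One way to get that context into a queryable form, sketched with Python's stdlib: a small JSON formatter plus a structured "fields" convention passed through the standard extra parameter. JsonFormatter and the field names are illustrative, not a particular library's API.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit level, event name, and any structured fields as one JSON line."""
    def format(self, record):
        payload = {"level": record.levelname, "event": record.getMessage()}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

def log_payment_failure(logger, order_id, user_id, provider_response):
    # Log at the failure point with everything known about the request.
    logger.error("payment_validation_failed", extra={"fields": {
        "order_id": order_id,
        "user_id": user_id,
        "provider_response": provider_response,
    }})
```

A call like log_payment_failure(log, 8472, 1234, "insufficient_funds") yields a single JSON line that a log store can filter by order_id or provider_response directly.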
What Good Logs Look Like in Practice
A request comes in. The service logs receipt of the request with the correlation ID and key request parameters (not the full payload if it contains PII, but enough to identify the request type and scope). Each significant step in processing logs its result. If an external call is made, its latency and result are logged. On success, the outcome is logged. On failure, the error is logged with all available context at the point of failure.
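That lifecycle can be sketched in Python as follows. The function names are hypothetical, and the extra={"fields": ...} convention assumes a formatter that emits those fields as structured data:

```python
import logging
import time

log = logging.getLogger("orders")

def process_order(order_id, call_provider):
    # Hypothetical request lifecycle: log receipt, the external call's
    # latency and result, then the outcome, with context at every step.
    log.info("request_received", extra={"fields": {"order_id": order_id}})
    start = time.monotonic()
    try:
        result = call_provider(order_id)
    except Exception as exc:
        latency_ms = round((time.monotonic() - start) * 1000)
        # Failure point: attach all available context right here.
        log.error("provider_call_failed", extra={"fields": {
            "order_id": order_id, "error": str(exc), "latency_ms": latency_ms}})
        raise
    latency_ms = round((time.monotonic() - start) * 1000)
    log.info("order_processed", extra={"fields": {
        "order_id": order_id, "latency_ms": latency_ms}})
    return result
```

Note that the error path logs before re-raising: the exception still propagates to the caller, but the richest context is captured where the failure actually occurred.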
When an incident occurs, you should be able to retrieve all logs for a specific request by correlation ID, see the complete sequence of events, identify where things diverged from expected behavior, and have enough context to reproduce the issue.
The Practical Takeaway
Pick one service you operate and find your most recent error in its logs. Ask: given only these logs, how long would it take to diagnose and reproduce the underlying cause? If the answer is "longer than thirty minutes," identify what context is missing from the error log — correlation ID, request parameters, intermediate state — and add it. Make that context standard in your service's error logging. Do this before the next incident.