Timeouts in Microservices: The Setting Most Developers Never Configure

by Eric Hanson, Backend Developer at Clean Systems Consulting

The default that will hurt you in production

Most HTTP client libraries ship with either no timeout or an extremely long one. Apache HttpClient's default connection timeout is effectively infinite — it will wait until the OS TCP stack times out, which can take minutes. OkHttp's default read timeout is 10 seconds. Feign layers its own defaults on top of whichever client it wraps: a plain `Request.Options` means a 10-second connect timeout and a 60-second read timeout unless you override it.

In a microservice that makes dozens of downstream HTTP calls per request, "no timeout" or "10 second timeout" means that a slow downstream service can hold your threads blocked for seconds or indefinitely. Under any meaningful load, this saturates your thread pool and takes the upstream service offline — not because of any bug, but because you accepted a library default that was not designed for your use case.
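The arithmetic behind that saturation is Little's law: a fixed pool of worker threads can sustain at most (pool size ÷ per-request latency) requests per second. A quick sketch — the pool size and latencies here are illustrative, not from any real deployment:

```java
// Little's law: sustainable throughput = concurrency / per-request latency.
// Numbers are illustrative only.
public class ThreadPoolBudget {
    public static double maxRequestsPerSecond(int poolThreads, long latencyMillis) {
        return poolThreads * 1000.0 / latencyMillis;
    }

    public static void main(String[] args) {
        // Healthy downstream at 100 ms per call: 2000 req/s ceiling
        System.out.println(maxRequestsPerSecond(200, 100));    // 2000.0
        // Degraded downstream pinned at a 10 s timeout: 20 req/s ceiling
        System.out.println(maxRequestsPerSecond(200, 10_000)); // 20.0
    }
}
```

One slow dependency behind a 10-second timeout turns a 2000 req/s service into a 20 req/s service — a 100x throughput cliff with no code change anywhere.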

The three timeout types you need to set

Connection timeout: how long to wait for a TCP connection to be established. If the downstream service is down, the TCP SYN goes unanswered. Without a connection timeout, you wait for the OS to time out — up to 2 minutes in some configurations. Set this to 1–3 seconds. If you can't establish a TCP connection in 3 seconds, the service is unreachable.

Read timeout (socket timeout): how long to wait for data after the connection is established. This covers the case where the service is reachable but responding slowly. Set this based on your SLA requirement for that specific call. If a product detail API should respond within 200ms at the 99th percentile, a 2-second read timeout gives you a reasonable buffer while protecting against indefinite blocking.

Connection pool timeout: how long to wait for an available connection from the pool. If all connections in the pool are in use (because downstream is slow and connections are piling up), new requests queue waiting for a free connection. Set this to a short value — 500ms is often appropriate. Long queue waits add invisible latency that doesn't show up in your timeout metrics.

// Spring Boot + Apache HttpClient 5: explicit timeout configuration
@Bean
public CloseableHttpClient httpClient() {
    return HttpClientBuilder.create()
        // pooledConnectionManager(): a PoolingHttpClientConnectionManager bean defined elsewhere
        .setConnectionManager(pooledConnectionManager())
        .setDefaultRequestConfig(RequestConfig.custom()
            .setConnectTimeout(Timeout.ofSeconds(2))       // TCP connect
            .setResponseTimeout(Timeout.ofSeconds(5))      // read timeout
            .setConnectionRequestTimeout(Timeout.ofMilliseconds(500)) // pool wait
            .build())
        .build();
}
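For comparison, the JDK's built-in `java.net.http.HttpClient` splits the same concerns slightly differently: the connect timeout lives on the client, while the per-request timeout is a deadline for the whole exchange rather than a socket read timeout, and there is no pool-wait setting to tune. A minimal sketch (the URL is a placeholder):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class JdkClientTimeouts {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))   // TCP connect
            .build();

        // Placeholder URL for illustration
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://inventory.internal/items/42"))
            .timeout(Duration.ofSeconds(5))          // deadline for the whole exchange
            .GET()
            .build();

        System.out.println(client.connectTimeout().get()); // PT2S
        System.out.println(request.timeout().get());       // PT5S
    }
}
```

A whole-request deadline is actually the stronger guarantee: a read timeout resets on every byte received, so a trickling response can evade it, while a request deadline cannot be evaded.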

The timeout budget problem

When Service A calls Service B, which calls Service C, each timeout bounds only its own hop — nothing bounds the chain as a whole. Suppose A, B, and C each use a 10-second timeout. B can spend its full 10 seconds waiting on C and only then give up, so A times out at 10 seconds while B and C are still holding threads for work nobody will receive. Worse, if A's timeout is a read (inactivity) timeout rather than a whole-request deadline, a response that trickles in resets the clock on every byte, and A can end up waiting 20+ seconds.

This is the timeout budget problem: timeouts configured per hop, with no accounting for call chain depth, produce end-to-end behavior that no single service's configuration predicts.

The solution is deadline propagation: pass the remaining time budget as a header through the call chain, and each service in the chain respects it.

// A passes its deadline to B
public Response callServiceB(Request request, Duration remainingBudget) {
    return bClient.call(request, 
        Map.of("X-Request-Deadline", 
               Instant.now().plus(remainingBudget).toString()));
}

// B respects the deadline when calling C
public Response callServiceC(Request request, String deadline) {
    Duration remaining = Duration.between(Instant.now(), Instant.parse(deadline));
    if (remaining.isNegative() || remaining.isZero()) {
        throw new DeadlineExceededException("Request deadline already passed");
    }
    return cClient.callWithTimeout(request, remaining);
}

gRPC handles this natively with its deadline propagation mechanism. For REST-based services, you implement it via a custom header. It requires discipline to propagate consistently — missing it in one service breaks the budget for the whole chain.
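Respecting a propagated deadline means each hop uses the smaller of its own default timeout and the remaining budget — a hop must never wait longer than its caller is willing to. A small helper (the names here are my own, not from any framework):

```java
import java.time.Duration;

public class DeadlineClamp {
    // Use our own default timeout unless the propagated budget is tighter
    public static Duration effectiveTimeout(Duration ownDefault, Duration remainingBudget) {
        return ownDefault.compareTo(remainingBudget) <= 0 ? ownDefault : remainingBudget;
    }

    public static void main(String[] args) {
        // Generous budget: our own 5 s default wins
        System.out.println(effectiveTimeout(Duration.ofSeconds(5), Duration.ofSeconds(30))); // PT5S
        // Caller has only 800 ms left: the budget wins
        System.out.println(effectiveTimeout(Duration.ofSeconds(5), Duration.ofMillis(800))); // PT0.8S
    }
}
```

Without the clamp, a service deep in the chain can happily wait its full default timeout on a request whose caller gave up seconds ago.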

Per-call versus per-service timeouts

Not all calls to a given service warrant the same timeout. A health check endpoint should time out in 500ms. A bulk data export that legitimately takes 30 seconds needs a 35-second timeout. A single per-service timeout conflates these very different operations.

Configure timeouts per operation, not per service. Spring Cloud OpenFeign keys its YAML configuration by client name (contextId), not by method, so the practical pattern is to split slow operations into a second client interface with its own contextId:

# Feign: per-client config, keyed by contextId
feign:
  client:
    config:
      inventory-service:   # @FeignClient(name = "inventory-service")
        connectTimeout: 2000
        readTimeout: 3000  # default for inventory calls
      inventory-bulk:      # @FeignClient(name = "inventory-service", contextId = "inventory-bulk")
        connectTimeout: 2000
        readTimeout: 15000 # longer for bulk operations

Alternatively, Feign lets you declare a feign.Request.Options parameter on the bulk method and pass per-call timeouts at the call site.

Measuring what your timeouts should be

Timeouts should be set based on actual observed latency, not guesses. The correct approach:

  1. Instrument your downstream calls with latency histograms (Micrometer + Prometheus)
  2. Measure the 95th and 99th percentile response times over a representative period
  3. Set timeout at roughly 2–3x the 99th percentile — high enough to avoid false timeouts under normal conditions, low enough to fail fast under degraded conditions

# Prometheus: p99 latency for calls to the inventory service
# (label matching needs a regex — PromQL has no "**" glob syntax)
histogram_quantile(0.99,
  sum by (le) (
    rate(http_client_requests_seconds_bucket{uri=~"/inventory/.*"}[5m])
  )
)

If the 99th percentile is 150ms, a 500ms timeout provides adequate buffer while still protecting against indefinite blocking. If you have no measurement data, start conservatively (2 seconds) and tighten based on what you observe in production.
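The step-3 rule of thumb is easy to automate against exported latency samples. A sketch using a nearest-rank percentile — the sample data is made up for illustration:

```java
import java.util.Arrays;

public class TimeoutFromP99 {
    // Nearest-rank percentile over observed latencies, in milliseconds
    public static long percentile(long[] samplesMillis, double p) {
        long[] sorted = samplesMillis.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        // Fabricated sample: 100 calls, latencies 50..149 ms
        long[] samples = new long[100];
        for (int i = 0; i < 100; i++) samples[i] = 50 + i;

        long p99 = percentile(samples, 0.99);  // 148 ms
        long timeoutMillis = 3 * p99;          // ~3x p99 rule of thumb
        System.out.println(timeoutMillis);     // 444
    }
}
```

In practice you would pull the percentile from Prometheus rather than computing it in-process, but the derivation — measure, take p99, multiply by 2–3x — is the same.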

Timeouts are not a one-time configuration decision. Review them when you observe P99 latency changes in dependencies, when you add new service calls, and when you change infrastructure (new region, different network topology). Stale timeout values from the system's initial design become incorrect as the system evolves.
