Timeouts in Microservices: The Setting Most Developers Never Configure
by Eric Hanson, Backend Developer at Clean Systems Consulting
The default that will hurt you in production
Most HTTP client libraries ship with either no timeout or an extremely long one. Apache HttpClient's default connection timeout is effectively infinite — it will wait until the OS TCP stack times out, which can take minutes. OkHttp's default read timeout is 10 seconds. Feign, wrapping either of these, inherits whatever the underlying client uses.
In a microservice that makes dozens of downstream HTTP calls per request, "no timeout" or "10 second timeout" means that a slow downstream service can hold your threads blocked for seconds or indefinitely. Under any meaningful load, this saturates your thread pool and takes the upstream service offline — not because of any bug, but because you accepted a library default that was not designed for your use case.
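The same trap exists in the JDK's built-in java.net.http.HttpClient (Java 11+): by default it has neither a connect timeout nor a request timeout. A minimal sketch of setting both explicitly (the URL is a placeholder):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class ExplicitTimeouts {
    // Client-level connect timeout: bounds the TCP handshake.
    public static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
    }

    // Request-level timeout: bounds the wait for the response, playing the
    // role of a read timeout. Without it, send() can block indefinitely.
    public static HttpRequest request(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
    }
}
```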
The three timeout types you need to set
Connection timeout: how long to wait for a TCP connection to be established. If the downstream service is down, the TCP SYN goes unanswered. Without a connection timeout, you wait for the OS to time out — up to 2 minutes in some configurations. Set this to 1–3 seconds. If you can't establish a TCP connection in 3 seconds, the service is unreachable.
Read timeout (socket timeout): how long to wait for data after the connection is established. This covers the case where the service is reachable but responding slowly. Set this based on your SLA requirement for that specific call. If a product detail API should respond within 200ms at the 99th percentile, a 2-second read timeout gives you a reasonable buffer while protecting against indefinite blocking.
Connection pool timeout: how long to wait for an available connection from the pool. If all connections in the pool are in use (because downstream is slow and connections are piling up), new requests queue waiting for a free connection. Set this to a short value — 500ms is often appropriate. Long queue waits add invisible latency that doesn't show up in your timeout metrics.
// Spring Boot + Apache HttpClient: explicit timeout configuration
@Bean
public CloseableHttpClient httpClient() {
    return HttpClientBuilder.create()
            .setConnectionManager(pooledConnectionManager())
            .setDefaultRequestConfig(RequestConfig.custom()
                    .setConnectTimeout(Timeout.ofSeconds(2))                  // TCP connect
                    .setResponseTimeout(Timeout.ofSeconds(5))                 // read timeout
                    .setConnectionRequestTimeout(Timeout.ofMilliseconds(500)) // pool wait
                    .build())
            .build();
}
The timeout budget problem
When Service A calls Service B, which calls Service C, the timeouts interact in ways the individual values don't reveal. Suppose all three use a 10-second timeout. B can legitimately spend almost 10 seconds waiting on C, plus its own processing time, so B's response can arrive just after A's timeout fires — A gives up, and the work B and C did is wasted. And if A sets no timeout at all, it can wait 20+ seconds while the downstream timeouts expire one after another (A waits for B, which waits for C).
This is the timeout budget problem: each service picks a timeout in isolation, and nobody accounts for how much of the caller's patience has already been spent further up the chain.
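To make the worst case concrete, here is a toy calculation (an illustrative helper, not from any library) of how long the top of a chain can wait when no deadline is propagated — assuming the edge sets no timeout of its own and every attempt at every hop runs to its full timeout:

```java
import java.time.Duration;
import java.util.List;

public class TimeoutChain {
    /**
     * Worst-case wall-clock wait at the top of a call chain when each hop
     * applies its own timeout independently and retries failed attempts.
     * Each hop can burn (attempts x timeout) before giving up, and the hops
     * time out serially, so the waits add up.
     */
    public static Duration worstCaseWait(List<Duration> perHopTimeouts,
                                         int retriesPerHop) {
        Duration total = Duration.ZERO;
        for (Duration t : perHopTimeouts) {
            total = total.plus(t.multipliedBy(retriesPerHop + 1));
        }
        return total;
    }
}
```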
The solution is deadline propagation: pass the remaining time budget as a header through the call chain, and each service in the chain respects it.
// A passes its deadline to B
public Response callServiceB(Request request, Duration remainingBudget) {
    return bClient.call(request,
            Map.of("X-Request-Deadline",
                    Instant.now().plus(remainingBudget).toString()));
}
// B respects the deadline when calling C
public Response callServiceC(Request request, String deadline) {
    Duration remaining = Duration.between(Instant.now(), Instant.parse(deadline));
    if (remaining.isNegative() || remaining.isZero()) {
        throw new DeadlineExceededException("Request deadline already passed");
    }
    return cClient.callWithTimeout(request, remaining);
}
gRPC handles this natively with its deadline propagation mechanism. For REST-based services, you implement it via a custom header. It requires discipline to propagate consistently — missing it in one service breaks the budget for the whole chain.
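One detail the snippets above gloss over: before forwarding a deadline, a service should reserve a margin for its own processing, so the downstream call isn't handed time the current hop will consume anyway. A hedged sketch (the helper name and margin approach are illustrative, not from any library):

```java
import java.time.Duration;
import java.time.Instant;

public class DeadlineBudget {
    /**
     * Timeout for the next downstream call: time remaining until the
     * propagated deadline, minus a margin reserved for this service's own
     * processing and serialization overhead.
     */
    public static Duration downstreamTimeout(Instant deadline, Duration localMargin) {
        Duration remaining = Duration.between(Instant.now(), deadline);
        Duration budget = remaining.minus(localMargin);
        if (budget.isNegative() || budget.isZero()) {
            // Fail immediately rather than start work that cannot finish in time.
            throw new IllegalStateException("No budget left for downstream call");
        }
        return budget;
    }
}
```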
Per-call versus per-service timeouts
Not all calls to a given service warrant the same timeout. A health check endpoint should time out in 500ms. A bulk data export that legitimately takes 30 seconds should have a 35-second timeout. Setting a single timeout per service conflates these very different operations.
Configure timeouts per operation, not per service:
# Feign client timeout configuration (Spring Cloud OpenFeign)
feign:
  client:
    config:
      inventory-service:
        connectTimeout: 2000
        readTimeout: 3000      # default for inventory calls
      inventory-bulk:
        connectTimeout: 2000
        readTimeout: 15000     # longer for bulk operations
Spring Cloud OpenFeign applies these properties per named client, not per method, so the usual pattern is a separate @FeignClient interface with contextId = "inventory-bulk" for the slow operations (passing a feign.Request.Options method parameter per call is another option).
Measuring what your timeouts should be
Timeouts should be set based on actual observed latency, not guesses. The correct approach:
- Instrument your downstream calls with latency histograms (Micrometer + Prometheus)
- Measure the 95th and 99th percentile response times over a representative period
- Set timeout at roughly 2–3x the 99th percentile — high enough to avoid false timeouts under normal conditions, low enough to fail fast under degraded conditions
# Prometheus query: 99th percentile latency for inventory service calls
histogram_quantile(0.99,
  sum by (le) (
    rate(http_client_requests_seconds_bucket{uri="/inventory/**"}[5m])
  )
)
If the 99th percentile is 150ms, a 500ms timeout provides adequate buffer while still protecting against indefinite blocking. If you have no measurement data, start conservatively (2 seconds) and tighten based on what you observe in production.
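The sizing rule can be sketched as a small helper. The nearest-rank percentile method and the multiplier here are assumptions for illustration; in practice you would read the percentile from Prometheus rather than compute it in-process:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TimeoutSizing {
    /**
     * Suggest a timeout as multiplier x the p-th percentile of observed
     * latencies, using the nearest-rank method on a sorted copy.
     */
    public static Duration suggestTimeout(List<Duration> observed,
                                          double percentile,
                                          double multiplier) {
        List<Duration> sorted = new ArrayList<>(observed);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.size());
        Duration pN = sorted.get(Math.max(0, rank - 1));
        return Duration.ofMillis(Math.round(pN.toMillis() * multiplier));
    }
}
```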
Timeouts are not a one-time configuration decision. Review them when you observe P99 latency changes in dependencies, when you add new service calls, and when you change infrastructure (new region, different network topology). Stale timeout values from the system's initial design become incorrect as the system evolves.