API Gateways in Spring Boot — What They Do, When You Need One, and How to Configure Spring Cloud Gateway
by Eric Hanson, Backend Developer at Clean Systems Consulting
What an API gateway does
Without a gateway, every service in a microservices architecture handles its own authentication, rate limiting, CORS, and request routing. Adding a new cross-cutting concern — request logging, API versioning, circuit breaking — means updating every service. Clients make requests directly to individual services, requiring them to know where each service lives.
A gateway sits in front of all services and handles these concerns once:
Client → API Gateway → Order Service
                     → User Service
                     → Inventory Service
                     → Payment Service
The gateway is responsible for:
- Routing — forwarding requests to the correct downstream service
- Authentication — verifying tokens before forwarding requests
- Rate limiting — enforcing per-client request limits at the edge
- Request/response transformation — adding headers, rewriting paths, modifying bodies
- Circuit breaking — stopping requests to failing downstream services
- Observability — logging, metrics, and tracing for all incoming requests in one place
Each downstream service handles only its domain logic — not cross-cutting concerns.
When you don't need a gateway
A gateway adds a network hop and operational complexity. For applications with one or two backend services, a gateway is overhead without proportional benefit. The threshold where a gateway pays off:
- Multiple frontend clients (web, mobile, third-party) with different API contract needs
- Multiple downstream services where cross-cutting concerns would otherwise be duplicated
- External traffic that needs authentication and rate limiting at the edge
- Traffic shaping requirements — canary deployments, A/B testing, gradual rollouts
For a monolith or a simple service with one client, handle authentication and rate limiting in the service itself. Add a gateway when the duplication cost of per-service cross-cutting concerns exceeds the gateway's operational overhead.
Spring Cloud Gateway setup
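Spring Cloud Gateway is built on Spring WebFlux (reactive) and Netty. The gateway starter brings in WebFlux itself; do not add the regular Web starter (spring-boot-starter-web) alongside it, because the reactive gateway does not run on a Spring MVC (servlet) stack: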
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-gateway</artifactId>
</dependency>

spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service # load-balanced via service discovery
          predicates:
            - Path=/api/v1/orders/**
          filters:
            - StripPrefix=0 # don't strip the path prefix
        - id: user-service
          uri: lb://user-service
          predicates:
            - Path=/api/v1/users/**
        - id: payment-service
          uri: https://payment.internal:8080
          predicates:
            - Path=/api/v1/payments/**
          filters:
            - AddRequestHeader=X-Internal-Source, gateway
lb://order-service uses Spring Cloud LoadBalancer to resolve the service by name — works with Kubernetes service discovery, Eureka, or Consul. For direct routing, use http://order-service:8080.
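To make the lb:// scheme less opaque: once service discovery has returned the instances for a name, Spring Cloud LoadBalancer's default strategy is round-robin over them. The sketch below is a plain-Java illustration of that idea only, not the library's code. The class name RoundRobinSketch and the instance addresses are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of round-robin selection over discovered instances.
final class RoundRobinSketch {
    private final List<String> instances;
    private final AtomicInteger position = new AtomicInteger();

    RoundRobinSketch(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = List.copyOf(instances);
    }

    // Returns the next instance, wrapping around at the end of the list.
    String next() {
        // floorMod keeps the index valid even after the counter overflows
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch orderService =
                new RoundRobinSketch(List.of("10.0.0.1:8080", "10.0.0.2:8080"));
        for (int i = 0; i < 4; i++) {
            System.out.println(orderService.next());
        }
    }
}
```

Each request to lb://order-service would, in this model, go to the next address in the list.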
Route predicates — deciding which requests match
Predicates determine whether a route applies to a request. Multiple predicates AND together — all must match:
routes:
  - id: order-service-authenticated
    uri: lb://order-service
    predicates:
      - Path=/api/v1/orders/**
      - Method=GET,POST,PUT,PATCH,DELETE
      - Header=Authorization, Bearer .+ # requires Authorization header matching regex
  - id: order-service-health
    uri: lb://order-service
    predicates:
      - Path=/api/v1/orders/health
      - Method=GET
      # No Authorization header required: health checks are public
Common predicates:
- Path=/api/v1/orders/**: matches path patterns
- Method=GET,POST: matches HTTP methods
- Host=**.example.com: matches request host
- Header=X-Request-Source, internal: matches header name and value (regex)
- Query=debug, true: matches query parameter
- Weight=order-service-v1, 80: gives the route a weight of 80 within the weight group order-service-v1, which is 80% of traffic when the group's weights sum to 100 (for canary deployments)
- After=2026-01-01T00:00:00Z: matches requests after a specific time (for scheduled deployments)
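The matching rules can be modelled in a few lines of plain Java. This is a simplified sketch, not Spring Cloud Gateway's own classes: RouteMatchSketch and Request are hypothetical names, the Method predicate of the first route is omitted for brevity, and the path check is a plain prefix test rather than a real path pattern. It shows two things: predicates on one route are combined with AND, and routes are tried in order with the first match winning.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, simplified model of route matching.
final class RouteMatchSketch {

    static final class Request {
        final String method;
        final String path;
        final String authorization; // null when the header is absent

        Request(String method, String path, String authorization) {
            this.method = method;
            this.path = path;
            this.authorization = authorization;
        }
    }

    // Routes in declaration order; each route's predicates are ANDed together.
    static Map<String, Predicate<Request>> routes() {
        Predicate<Request> ordersPath = r -> r.path.startsWith("/api/v1/orders/");
        Predicate<Request> bearer =
                r -> r.authorization != null && r.authorization.matches("Bearer .+");
        Predicate<Request> healthPath = r -> r.path.equals("/api/v1/orders/health");
        Predicate<Request> isGet = r -> r.method.equals("GET");

        Map<String, Predicate<Request>> routes = new LinkedHashMap<>();
        routes.put("order-service-authenticated", ordersPath.and(bearer));
        routes.put("order-service-health", healthPath.and(isGet));
        return routes;
    }

    // Returns the id of the first matching route, or null when nothing matches.
    static String firstMatch(Request request) {
        for (Map.Entry<String, Predicate<Request>> route : routes().entrySet()) {
            if (route.getValue().test(request)) {
                return route.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(new Request("GET", "/api/v1/orders/health", null)));
        System.out.println(firstMatch(new Request("GET", "/api/v1/orders/17", "Bearer abc")));
        System.out.println(firstMatch(new Request("GET", "/api/v1/orders/17", null)));
    }
}
```

A health request without a token falls through to the health route, while an order request without a token matches no route at all. A health request that does carry a token matches the first route, which is why route order matters when patterns overlap.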
Filters — transforming requests and responses
Gateway filters execute in a chain — each filter may modify the request before forwarding and the response after receiving it.
Built-in filters for common patterns:
filters:
  # Rewrite path: /api/v1/orders → /orders (strip version prefix internally)
  # In YAML, write $\{segment} so Spring does not treat ${segment} as a property placeholder
  - RewritePath=/api/v1/(?<segment>.*), /$\{segment}
  # Add headers to requests going to downstream services
  - AddRequestHeader=X-Gateway-Version, 1.0
  # Per-request values (correlation ID, response time) are not available as
  # placeholders here; set them in a custom filter instead
  # Remove sensitive headers before forwarding
  - RemoveRequestHeader=Cookie
  # Add or remove headers on responses going back to clients
  - AddResponseHeader=X-Gateway-Version, 1.0
  - RemoveResponseHeader=Set-Cookie
  # Rate limiting using Redis
  - name: RequestRateLimiter
    args:
      redis-rate-limiter.replenishRate: 100
      redis-rate-limiter.burstCapacity: 200
      key-resolver: "#{@userKeyResolver}"
  # Circuit breaker with fallback
  - name: CircuitBreaker
    args:
      name: order-service-cb
      fallbackUri: forward:/fallback/orders
Custom global filter — applies to all routes:
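import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.http.server.reactive.ServerHttpRequest;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@Component
public class RequestLoggingFilter implements GlobalFilter, Ordered {
    private static final Logger log = LoggerFactory.getLogger(RequestLoggingFilter.class);

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        ServerHttpRequest request = exchange.getRequest();
        String requestId = UUID.randomUUID().toString();
        // Add correlation ID to outgoing request
        ServerWebExchange mutatedExchange = exchange.mutate()
                .request(r -> r.header("X-Request-ID", requestId))
                .build();
        long startTime = System.currentTimeMillis();
        return chain.filter(mutatedExchange)
                // Runs after the rest of the chain completes without an error
                .then(Mono.fromRunnable(() -> {
                    long duration = System.currentTimeMillis() - startTime;
                    int statusCode = exchange.getResponse().getStatusCode() != null
                            ? exchange.getResponse().getStatusCode().value() : 0;
                    log.info("method={} path={} status={} duration={}ms requestId={}",
                            request.getMethod(),
                            request.getPath(),
                            statusCode,
                            duration,
                            requestId);
                }));
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE; // runs first
    }
}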
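The ordering and the before/after behaviour of the chain can be seen in a small synchronous model. This is a sketch for illustration, not the reactive implementation: FilterChainSketch and its Filter interface are hypothetical, and the order values are taken from the examples in this article (Integer.MIN_VALUE is the value of Ordered.HIGHEST_PRECEDENCE).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical synchronous model of an ordered filter chain.
final class FilterChainSketch {

    interface Filter {
        int order();
        void filter(List<String> trace, Runnable next);
    }

    // A filter that records when it runs before and after the rest of the chain.
    static Filter named(String name, int order) {
        return new Filter() {
            @Override
            public int order() {
                return order;
            }

            @Override
            public void filter(List<String> trace, Runnable next) {
                trace.add(name + ":pre");   // work done before forwarding
                next.run();                 // hand over to the next filter
                trace.add(name + ":post");  // work done once the response is back
            }
        };
    }

    // Sorts filters by ascending order value and runs them as a nested chain.
    static List<String> run(List<Filter> filters) {
        List<Filter> sorted = new ArrayList<>(filters);
        sorted.sort(Comparator.comparingInt(Filter::order));
        List<String> trace = new ArrayList<>();
        invoke(sorted, 0, trace);
        return trace;
    }

    private static void invoke(List<Filter> sorted, int index, List<String> trace) {
        if (index == sorted.size()) {
            trace.add("downstream"); // end of the chain: the proxied call
            return;
        }
        sorted.get(index).filter(trace, () -> invoke(sorted, index + 1, trace));
    }

    public static void main(String[] args) {
        List<String> trace = run(List.of(
                named("auth", -100),
                named("logging", Integer.MIN_VALUE),
                named("rateLimit", 1)));
        System.out.println(trace);
    }
}
```

The filter with the lowest order value sees the request first and the response last, which is why a logging filter at highest precedence can measure the duration of everything behind it.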
Authentication at the gateway
JWT validation at the gateway means downstream services receive only authenticated requests — they don't need to implement token validation themselves:
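import java.util.List;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.security.oauth2.jwt.Jwt;
import org.springframework.security.oauth2.jwt.JwtException;
import org.springframework.security.oauth2.jwt.ReactiveJwtDecoder;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@Component
public class JwtAuthenticationFilter implements GlobalFilter, Ordered {
    // Identity headers that only the gateway may set
    private static final List<String> IDENTITY_HEADERS =
            List.of("X-User-ID", "X-User-Email", "X-User-Roles");
    private final ReactiveJwtDecoder jwtDecoder;

    public JwtAuthenticationFilter(ReactiveJwtDecoder jwtDecoder) {
        this.jwtDecoder = jwtDecoder;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String path = exchange.getRequest().getPath().value();
        // Public paths bypass authentication, but forged identity headers are still dropped
        if (isPublicPath(path)) {
            return chain.filter(exchange.mutate()
                    .request(r -> r.headers(h -> IDENTITY_HEADERS.forEach(h::remove)))
                    .build());
        }
        String authHeader = exchange.getRequest().getHeaders()
                .getFirst(HttpHeaders.AUTHORIZATION);
        if (authHeader == null || !authHeader.startsWith("Bearer ")) {
            return unauthorized(exchange);
        }
        String token = authHeader.substring(7);
        return jwtDecoder.decode(token)
                // Forward user identity to downstream services via headers
                .flatMap(jwt -> chain.filter(withIdentity(exchange, jwt)))
                .onErrorResume(JwtException.class, ex -> unauthorized(exchange));
    }

    private ServerWebExchange withIdentity(ServerWebExchange exchange, Jwt jwt) {
        String email = jwt.getClaimAsString("email");
        List<String> roles = jwt.getClaimAsStringList("roles"); // null when the claim is absent
        return exchange.mutate()
                .request(r -> r.headers(h -> {
                    // set() overwrites anything the client sent under the same name
                    h.set("X-User-ID", jwt.getSubject());
                    h.set("X-User-Email", email != null ? email : "");
                    h.set("X-User-Roles", roles != null ? String.join(",", roles) : "");
                }))
                .build();
    }

    private Mono<Void> unauthorized(ServerWebExchange exchange) {
        exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
        return exchange.getResponse().setComplete();
    }

    private boolean isPublicPath(String path) {
        return path.startsWith("/api/v1/auth/") ||
                path.equals("/actuator/health") ||
                path.startsWith("/api/v1/products"); // public catalog
    }

    @Override
    public int getOrder() {
        return -100; // run early, before routing filters
    }
}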
Downstream services read X-User-ID and X-User-Roles headers — they trust the gateway has already verified the token. This only works if downstream services are not directly accessible from outside the cluster. If services can be called directly, they must still validate tokens themselves.
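On the receiving side, a downstream service only has to parse the forwarded headers. The sketch below assumes the comma-separated format used for X-User-Roles in the filter above. The class ForwardedIdentitySketch is hypothetical, and a plain Map stands in for the framework's header API (which, unlike this Map, treats header names case-insensitively).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper a downstream service might use to read gateway-set headers.
final class ForwardedIdentitySketch {

    // Splits the comma-separated X-User-Roles value; a missing header means no roles.
    static List<String> roles(Map<String, String> headers) {
        String raw = headers.get("X-User-Roles");
        if (raw == null || raw.isBlank()) {
            return List.of();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(role -> !role.isEmpty())
                .collect(Collectors.toList());
    }

    static boolean hasRole(Map<String, String> headers, String role) {
        return roles(headers).contains(role);
    }

    public static void main(String[] args) {
        Map<String, String> headers =
                Map.of("X-User-ID", "user-42", "X-User-Roles", "admin, billing");
        System.out.println(roles(headers));            // prints [admin, billing]
        System.out.println(hasRole(headers, "admin")); // prints true
    }
}
```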
Rate limiting at the edge
Gateway rate limiting with Redis prevents individual clients from overwhelming the system before requests reach downstream services:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>

@Bean
public KeyResolver userKeyResolver() {
    return exchange -> {
        // Rate limit by authenticated user ID (from header set by auth filter)
        String userId = exchange.getRequest().getHeaders().getFirst("X-User-ID");
        if (userId != null) {
            return Mono.just("user:" + userId);
        }
        // Fall back to IP for unauthenticated requests
        String ip = exchange.getRequest().getRemoteAddress() != null
                ? exchange.getRequest().getRemoteAddress().getAddress().getHostAddress()
                : "unknown";
        return Mono.just("ip:" + ip);
    };
}

spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/api/v1/orders/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100 # tokens/second
                redis-rate-limiter.burstCapacity: 150 # max burst
                redis-rate-limiter.requestedTokens: 1 # cost per request
                key-resolver: "#{@userKeyResolver}"
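The RequestRateLimiter filter uses a token bucket algorithm whose state lives in Redis, so every gateway replica enforces the same limit. It is the same token bucket idea that libraries such as Bucket4j apply inside a single service, but enforced at the gateway for every route that configures the filter.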
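To show how replenishRate, burstCapacity, and requestedTokens interact, here is a minimal in-memory token bucket. It is a sketch of the algorithm only: the class TokenBucketSketch is hypothetical, time is passed in as whole seconds to keep the example deterministic, and the real filter keeps this state in Redis rather than in a Java object.

```java
// Hypothetical in-memory token bucket with one-second resolution.
final class TokenBucketSketch {
    private final long replenishRate;  // tokens added per second
    private final long burstCapacity;  // maximum number of tokens the bucket holds
    private long tokens;
    private long lastRefillSecond;

    TokenBucketSketch(long replenishRate, long burstCapacity, long nowSecond) {
        this.replenishRate = replenishRate;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity;   // a new bucket starts full
        this.lastRefillSecond = nowSecond;
    }

    // Refills the bucket for the elapsed time, then tries to take the requested tokens.
    synchronized boolean tryConsume(long requestedTokens, long nowSecond) {
        long elapsed = Math.max(0, nowSecond - lastRefillSecond);
        tokens = Math.min(burstCapacity, tokens + elapsed * replenishRate);
        lastRefillSecond = Math.max(lastRefillSecond, nowSecond);
        if (tokens < requestedTokens) {
            return false;              // the gateway would answer 429 here
        }
        tokens -= requestedTokens;
        return true;
    }

    // Helper: how many of `attempts` single-token requests pass at a given second.
    static int countAllowed(TokenBucketSketch bucket, int attempts, long nowSecond) {
        int allowed = 0;
        for (int i = 0; i < attempts; i++) {
            if (bucket.tryConsume(1, nowSecond)) {
                allowed++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        TokenBucketSketch bucket = new TokenBucketSketch(100, 150, 0);
        System.out.println(countAllowed(bucket, 200, 0)); // prints 150 (the burst)
        System.out.println(countAllowed(bucket, 200, 1)); // prints 100 (one second of refill)
    }
}
```

With the values from the route above, a client can burst to 150 requests at once but sustains only 100 per second afterwards.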
Circuit breaker — stopping cascading failures
spring:
  cloud:
    gateway:
      routes:
        - id: payment-service
          uri: lb://payment-service
          predicates:
            - Path=/api/v1/payments/**
          filters:
            - name: CircuitBreaker
              args:
                name: payment-circuit-breaker
                fallbackUri: forward:/api/v1/payments/fallback
                statusCodes: 500,502,503,504

@RestController
@RequestMapping("/api/v1/payments/fallback")
public class PaymentFallbackController {
    @RequestMapping
    public ResponseEntity<ErrorResponse> fallback(ServerWebExchange exchange) {
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
                .body(new ErrorResponse(
                        "payment_service_unavailable",
                        "Payment service is temporarily unavailable. Please retry in a moment.",
                        exchange.getRequest().getId()
                ));
    }
}
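The CircuitBreaker filter delegates to Spring Cloud CircuitBreaker with Resilience4j, which requires the spring-cloud-starter-circuitbreaker-reactor-resilience4j dependency on the classpath. Configure the breaker in application.yml under the same name the route uses: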
resilience4j:
  circuitbreaker:
    instances:
      payment-circuit-breaker:
        slidingWindowSize: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
        permittedNumberOfCallsInHalfOpenState: 3
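What these four settings mean is easiest to see as a state machine. The following is a simplified, single-threaded sketch using the same values. CircuitBreakerSketch is a hypothetical class, not Resilience4j's implementation, which has further settings (minimum number of calls, slow-call thresholds, time-based windows) that are left out here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of a count-based circuit breaker.
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private static final int SLIDING_WINDOW_SIZE = 10;
    private static final int FAILURE_RATE_THRESHOLD = 50;   // percent
    private static final long WAIT_IN_OPEN_MILLIS = 30_000;
    private static final int PERMITTED_HALF_OPEN_CALLS = 3;

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAtMillis;
    private int halfOpenAdmitted;
    private int halfOpenRecorded;
    private int halfOpenFailures;

    State state() {
        return state;
    }

    // Decides whether a call may go to the downstream service right now.
    boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= WAIT_IN_OPEN_MILLIS) {
            state = State.HALF_OPEN; // wait time is over: let a few trial calls through
            halfOpenAdmitted = 0;
            halfOpenRecorded = 0;
            halfOpenFailures = 0;
        }
        if (state == State.OPEN) {
            return false;            // short-circuit: the fallback would be used
        }
        if (state == State.HALF_OPEN) {
            if (halfOpenAdmitted == PERMITTED_HALF_OPEN_CALLS) {
                return false;
            }
            halfOpenAdmitted++;
        }
        return true;
    }

    // Records the outcome of a call that was allowed through.
    void record(boolean failed, long nowMillis) {
        if (state == State.OPEN) {
            return;
        }
        if (state == State.HALF_OPEN) {
            halfOpenRecorded++;
            if (failed) {
                halfOpenFailures++;
            }
            if (halfOpenRecorded == PERMITTED_HALF_OPEN_CALLS) {
                if (atOrAboveThreshold(halfOpenFailures, halfOpenRecorded)) {
                    open(nowMillis);
                } else {
                    state = State.CLOSED;
                    window.clear();
                }
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > SLIDING_WINDOW_SIZE) {
            window.removeFirst();    // keep only the most recent calls
        }
        if (window.size() == SLIDING_WINDOW_SIZE) {
            int failures = 0;
            for (boolean f : window) {
                if (f) {
                    failures++;
                }
            }
            if (atOrAboveThreshold(failures, SLIDING_WINDOW_SIZE)) {
                open(nowMillis);
            }
        }
    }

    void recordMany(int count, boolean failed, long nowMillis) {
        for (int i = 0; i < count; i++) {
            record(failed, nowMillis);
        }
    }

    private static boolean atOrAboveThreshold(int failures, int calls) {
        return failures * 100 >= FAILURE_RATE_THRESHOLD * calls;
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAtMillis = nowMillis;
        window.clear();
    }
}
```

In this model, five failures among the last ten calls open the breaker, requests are rejected for 30 seconds, and three successful trial calls close it again.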
Canary deployments with weighted routing
Route a percentage of traffic to a new version:
spring:
  cloud:
    gateway:
      routes:
        - id: order-service-stable
          uri: lb://order-service-v1
          predicates:
            - Path=/api/v1/orders/**
            - Weight=order-service, 90 # 90% of traffic
        - id: order-service-canary
          uri: lb://order-service-v2
          predicates:
            - Path=/api/v1/orders/**
            - Weight=order-service, 10 # 10% of traffic
Increase the order-service-canary weight gradually as confidence in v2 builds. No client-side changes required — the gateway handles the traffic split transparently.
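The split itself is a weighted random choice within the group. The sketch below illustrates the idea with the 90/10 weights from the routes above. WeightedRouteSketch is a hypothetical class, and the random value is passed in explicitly so the behaviour is reproducible; it is not the gateway's own code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of weighted route selection within one weight group.
final class WeightedRouteSketch {

    // Picks a route id for a random value in [0, 1), given integer weights per route.
    static String choose(Map<String, Integer> weights, double random) {
        int total = 0;
        for (int weight : weights.values()) {
            total += weight;
        }
        double upperBound = 0.0;
        String last = null;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            // Each route owns a slice of [0, 1) proportional to its weight
            upperBound += entry.getValue() / (double) total;
            last = entry.getKey();
            if (random < upperBound) {
                return last;
            }
        }
        return last; // guards against floating-point rounding at the top of the range
    }

    static Map<String, Integer> exampleWeights() {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("order-service-stable", 90);
        weights.put("order-service-canary", 10);
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = exampleWeights();
        System.out.println(choose(weights, 0.50)); // prints order-service-stable
        System.out.println(choose(weights, 0.95)); // prints order-service-canary
    }
}
```

Because only the proportions matter, moving to a 50/50 split means changing the two weights and nothing else.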
Path rewriting for API versioning
The gateway can absorb API versioning so downstream services don't need to handle it:
routes:
  # In YAML, write $\{segment} so Spring does not treat ${segment} as a property placeholder
  # v1 requests → internal /orders endpoint
  - id: orders-v1
    uri: lb://order-service
    predicates:
      - Path=/api/v1/orders/**
    filters:
      - RewritePath=/api/v1/orders/(?<segment>.*), /orders/$\{segment}
  # v2 requests → internal /v2/orders endpoint
  - id: orders-v2
    uri: lb://order-service
    predicates:
      - Path=/api/v2/orders/**
    filters:
      - RewritePath=/api/v2/orders/(?<segment>.*), /v2/orders/$\{segment}
Downstream services expose /orders and /v2/orders. The gateway maps /api/v1/orders and /api/v2/orders from the public API to these internal paths.
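The rewrite is a regular-expression replacement with a named capture group, and the same patterns can be tried in plain Java before they go into a route. This is a sketch with a hypothetical class name, RewritePathSketch; it uses String.replaceAll, where ${segment} refers to the named group.

```java
// Hypothetical helper for trying out rewrite patterns outside the gateway.
final class RewritePathSketch {

    // Applies a regex replacement; ${name} in the replacement refers to a named group.
    static String rewrite(String path, String regex, String replacement) {
        return path.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String v1 = "/api/v1/orders/(?<segment>.*)";
        String v2 = "/api/v2/orders/(?<segment>.*)";

        System.out.println(rewrite("/api/v1/orders/42/items", v1, "/orders/${segment}"));
        // prints /orders/42/items
        System.out.println(rewrite("/api/v2/orders/42", v2, "/v2/orders/${segment}"));
        // prints /v2/orders/42
        System.out.println(rewrite("/api/v1/orders", v1, "/orders/${segment}"));
        // prints /api/v1/orders (no trailing slash, so the pattern does not match)
    }
}
```

The last case is worth noticing: a request for the bare collection path has no trailing slash, so this pattern leaves it unchanged. If that path must be rewritten too, making the slash optional (for example /api/v1/orders/?(?<segment>.*)) is one option, at the cost of producing /orders/ with a trailing slash.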
Gateway observability
With spring-boot-starter-actuator on the classpath, Spring Cloud Gateway records Micrometer metrics per route (request count, response time, and status codes):
management:
  metrics:
    tags:
      application: api-gateway
    distribution:
      percentiles-histogram:
        spring.cloud.gateway.requests: true
      percentiles:
        spring.cloud.gateway.requests: 0.5, 0.95, 0.99
Metrics available:
- spring.cloud.gateway.requests: request count and latency by route and status code
- spring.cloud.gateway.requests.active: currently active requests per route
Alert on:
- p99 gateway latency increasing without corresponding downstream latency increase — indicates gateway overhead
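- spring.cloud.gateway.requests{httpStatusCode=502} rate: downstream services returning errors (the numeric code is in the httpStatusCode tag; the status tag holds the status name, such as BAD_GATEWAY)
- spring.cloud.gateway.requests{httpStatusCode=429} rate: rate limiting is being applied; a high rate indicates an attack or a misconfigured client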
The deployment model that works
In Kubernetes, the gateway runs as a deployment with a public-facing load balancer service:
apiVersion: v1
kind: Service
metadata:
  name: api-gateway
spec:
  type: LoadBalancer # public-facing
  selector:
    app: api-gateway
  ports:
    - port: 443
      targetPort: 8080
---
# All downstream services use ClusterIP and are not directly accessible externally
apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  type: ClusterIP # internal only
  selector:
    app: order-service
  ports:
    - port: 8080
ClusterIP services are only reachable from within the cluster — the gateway is the only way in from outside. This enforces the routing topology in the network layer, not just in configuration. Clients cannot bypass the gateway by calling service URLs directly.
The gateway itself should have at least 2–3 replicas — it's the single point of entry for all traffic, making it the highest-impact service to lose.