API Gateways in Spring Boot — What They Do, When You Need One, and How to Configure Spring Cloud Gateway

February 3, 2026

by Arif Ikhsanudin, Backend Developer

What an API gateway does

Without a gateway, every service in a microservices architecture handles its own authentication, rate limiting, CORS, and request routing. Adding a new cross-cutting concern — request logging, API versioning, circuit breaking — means updating every service. Clients make requests directly to individual services, requiring them to know where each service lives.

A gateway sits in front of all services and handles these concerns once:

Client → API Gateway → Order Service
                     → User Service
                     → Inventory Service
                     → Payment Service

The gateway is responsible for:

Routing — forwarding requests to the correct downstream service
Authentication — verifying tokens before forwarding requests
Rate limiting — enforcing per-client request limits at the edge
Request/response transformation — adding headers, rewriting paths, modifying bodies
Circuit breaking — stopping requests to failing downstream services
Observability — logging, metrics, and tracing for all incoming requests in one place

Each downstream service handles only its domain logic — not cross-cutting concerns.

When you don't need a gateway

A gateway adds a network hop and operational complexity. For applications with one or two backend services, a gateway is overhead without proportional benefit. The threshold where a gateway pays off:

Multiple frontend clients (web, mobile, third-party) with different API contract needs
Multiple downstream services where cross-cutting concerns would otherwise be duplicated
External traffic that needs authentication and rate limiting at the edge
Traffic shaping requirements — canary deployments, A/B testing, gradual rollouts

For a monolith or a simple service with one client, handle authentication and rate limiting in the service itself. Add a gateway when the duplication cost of per-service cross-cutting concerns exceeds the gateway's operational overhead.

Spring Cloud Gateway setup

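Spring Cloud Gateway is built on Spring WebFlux (reactive) and Netty. The gateway starter pulls in WebFlux transitively, so do not add spring-boot-starter-web (Spring MVC) to the same application; the reactive gateway refuses to start when Spring MVC is on the classpath. Recent Spring Cloud release trains also publish this starter under the name spring-cloud-starter-gateway-server-webflux, so check which artifact matches your version: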

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-gateway</artifactId>
</dependency>

spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service          # load-balanced via service discovery
          predicates:
            - Path=/api/v1/orders/**
          filters:
            - StripPrefix=0                # don't strip the path prefix

        - id: user-service
          uri: lb://user-service
          predicates:
            - Path=/api/v1/users/**

        - id: payment-service
          uri: https://payment.internal:8080
          predicates:
            - Path=/api/v1/payments/**
          filters:
            - AddRequestHeader=X-Internal-Source, gateway

lb://order-service uses Spring Cloud LoadBalancer to resolve the service by name — works with Kubernetes service discovery, Eureka, or Consul. For direct routing, use http://order-service:8080.
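To make the lb:// resolution more concrete, here is a minimal plain-Java sketch of what a client-side load balancer does conceptually: look up the instances registered under a logical service name and rotate through them (round-robin is also Spring Cloud LoadBalancer's default strategy). The class name, the in-memory registry, and the addresses are hypothetical; the real implementation is reactive and reads instances from the discovery client.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: this is not Spring Cloud LoadBalancer's API.
final class RoundRobinResolver {
    private final Map<String, List<String>> instancesByService;
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    RoundRobinResolver(Map<String, List<String>> instancesByService) {
        this.instancesByService = Map.copyOf(instancesByService);
    }

    /** Resolves a logical service name to the next instance address, if any is registered. */
    Optional<String> resolve(String serviceName) {
        List<String> instances = instancesByService.get(serviceName);
        if (instances == null || instances.isEmpty()) {
            return Optional.empty();
        }
        AtomicInteger counter = counters.computeIfAbsent(serviceName, k -> new AtomicInteger());
        // floorMod keeps the index valid even after the counter overflows
        int index = Math.floorMod(counter.getAndIncrement(), instances.size());
        return Optional.of(instances.get(index));
    }
}
```

Each call to resolve("order-service") returns the next address in the list and wraps around at the end, which is why the route only needs the service name and not a host and port.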

Route predicates — deciding which requests match

Predicates determine whether a route applies to a request. Multiple predicates AND together — all must match:

routes:
  - id: order-service-authenticated
    uri: lb://order-service
    predicates:
      - Path=/api/v1/orders/**
      - Method=GET,POST,PUT,PATCH,DELETE
      - Header=Authorization, Bearer .+    # requires Authorization header matching regex

  - id: order-service-health
    uri: lb://order-service
    predicates:
      - Path=/api/v1/orders/health
      - Method=GET
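    # No Header predicate here, so health checks route without an Authorization header.
    # A predicate only selects a route: a request that matches no route gets a 404, not a 401.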

Common predicates:

Path=/api/v1/orders/** — matches path patterns
Method=GET,POST — matches HTTP methods
Host=**.example.com — matches request host
Header=X-Request-Source, internal — matches header name and value (regex)
Query=debug, true — matches query parameter
Weight=order-service, 80 — gives the route a weight of 80 within the group order-service; traffic is split in proportion to the weights in that group (for canary deployments)
After=2026-01-01T00:00:00Z — matches requests after a specific time (for scheduled deployments)
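The AND semantics can be sketched in plain Java with java.util.function.Predicate. This is an illustration under simplifying assumptions, not the gateway's implementation: SimpleRequest and RoutePredicates are hypothetical names, header lookup is case-sensitive here, and the path check is a plain prefix match rather than a full path pattern.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical request model; the gateway itself evaluates predicates against ServerWebExchange.
record SimpleRequest(String method, String path, Map<String, String> headers) {}

final class RoutePredicates {
    static Predicate<SimpleRequest> pathPrefix(String prefix) {
        return req -> req.path().startsWith(prefix);
    }

    static Predicate<SimpleRequest> method(String... allowed) {
        List<String> methods = List.of(allowed);
        return req -> methods.contains(req.method());
    }

    static Predicate<SimpleRequest> headerMatches(String name, String regex) {
        return req -> {
            String value = req.headers().get(name);
            return value != null && value.matches(regex);
        };
    }

    // All three conditions must hold, mirroring how a route's predicates are ANDed together.
    static final Predicate<SimpleRequest> AUTHENTICATED_ORDERS =
        pathPrefix("/api/v1/orders/")
            .and(method("GET", "POST", "PUT", "PATCH", "DELETE"))
            .and(headerMatches("Authorization", "Bearer .+"));
}
```

A GET to /api/v1/orders/42 with a bearer token passes AUTHENTICATED_ORDERS; the same request without the header, or with an unlisted method, does not, and would fall through to the next route.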

Filters — transforming requests and responses

Gateway filters execute in a chain — each filter may modify the request before forwarding and the response after receiving it.

Built-in filters for common patterns:

filters:
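  # Rewrite path: /api/v1/orders → /orders (strip version prefix internally)
  # In YAML, write $\{segment} so Spring does not resolve ${segment} as a property placeholder
  - RewritePath=/api/v1/(?<segment>.*), /$\{segment}

  # Add static headers to requests going to downstream services
  - AddRequestHeader=X-Gateway-Version, 1.0
  # Per-request values such as a request ID are not available as ${...} variables here;
  # set them in a custom filter instead (see the global filter below)

  # Remove sensitive headers before forwarding
  - RemoveRequestHeader=Cookie

  # Set-Cookie is a response header, so strip it on the way back to the client
  - RemoveResponseHeader=Set-Cookie

  # Add static headers to responses going back to clients
  - AddResponseHeader=X-Gateway-Version, 1.0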

  # Rate limiting using Redis
  - name: RequestRateLimiter
    args:
      redis-rate-limiter.replenishRate: 100
      redis-rate-limiter.burstCapacity: 200
      key-resolver: "#{@userKeyResolver}"

  # Circuit breaker with fallback
  - name: CircuitBreaker
    args:
      name: order-service-cb
      fallbackUri: forward:/fallback/orders

Custom global filter — applies to all routes:

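import java.util.UUID;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.http.server.reactive.ServerHttpRequest;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@Component
public class RequestLoggingFilter implements GlobalFilter, Ordered {

    private static final Logger log = LoggerFactory.getLogger(RequestLoggingFilter.class);

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        ServerHttpRequest request = exchange.getRequest();
        String requestId = UUID.randomUUID().toString();

        // Add correlation ID to outgoing request (overrides any client-supplied value)
        ServerWebExchange mutatedExchange = exchange.mutate()
            .request(r -> r.header("X-Request-ID", requestId))
            .build();

        long startTime = System.currentTimeMillis();

        return chain.filter(mutatedExchange)
            // doFinally also fires on error and cancellation, not only on success
            .doFinally(signal -> {
                long duration = System.currentTimeMillis() - startTime;
                // The status may still be unset if the chain failed before a response was written
                int statusCode = exchange.getResponse().getStatusCode() != null
                    ? exchange.getResponse().getStatusCode().value() : 0;

                log.info("method={} path={} status={} duration={}ms requestId={}",
                    request.getMethod(),
                    request.getPath(),
                    statusCode,
                    duration,
                    requestId);
            });
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;  // runs first
    }
}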

Authentication at the gateway

JWT validation at the gateway means downstream services receive only authenticated requests — they don't need to implement token validation themselves:

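import java.util.List;

import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.security.oauth2.jwt.JwtException;
import org.springframework.security.oauth2.jwt.ReactiveJwtDecoder;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@Component
public class JwtAuthenticationFilter implements GlobalFilter, Ordered {

    private final ReactiveJwtDecoder jwtDecoder;

    // Constructor injection; a ReactiveJwtDecoder bean must be defined elsewhere
    public JwtAuthenticationFilter(ReactiveJwtDecoder jwtDecoder) {
        this.jwtDecoder = jwtDecoder;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // Drop client-supplied identity headers so they cannot be spoofed, even on public paths
        ServerWebExchange sanitized = exchange.mutate()
            .request(r -> r.headers(h -> {
                h.remove("X-User-ID");
                h.remove("X-User-Email");
                h.remove("X-User-Roles");
            }))
            .build();

        String path = sanitized.getRequest().getPath().value();

        // Public paths bypass authentication
        if (isPublicPath(path)) {
            return chain.filter(sanitized);
        }

        String authHeader = sanitized.getRequest().getHeaders()
            .getFirst(HttpHeaders.AUTHORIZATION);

        if (authHeader == null || !authHeader.startsWith("Bearer ")) {
            sanitized.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
            return sanitized.getResponse().setComplete();
        }

        String token = authHeader.substring(7);

        return jwtDecoder.decode(token)
            .flatMap(jwt -> {
                // Optional claims may be absent, so guard against null before forwarding
                String email = jwt.getClaimAsString("email");
                List<String> roles = jwt.getClaimAsStringList("roles");

                // Forward user identity to downstream services via headers
                ServerWebExchange mutated = sanitized.mutate()
                    .request(r -> {
                        r.header("X-User-ID", jwt.getSubject());
                        if (email != null) {
                            r.header("X-User-Email", email);
                        }
                        if (roles != null) {
                            r.header("X-User-Roles", String.join(",", roles));
                        }
                    })
                    .build();
                return chain.filter(mutated);
            })
            .onErrorResume(JwtException.class, ex -> {
                sanitized.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
                return sanitized.getResponse().setComplete();
            });
    }

    private boolean isPublicPath(String path) {
        return path.startsWith("/api/v1/auth/") ||
               path.equals("/actuator/health") ||
               path.startsWith("/api/v1/products"); // public catalog
    }

    @Override
    public int getOrder() {
        return -100;  // run early, before routing filters
    }
}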

Downstream services read X-User-ID and X-User-Roles headers — they trust the gateway has already verified the token. This only works if downstream services are not directly accessible from outside the cluster. If services can be called directly, they must still validate tokens themselves.
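On the receiving side, a downstream service might turn those headers into a small identity object. The following is a plain-Java sketch under stated assumptions: GatewayIdentity is a hypothetical helper, headers are modelled as a simple Map, and the roles header is taken to be comma-separated as in the gateway filter above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for a downstream service; header names match the gateway filter above.
record GatewayIdentity(String userId, List<String> roles) {

    /** Builds the identity from forwarded headers, or empty if X-User-ID is missing or blank. */
    static Optional<GatewayIdentity> fromHeaders(Map<String, String> headers) {
        String userId = headers.get("X-User-ID");
        if (userId == null || userId.isBlank()) {
            return Optional.empty();
        }
        String rawRoles = headers.getOrDefault("X-User-Roles", "");
        List<String> roles = Arrays.stream(rawRoles.split(","))
            .map(String::trim)
            .filter(role -> !role.isEmpty())
            .toList();
        return Optional.of(new GatewayIdentity(userId, roles));
    }

    boolean hasRole(String role) {
        return roles.contains(role);
    }
}
```

An empty result should be treated as an unauthenticated request. The helper does no verification of its own, which is exactly why it is only safe behind a gateway that clients cannot bypass.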

Rate limiting at the edge

Gateway rate limiting with Redis prevents individual clients from overwhelming the system before requests reach downstream services:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>

@Bean
public KeyResolver userKeyResolver() {
    return exchange -> {
        // Rate limit by authenticated user ID (from header set by auth filter)
        String userId = exchange.getRequest().getHeaders().getFirst("X-User-ID");
        if (userId != null) {
            return Mono.just("user:" + userId);
        }
        // Fall back to IP for unauthenticated requests
        String ip = exchange.getRequest().getRemoteAddress() != null
            ? exchange.getRequest().getRemoteAddress().getAddress().getHostAddress()
            : "unknown";
        return Mono.just("ip:" + ip);
    };
}

spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/api/v1/orders/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100   # tokens/second
                redis-rate-limiter.burstCapacity: 150   # max burst
                redis-rate-limiter.requestedTokens: 1   # cost per request
                key-resolver: "#{@userKeyResolver}"

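The RequestRateLimiter filter uses a token bucket algorithm backed by Redis: replenishRate is the steady refill rate, burstCapacity is the bucket size, and requestedTokens is the cost of each request. It is the same token-bucket idea that libraries such as Bucket4j implement inside a single service, but enforced at the gateway with state shared across gateway instances. The filter applies to the routes it is configured on; to cover every route, declare it under spring.cloud.gateway.default-filters.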
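For readers who have not met the algorithm before, here is a minimal in-memory token bucket in plain Java. It is a sketch of the idea only: the gateway's limiter keeps its state in Redis, while this hypothetical TokenBucket class keeps it in fields and takes the current time as a parameter so the behaviour is easy to follow.

```java
// Minimal in-memory token bucket; illustrates the algorithm, not the gateway's Redis implementation.
final class TokenBucket {
    private final double replenishPerSecond; // tokens added per second
    private final double burstCapacity;      // maximum tokens the bucket can hold
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double replenishPerSecond, double burstCapacity, long nowNanos) {
        this.replenishPerSecond = replenishPerSecond;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Returns true and deducts the cost if enough tokens are available at the given time. */
    synchronized boolean tryConsume(int requestedTokens, long nowNanos) {
        // Refill in proportion to the elapsed time, capped at the bucket size
        double elapsedSeconds = Math.max(0L, nowNanos - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(burstCapacity, tokens + elapsedSeconds * replenishPerSecond);
        lastRefillNanos = nowNanos;

        if (tokens >= requestedTokens) {
            tokens -= requestedTokens;
            return true;
        }
        return false; // the gateway would answer 429 Too Many Requests here
    }
}
```

With a refill rate of 100 and a capacity of 150, a client can send 150 requests at once, is then rejected, and regains 100 requests of budget one second later. In real use the clock value would come from System.nanoTime().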

Circuit breaker — stopping cascading failures

spring:
  cloud:
    gateway:
      routes:
        - id: payment-service
          uri: lb://payment-service
          predicates:
            - Path=/api/v1/payments/**
          filters:
            - name: CircuitBreaker
              args:
                name: payment-circuit-breaker
                fallbackUri: forward:/api/v1/payments/fallback
                statusCodes: 500,502,503,504

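import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ServerWebExchange;

@RestController
@RequestMapping("/api/v1/payments/fallback")
public class PaymentFallbackController {

    // Simple response payload; the field names are illustrative
    public record ErrorResponse(String code, String message, String requestId) {}

    // No HTTP method specified: the fallback answers whatever method the original request used
    @RequestMapping
    public ResponseEntity<ErrorResponse> fallback(ServerWebExchange exchange) {
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
            .body(new ErrorResponse(
                "payment_service_unavailable",
                "Payment service is temporarily unavailable. Please retry in a moment.",
                exchange.getRequest().getId()
            ));
    }
}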

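The CircuitBreaker filter is backed by Spring Cloud CircuitBreaker with Resilience4j, so the gateway needs the spring-cloud-starter-circuitbreaker-reactor-resilience4j dependency on the classpath. Configure the breaker instance in application.yml, using the same name as in the route: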

resilience4j:
  circuitbreaker:
    instances:
      payment-circuit-breaker:
        slidingWindowSize: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
        permittedNumberOfCallsInHalfOpenState: 3
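What those four settings mean is easier to see as a state machine. The class below is a deliberately simplified, count-based sketch in plain Java; SimpleCircuitBreaker is a hypothetical name, it is not Resilience4j's API, and it omits details such as limiting concurrent calls in the half-open state.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified count-based circuit breaker; illustrates the state machine only.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;              // slidingWindowSize
    private final double failureRateThreshold; // percent, e.g. 50
    private final long waitInOpenMillis;       // waitDurationInOpenState
    private final int halfOpenCalls;           // permittedNumberOfCallsInHalfOpenState

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAtMillis;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold,
                         long waitInOpenMillis, int halfOpenCalls) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.waitInOpenMillis = waitInOpenMillis;
        this.halfOpenCalls = halfOpenCalls;
    }

    /** Decides whether a call may proceed at the given time. */
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= waitInOpenMillis) {
            state = State.HALF_OPEN; // let a few trial calls through
            window.clear();
        }
        return state != State.OPEN;
    }

    /** Records the outcome of a call that was allowed through. */
    synchronized void record(boolean failed, long nowMillis) {
        if (state == State.OPEN) {
            return;
        }
        int required = (state == State.HALF_OPEN) ? halfOpenCalls : windowSize;
        window.addLast(failed);
        if (window.size() > required) {
            window.removeFirst(); // keep only the most recent outcomes
        }
        if (window.size() < required) {
            return; // not enough calls yet to judge the failure rate
        }
        long failures = window.stream().filter(f -> f).count();
        double failureRate = 100.0 * failures / window.size();
        if (failureRate >= failureRateThreshold) {
            state = State.OPEN; // stop sending traffic; the fallback answers instead
            openedAtMillis = nowMillis;
            window.clear();
        } else if (state == State.HALF_OPEN) {
            state = State.CLOSED; // trial calls succeeded, resume normal traffic
            window.clear();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

With the values from the configuration above (10, 50, 30 seconds, 3), five failures in the last ten calls open the breaker, requests are rejected for 30 seconds, and three successful trial calls close it again.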

Canary deployments with weighted routing

Route a percentage of traffic to a new version:

spring:
  cloud:
    gateway:
      routes:
        - id: order-service-stable
          uri: lb://order-service-v1
          predicates:
            - Path=/api/v1/orders/**
            - Weight=order-service, 90    # 90% of traffic

        - id: order-service-canary
          uri: lb://order-service-v2
          predicates:
            - Path=/api/v1/orders/**
            - Weight=order-service, 10    # 10% of traffic

Increase the order-service-canary weight gradually as confidence in v2 builds. No client-side changes required — the gateway handles the traffic split transparently.
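Conceptually, the split works by giving each route in a weight group a slice of the range from 0 to 1 and picking the slice that a random number falls into. The plain-Java sketch below shows that idea; WeightedChooser is a hypothetical class, and the roll is passed in explicitly so the behaviour is deterministic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of weight-based selection within one group; route ids mirror the configuration above.
final class WeightedChooser {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private int totalWeight;

    WeightedChooser add(String routeId, int weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("weight must be positive");
        }
        Integer previous = weights.put(routeId, weight);
        if (previous != null) {
            totalWeight -= previous; // replacing a route must not double-count its weight
        }
        totalWeight += weight;
        return this;
    }

    /** Picks a route for a roll in [0.0, 1.0); each route owns a slice proportional to its weight. */
    String choose(double roll) {
        if (weights.isEmpty()) {
            throw new IllegalStateException("no routes registered");
        }
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += (double) entry.getValue() / totalWeight;
            last = entry.getKey();
            if (roll < cumulative) {
                return last;
            }
        }
        return last; // guards against floating-point rounding at the upper edge
    }
}
```

In real use the roll would come from something like ThreadLocalRandom.current().nextDouble(). Because the slices are computed from the weights relative to their sum, 90 and 10 behave the same as 9 and 1.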

Path rewriting for API versioning

The gateway can absorb API versioning so downstream services don't need to handle it:

routes:
  # v1 requests → internal /orders endpoint
  - id: orders-v1
    uri: lb://order-service
    predicates:
      - Path=/api/v1/orders/**
    filters:
      # In YAML, write $\{segment} so Spring does not resolve ${segment} as a property placeholder
      - RewritePath=/api/v1/orders/(?<segment>.*), /orders/$\{segment}

  # v2 requests → internal /v2/orders endpoint
  - id: orders-v2
    uri: lb://order-service
    predicates:
      - Path=/api/v2/orders/**
    filters:
      - RewritePath=/api/v2/orders/(?<segment>.*), /v2/orders/$\{segment}

Downstream services expose /orders and /v2/orders. The gateway maps /api/v1/orders and /api/v2/orders from the public API to these internal paths.
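The rewrite itself is an ordinary regex substitution with a named capture group, which plain Java can demonstrate with String.replaceAll. PathRewriteDemo is a hypothetical class for illustration; inside Java code the replacement uses ${segment} directly, because the $\{segment} escaping is only needed in YAML configuration.

```java
// Plain-Java illustration of the regex substitution behind a path rewrite.
final class PathRewriteDemo {

    /** Applies a regex with a named group and a ${name} reference in the replacement. */
    static String rewrite(String path, String regex, String replacement) {
        return path.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        // prints /orders/42/items
        System.out.println(rewrite("/api/v1/orders/42/items",
            "/api/v1/orders/(?<segment>.*)", "/orders/${segment}"));

        // prints /v2/orders/42
        System.out.println(rewrite("/api/v2/orders/42",
            "/api/v2/orders/(?<segment>.*)", "/v2/orders/${segment}"));

        // prints /api/v1/orders (no trailing slash, so the regex does not match)
        System.out.println(rewrite("/api/v1/orders",
            "/api/v1/orders/(?<segment>.*)", "/orders/${segment}"));
    }
}
```

The last call is worth noting: a path without the trailing slash is left unchanged by this regex, so if the bare collection path should be rewritten too, the pattern needs to make the slash optional.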

Gateway observability

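Spring Cloud Gateway publishes Micrometer metrics once spring-boot-starter-actuator is on the classpath (the gateway metrics filter is on by default and can be disabled with spring.cloud.gateway.metrics.enabled=false). Request count, response time, and status codes are tracked per route: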

management:
  metrics:
    tags:
      application: api-gateway
    distribution:
      percentiles-histogram:
        spring.cloud.gateway.requests: true
      percentiles:
        spring.cloud.gateway.requests: 0.5, 0.95, 0.99

Metrics available:

spring.cloud.gateway.requests — timer with request count and latency, tagged by routeId, routeUri, outcome, status, httpStatusCode, and httpMethod
spring.cloud.gateway.routes.count — number of route definitions currently loaded

Alert on:

p99 gateway latency increasing without corresponding downstream latency increase — indicates gateway overhead
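spring.cloud.gateway.requests{httpStatusCode=502} rate — the gateway could not get a valid response from a downstream service (the status tag holds the status name, such as BAD_GATEWAY; the numeric code is in httpStatusCode)
spring.cloud.gateway.requests{httpStatusCode=429} rate — rate limiting is being applied; high rates indicate an attack or a misconfigured client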

The deployment model that works

In Kubernetes, the gateway runs as a deployment with a public-facing load balancer service:

apiVersion: v1
kind: Service
metadata:
  name: api-gateway
spec:
  type: LoadBalancer         # public-facing
  selector:
    app: api-gateway
  ports:
    - port: 443
      targetPort: 8080       # assumes TLS is terminated at the load balancer, or that the gateway serves HTTPS on 8080
---
# All downstream services use ClusterIP — not directly accessible externally
apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  type: ClusterIP            # internal only
  selector:
    app: order-service
  ports:
    - port: 8080

ClusterIP services are only reachable from within the cluster — the gateway is the only way in from outside. This enforces the routing topology in the network layer, not just in configuration. Clients cannot bypass the gateway by calling service URLs directly.

The gateway itself should have at least 2–3 replicas — it's the single point of entry for all traffic, making it the highest-impact service to lose.
