API Gateways in Spring Boot — What They Do, When You Need One, and How to Configure Spring Cloud Gateway

by Eric Hanson, Backend Developer at Clean Systems Consulting

What an API gateway does

Without a gateway, every service in a microservices architecture handles its own authentication, rate limiting, CORS, and request routing. Adding a new cross-cutting concern — request logging, API versioning, circuit breaking — means updating every service. Clients make requests directly to individual services, requiring them to know where each service lives.

A gateway sits in front of all services and handles these concerns once:

Client → API Gateway → Order Service
                     → User Service
                     → Inventory Service
                     → Payment Service

The gateway is responsible for:

  • Routing — forwarding requests to the correct downstream service
  • Authentication — verifying tokens before forwarding requests
  • Rate limiting — enforcing per-client request limits at the edge
  • Request/response transformation — adding headers, rewriting paths, modifying bodies
  • Circuit breaking — stopping requests to failing downstream services
  • Observability — logging, metrics, and tracing for all incoming requests in one place

Each downstream service handles only its domain logic — not cross-cutting concerns.

When you don't need a gateway

A gateway adds a network hop and operational complexity. For applications with one or two backend services, a gateway is overhead without proportional benefit. The threshold where a gateway pays off:

  • Multiple frontend clients (web, mobile, third-party) with different API contract needs
  • Multiple downstream services where cross-cutting concerns would otherwise be duplicated
  • External traffic that needs authentication and rate limiting at the edge
  • Traffic shaping requirements — canary deployments, A/B testing, gradual rollouts

For a monolith or a simple service with one client, handle authentication and rate limiting in the service itself. Add a gateway when the duplication cost of per-service cross-cutting concerns exceeds the gateway's operational overhead.

Spring Cloud Gateway setup

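Spring Cloud Gateway is built on Spring WebFlux (reactive) and Netty. The starter below brings in WebFlux transitively. Do not add spring-boot-starter-web to the same application: this starter is incompatible with the servlet-based Spring MVC stack.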

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-gateway</artifactId>
</dependency>

Routes are declared in application.yml:

spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service          # load-balanced via service discovery
          predicates:
            - Path=/api/v1/orders/**
          filters:
            - StripPrefix=0                # don't strip the path prefix

        - id: user-service
          uri: lb://user-service
          predicates:
            - Path=/api/v1/users/**

        - id: payment-service
          uri: https://payment.internal:8080
          predicates:
            - Path=/api/v1/payments/**
          filters:
            - AddRequestHeader=X-Internal-Source, gateway

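lb://order-service uses Spring Cloud LoadBalancer to resolve the service by name, which works with Kubernetes service discovery, Eureka, or Consul. The lb:// scheme requires spring-cloud-starter-loadbalancer and a discovery client on the classpath. For direct routing without discovery, use a plain URL such as http://order-service:8080.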

Route predicates — deciding which requests match

Predicates determine whether a route applies to a request. Multiple predicates AND together — all must match:

routes:
  - id: order-service-authenticated
    uri: lb://order-service
    predicates:
      - Path=/api/v1/orders/**
      - Method=GET,POST,PUT,PATCH,DELETE
      - Header=Authorization, Bearer .+    # requires Authorization header matching regex

  - id: order-service-health
    uri: lb://order-service
    predicates:
      - Path=/api/v1/orders/health
      - Method=GET
    # No Authorization header required — health checks are public

Common predicates:

  • Path=/api/v1/orders/** — matches path patterns
  • Method=GET,POST — matches HTTP methods
  • Host=**.example.com — matches request host
  • Header=X-Request-Source, internal — matches header name and value (regex)
  • Query=debug, true — matches query parameter
  • Weight=order-service-v1, 80 — routes 80% of traffic (for canary deployments)
  • After=2026-01-01T00:00:00Z — matches requests after a specific time (for scheduled deployments)
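To make the AND semantics and first-match behaviour concrete, here is a minimal sketch in plain Java. It is not how Spring Cloud Gateway is implemented; GatewayRequest, SimpleRoute, and RouteMatcher are hypothetical names introduced only for illustration, and the path check is a simple prefix match rather than full pattern matching.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical request model for illustration; not a Spring type.
// Header names are matched case-sensitively here to keep the sketch short.
record GatewayRequest(String method, String path, Map<String, String> headers) {}

// Hypothetical route model: an id plus one combined predicate.
record SimpleRoute(String id, Predicate<GatewayRequest> predicate) {}

final class RouteMatcher {

    // Similar in spirit to Path=/prefix/** (prefix match only).
    static Predicate<GatewayRequest> pathPrefix(String prefix) {
        return r -> r.path().equals(prefix) || r.path().startsWith(prefix + "/");
    }

    // Similar in spirit to Method=GET,POST.
    static Predicate<GatewayRequest> method(String... allowed) {
        List<String> methods = List.of(allowed);
        return r -> methods.contains(r.method());
    }

    // Similar in spirit to Header=name, regex.
    static Predicate<GatewayRequest> header(String name, String regex) {
        return r -> {
            String value = r.headers().get(name);
            return value != null && value.matches(regex);
        };
    }

    // Returns the id of the first route whose combined predicate accepts the request.
    static Optional<String> match(List<SimpleRoute> routes, GatewayRequest request) {
        return routes.stream()
                .filter(route -> route.predicate().test(request))
                .map(SimpleRoute::id)
                .findFirst();
    }
}
```

Combining predicates with Predicate.and mirrors listing several predicates under one route: a request without an Authorization header fails the first route and falls through to the next one that matches.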

Filters — transforming requests and responses

Gateway filters execute in a chain — each filter may modify the request before forwarding and the response after receiving it.
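The "before forwarding, after receiving" behaviour is easier to see in a stripped-down model. The sketch below is plain, blocking Java and only illustrates the ordering; Spring's GatewayFilter is reactive and has a different signature. SimpleFilter, SimpleChain, and FilterChainDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter contract for illustration only.
interface SimpleFilter {
    String apply(String request, SimpleChain chain);
}

// Walks the filter list; the final step stands in for the downstream call.
final class SimpleChain {
    private final List<SimpleFilter> filters;
    private final int index;

    SimpleChain(List<SimpleFilter> filters) {
        this(filters, 0);
    }

    private SimpleChain(List<SimpleFilter> filters, int index) {
        this.filters = filters;
        this.index = index;
    }

    String proceed(String request) {
        if (index == filters.size()) {
            return "response(" + request + ")"; // pretend downstream service
        }
        return filters.get(index).apply(request, new SimpleChain(filters, index + 1));
    }
}

final class FilterChainDemo {
    record Result(String response, List<String> trace) {}

    static Result run() {
        List<String> trace = new ArrayList<>();
        // Logs before and after the rest of the chain runs.
        SimpleFilter logging = (request, chain) -> {
            trace.add("pre:logging");
            String response = chain.proceed(request);
            trace.add("post:logging");
            return response;
        };
        // Modifies the request on the way in.
        SimpleFilter addHeader = (request, chain) -> {
            trace.add("pre:addHeader");
            String response = chain.proceed(request + " [X-Gateway-Version: 1.0]");
            trace.add("post:addHeader");
            return response;
        };
        String response = new SimpleChain(List.of(logging, addHeader)).proceed("GET /orders");
        return new Result(response, trace);
    }
}
```

The "pre" steps run in list order and the "post" steps run in reverse, which is why a filter that must see the final response status is usually placed first in the chain.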

Built-in filters for common patterns:

filters:
  # Rewrite path: /api/v1/orders → /orders (strip version prefix internally)
  # In YAML, write $\{segment} so Spring does not resolve it as a property placeholder
  - RewritePath=/api/v1/(?<segment>.*), /$\{segment}

  # Add static headers to requests going to downstream services
  - AddRequestHeader=X-Gateway-Version, 1.0
  # Per-request values such as a request ID or response time are not available
  # as placeholders here; set them in a custom filter instead

  # Remove sensitive headers before forwarding
  - RemoveRequestHeader=Cookie

  # Add static headers to responses going back to clients
  - AddResponseHeader=X-Gateway-Version, 1.0

  # Rate limiting using Redis
  - name: RequestRateLimiter
    args:
      redis-rate-limiter.replenishRate: 100
      redis-rate-limiter.burstCapacity: 200
      key-resolver: "#{@userKeyResolver}"

  # Circuit breaker with fallback
  - name: CircuitBreaker
    args:
      name: order-service-cb
      fallbackUri: forward:/fallback/orders

Custom global filter — applies to all routes:

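import java.util.UUID;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.http.server.reactive.ServerHttpRequest;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@Component
public class RequestLoggingFilter implements GlobalFilter, Ordered {

    private static final Logger log = LoggerFactory.getLogger(RequestLoggingFilter.class);

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        ServerHttpRequest request = exchange.getRequest();
        String requestId = UUID.randomUUID().toString();

        // Add correlation ID to outgoing request (overrides any client-supplied value)
        ServerWebExchange mutatedExchange = exchange.mutate()
            .request(r -> r.header("X-Request-ID", requestId))
            .build();

        long startTime = System.currentTimeMillis();

        // doFinally also runs on error and cancellation, so failed requests are logged too
        return chain.filter(mutatedExchange)
            .doFinally(signal -> {
                long duration = System.currentTimeMillis() - startTime;
                // Status may still be unset (logged as 0) if the request failed before a response was written
                int statusCode = exchange.getResponse().getStatusCode() != null
                    ? exchange.getResponse().getStatusCode().value() : 0;

                log.info("method={} path={} status={} duration={}ms requestId={}",
                    request.getMethod(),
                    request.getPath().value(),
                    statusCode,
                    duration,
                    requestId);
            });
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE;  // runs first
    }
}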

Authentication at the gateway

JWT validation at the gateway means downstream services receive only authenticated requests — they don't need to implement token validation themselves:

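import java.util.List;

import org.springframework.cloud.gateway.filter.GatewayFilterChain;
import org.springframework.cloud.gateway.filter.GlobalFilter;
import org.springframework.core.Ordered;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.security.oauth2.jwt.JwtException;
import org.springframework.security.oauth2.jwt.ReactiveJwtDecoder;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

// ReactiveJwtDecoder comes from spring-boot-starter-oauth2-resource-server
@Component
public class JwtAuthenticationFilter implements GlobalFilter, Ordered {

    private final ReactiveJwtDecoder jwtDecoder;

    public JwtAuthenticationFilter(ReactiveJwtDecoder jwtDecoder) {
        this.jwtDecoder = jwtDecoder;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String path = exchange.getRequest().getPath().value();

        // Public paths bypass authentication, but client-supplied identity
        // headers are stripped so they cannot be spoofed
        if (isPublicPath(path)) {
            ServerWebExchange stripped = exchange.mutate()
                .request(r -> r.headers(h -> {
                    h.remove("X-User-ID");
                    h.remove("X-User-Email");
                    h.remove("X-User-Roles");
                }))
                .build();
            return chain.filter(stripped);
        }

        String authHeader = exchange.getRequest().getHeaders()
            .getFirst(HttpHeaders.AUTHORIZATION);

        if (authHeader == null || !authHeader.startsWith("Bearer ")) {
            exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
            return exchange.getResponse().setComplete();
        }

        String token = authHeader.substring(7);

        return jwtDecoder.decode(token)
            .flatMap(jwt -> {
                // Optional claims may be absent; fall back to empty values
                String email = jwt.getClaimAsString("email");
                List<String> roles = jwt.getClaimAsStringList("roles");

                // Forward user identity to downstream services via headers;
                // header(...) overrides any value the client sent
                ServerWebExchange mutated = exchange.mutate()
                    .request(r -> r
                        .header("X-User-ID", jwt.getSubject())
                        .header("X-User-Email", email != null ? email : "")
                        .header("X-User-Roles",
                            roles != null ? String.join(",", roles) : ""))
                    .build();
                return chain.filter(mutated);
            })
            .onErrorResume(JwtException.class, ex -> {
                exchange.getResponse().setStatusCode(HttpStatus.UNAUTHORIZED);
                return exchange.getResponse().setComplete();
            });
    }

    private boolean isPublicPath(String path) {
        return path.startsWith("/api/v1/auth/") ||
               path.equals("/actuator/health") ||
               path.startsWith("/api/v1/products"); // public catalog
    }

    @Override
    public int getOrder() {
        return -100;  // run early, before routing filters
    }
}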

Downstream services read X-User-ID and X-User-Roles headers — they trust the gateway has already verified the token. This only works if downstream services are not directly accessible from outside the cluster. If services can be called directly, they must still validate tokens themselves.
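On the receiving side, a downstream service only needs to parse those headers. A minimal sketch, assuming the headers arrive as a simple name-to-value map; GatewayIdentity and Identity are hypothetical helper names, not part of Spring:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper a downstream service might use; header names match the gateway filter.
final class GatewayIdentity {

    record Identity(String userId, Set<String> roles) {
        boolean hasRole(String role) {
            return roles.contains(role);
        }
    }

    // Parses the identity headers set by the gateway.
    // Throws if X-User-ID is missing, which should not happen behind the gateway.
    static Identity fromHeaders(Map<String, String> headers) {
        String userId = headers.get("X-User-ID");
        if (userId == null || userId.isBlank()) {
            throw new IllegalArgumentException("X-User-ID header is missing");
        }
        String rawRoles = headers.getOrDefault("X-User-Roles", "");
        Set<String> roles = Arrays.stream(rawRoles.split(","))
                .map(String::trim)
                .filter(role -> !role.isEmpty())
                .collect(Collectors.toUnmodifiableSet());
        return new Identity(userId, roles);
    }
}
```

Treating a missing X-User-ID as an error rather than as an anonymous user makes a misrouted or direct call fail loudly instead of silently running without an identity.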

Rate limiting at the edge

Gateway rate limiting with Redis prevents individual clients from overwhelming the system before requests reach downstream services:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis-reactive</artifactId>
</dependency>

The key resolver decides which bucket a request counts against:

@Bean
public KeyResolver userKeyResolver() {
    return exchange -> {
        // Rate limit by authenticated user ID (from header set by auth filter)
        String userId = exchange.getRequest().getHeaders().getFirst("X-User-ID");
        if (userId != null) {
            return Mono.just("user:" + userId);
        }
        // Fall back to IP for unauthenticated requests
        String ip = exchange.getRequest().getRemoteAddress() != null
            ? exchange.getRequest().getRemoteAddress().getAddress().getHostAddress()
            : "unknown";
        return Mono.just("ip:" + ip);
    };
}

Attach the limiter to a route in application.yml:

spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/api/v1/orders/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100   # tokens/second
                redis-rate-limiter.burstCapacity: 150   # max burst
                redis-rate-limiter.requestedTokens: 1   # cost per request
                key-resolver: "#{@userKeyResolver}"

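The RequestRateLimiter filter uses a token bucket algorithm backed by Redis. It is the same token bucket idea that libraries such as Bucket4j implement inside a single service, but the state lives in Redis, so all gateway replicas share one limit per key. The filter applies only to routes that declare it; to cover every route, list it under spring.cloud.gateway.default-filters. Rejected requests receive HTTP 429 Too Many Requests.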
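How replenishRate, burstCapacity, and requestedTokens interact can be sketched with an in-memory token bucket. This is a simplified illustration, not the Redis-backed implementation the filter uses; TokenBucket is a hypothetical class, and the clock is passed in explicitly so the behaviour is easy to follow.

```java
// Simplified in-memory token bucket for illustration; the gateway filter keeps this state in Redis.
final class TokenBucket {
    private final double replenishRate;   // tokens added per second
    private final double burstCapacity;   // maximum tokens the bucket can hold
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double replenishRate, double burstCapacity, long nowMillis) {
        this.replenishRate = replenishRate;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity;      // start with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    // Returns true if the request may proceed; false corresponds to an HTTP 429 at the gateway.
    synchronized boolean tryConsume(int requestedTokens, long nowMillis) {
        // Refill according to elapsed time, capped at the burst capacity.
        double elapsedSeconds = Math.max(0, nowMillis - lastRefillMillis) / 1000.0;
        tokens = Math.min(burstCapacity, tokens + elapsedSeconds * replenishRate);
        lastRefillMillis = nowMillis;

        if (tokens < requestedTokens) {
            return false;
        }
        tokens -= requestedTokens;
        return true;
    }
}
```

With a rate of 100 and a capacity of 150, a client can send 150 requests at once, after which it is limited to the refill rate: half a second later only 50 more requests are admitted.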

Circuit breaker — stopping cascading failures

spring:
  cloud:
    gateway:
      routes:
        - id: payment-service
          uri: lb://payment-service
          predicates:
            - Path=/api/v1/payments/**
          filters:
            - name: CircuitBreaker
              args:
                name: payment-circuit-breaker
                fallbackUri: forward:/api/v1/payments/fallback
                statusCodes: 500,502,503,504

The fallback target is a controller inside the gateway application:

@RestController
@RequestMapping("/api/v1/payments/fallback")
public class PaymentFallbackController {

    @RequestMapping
    public ResponseEntity<ErrorResponse> fallback(ServerWebExchange exchange) {
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
            .body(new ErrorResponse(
                "payment_service_unavailable",
                "Payment service is temporarily unavailable. Please retry in a moment.",
                exchange.getRequest().getId()
            ));
    }
}

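The CircuitBreaker filter uses Resilience4j under the hood and requires the spring-cloud-starter-circuitbreaker-reactor-resilience4j dependency. Configure the breaker instance in application.yml, using the same name as the filter's name argument: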

resilience4j:
  circuitbreaker:
    instances:
      payment-circuit-breaker:
        slidingWindowSize: 10
        failureRateThreshold: 50
        waitDurationInOpenState: 30s
        permittedNumberOfCallsInHalfOpenState: 3
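The four settings above map onto a small state machine. The sketch below is a simplified, count-based illustration in plain Java, not Resilience4j's implementation; SimpleCircuitBreaker is a hypothetical class, and time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified count-based circuit breaker; field names mirror the configuration keys.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int slidingWindowSize;
    private final double failureRateThreshold;          // percent, e.g. 50
    private final long waitDurationInOpenStateMillis;
    private final int permittedCallsInHalfOpenState;

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAtMillis;
    private int halfOpenCalls;
    private int halfOpenFailures;

    SimpleCircuitBreaker(int slidingWindowSize, double failureRateThreshold,
                         long waitDurationInOpenStateMillis, int permittedCallsInHalfOpenState) {
        this.slidingWindowSize = slidingWindowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.waitDurationInOpenStateMillis = waitDurationInOpenStateMillis;
        this.permittedCallsInHalfOpenState = permittedCallsInHalfOpenState;
    }

    synchronized State state() {
        return state;
    }

    // Returns true if a call may be attempted; false means "use the fallback".
    synchronized boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= waitDurationInOpenStateMillis) {
            state = State.HALF_OPEN;   // wait time elapsed: allow a few trial calls
            halfOpenCalls = 0;
            halfOpenFailures = 0;
        }
        if (state == State.OPEN) {
            return false;
        }
        if (state == State.HALF_OPEN) {
            return halfOpenCalls < permittedCallsInHalfOpenState;
        }
        return true;
    }

    // Records the outcome of a call that was allowed.
    synchronized void record(boolean failure, long nowMillis) {
        if (state == State.OPEN) {
            return;
        }
        if (state == State.HALF_OPEN) {
            halfOpenCalls++;
            if (failure) {
                halfOpenFailures++;
            }
            if (halfOpenCalls == permittedCallsInHalfOpenState) {
                double rate = 100.0 * halfOpenFailures / halfOpenCalls;
                if (rate >= failureRateThreshold) {
                    open(nowMillis);
                } else {
                    state = State.CLOSED;
                    window.clear();
                }
            }
            return;
        }
        // CLOSED: keep only the most recent slidingWindowSize outcomes.
        window.addLast(failure);
        if (window.size() > slidingWindowSize) {
            window.removeFirst();
        }
        if (window.size() == slidingWindowSize) {
            long failures = window.stream().filter(Boolean::booleanValue).count();
            if (100.0 * failures / slidingWindowSize >= failureRateThreshold) {
                open(nowMillis);
            }
        }
    }

    private void open(long nowMillis) {
        state = State.OPEN;
        openedAtMillis = nowMillis;
        window.clear();
    }
}
```

With the values from the configuration (10, 50, 30 seconds, 3), five failures in the last ten calls open the breaker, requests are rejected for 30 seconds, and three successful trial calls close it again.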

Canary deployments with weighted routing

Route a percentage of traffic to a new version:

spring:
  cloud:
    gateway:
      routes:
        - id: order-service-stable
          uri: lb://order-service-v1
          predicates:
            - Path=/api/v1/orders/**
            - Weight=order-service, 90    # 90% of traffic

        - id: order-service-canary
          uri: lb://order-service-v2
          predicates:
            - Path=/api/v1/orders/**
            - Weight=order-service, 10    # 10% of traffic

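Increase the order-service-canary weight and decrease the order-service-stable weight gradually as confidence in v2 builds. Weights are relative within the order-service group. No client-side changes are required; the gateway handles the traffic split transparently.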
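The split itself is a weighted random choice within the group. A minimal sketch of that idea in plain Java, assuming a hypothetical WeightedRouter class (this is not the gateway's own code); the random value is passed in so the selection is reproducible:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Picks a route id in proportion to its weight within one group.
final class WeightedRouter {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private int totalWeight;

    WeightedRouter add(String routeId, int weight) {
        if (weight < 0) {
            throw new IllegalArgumentException("weight must not be negative");
        }
        Integer previous = weights.put(routeId, weight);
        if (previous != null) {
            totalWeight -= previous;   // re-adding a route replaces its weight
        }
        totalWeight += weight;
        return this;
    }

    // Chooses a route for a random value in [0.0, 1.0).
    String choose(double random) {
        if (totalWeight == 0) {
            throw new IllegalStateException("no route with a positive weight");
        }
        double point = random * totalWeight;
        double cumulative = 0;
        String last = null;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            last = entry.getKey();
            if (point < cumulative) {
                return last;
            }
        }
        return last; // only reached if random is 1.0 or more
    }
}
```

In production the value would come from a random number generator, for example `router.choose(ThreadLocalRandom.current().nextDouble())`, so the 90/10 ratio holds on average rather than for any particular sequence of requests.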

Path rewriting for API versioning

The gateway can absorb API versioning so downstream services don't need to handle it:

routes:
  # v1 requests → internal /orders endpoint
  # In YAML, write $\{segment} so Spring does not resolve it as a property placeholder.
  # The optional slash inside the group also rewrites the bare /api/v1/orders path.
  - id: orders-v1
    uri: lb://order-service
    predicates:
      - Path=/api/v1/orders/**
    filters:
      - RewritePath=/api/v1/orders(?<segment>/?.*), /orders$\{segment}

  # v2 requests → internal /v2/orders endpoint
  - id: orders-v2
    uri: lb://order-service
    predicates:
      - Path=/api/v2/orders/**
    filters:
      - RewritePath=/api/v2/orders(?<segment>/?.*), /v2/orders$\{segment}

Downstream services expose /orders and /v2/orders. The gateway maps /api/v1/orders and /api/v2/orders from the public API to these internal paths.
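RewritePath is, in effect, a regular-expression replace on the request path, so a rewrite rule can be checked in isolation before it goes into the gateway configuration. A minimal sketch in plain Java, assuming a hypothetical PathRewriter helper (not the Spring filter itself):

```java
// Applies a RewritePath-style regex and replacement to a request path.
final class PathRewriter {

    static String rewrite(String path, String regex, String replacement) {
        // The YAML form escapes the group reference as $\{name}; undo that
        // so java.util.regex sees the named-group reference ${name}.
        String normalized = replacement.replace("$\\", "$");
        return path.replaceAll(regex, normalized);
    }
}
```

For example, `PathRewriter.rewrite("/api/v1/orders/42", "/api/v1/orders(?<segment>/?.*)", "/orders$\\{segment}")` returns `/orders/42`. The regex in this example makes the slash after /orders optional so that the bare collection path /api/v1/orders is rewritten to /orders as well.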

Gateway observability

Spring Cloud Gateway integrates with Micrometer. With spring-boot-starter-actuator on the classpath, request count, response time, and status codes are recorded per route unless spring.cloud.gateway.metrics.enabled is set to false:

management:
  metrics:
    tags:
      application: api-gateway
    distribution:
      percentiles-histogram:
        spring.cloud.gateway.requests: true
      percentiles:
        spring.cloud.gateway.requests: 0.5, 0.95, 0.99

Metrics available:

  • spring.cloud.gateway.requests: request count and latency, tagged with routeId, routeUri, outcome, status, httpStatusCode, and httpMethod
  • spring.cloud.gateway.routes.count: number of routes currently defined

Alert on:

  • p99 gateway latency increasing without a corresponding downstream latency increase: indicates overhead in the gateway itself
  • spring.cloud.gateway.requests{httpStatusCode=502} rate: downstream services returning errors (the status tag holds the status name, such as BAD_GATEWAY; httpStatusCode holds the number)
  • spring.cloud.gateway.requests{httpStatusCode=429} rate: rate limiting is being applied; high rates indicate an attack or a misconfigured client

The deployment model that works

In Kubernetes, the gateway runs as a deployment with a public-facing load balancer service:

apiVersion: v1
kind: Service
metadata:
  name: api-gateway
spec:
  type: LoadBalancer         # public-facing
  selector:
    app: api-gateway
  ports:
    - port: 443              # TLS must be terminated by the load balancer or by the gateway itself
      targetPort: 8080
---
# All downstream services use ClusterIP — not directly accessible externally
apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  type: ClusterIP            # internal only
  selector:
    app: order-service
  ports:
    - port: 8080

ClusterIP services are only reachable from within the cluster — the gateway is the only way in from outside. This enforces the routing topology in the network layer, not just in configuration. Clients cannot bypass the gateway by calling service URLs directly.

The gateway itself should have at least 2–3 replicas — it's the single point of entry for all traffic, making it the highest-impact service to lose.
