Spring Boot API Rate Limiting — rack-attack Equivalent in Java

by Eric Hanson, Backend Developer at Clean Systems Consulting

What to rate limit and why

Rate limiting serves three distinct purposes that require different configurations:

Abuse prevention: stop a single client from consuming disproportionate resources. An API key making 10,000 requests per second is either a bug or an attack. Limit it to 1,000 per minute.

Fair usage enforcement: ensure all customers get reasonable access. One customer's heavy usage shouldn't degrade performance for others. Per-customer limits with burst allowance cover this.

Infrastructure protection: prevent the API from being overwhelmed regardless of intent. Global limits on expensive endpoints (full-text search, export, bulk operations) protect the database even when many legitimate users hit them simultaneously.

Each purpose may require different limits, different keys (IP vs API key vs user ID vs endpoint), and different responses (429 with retry-after vs graceful degradation).

Bucket4j — the token bucket algorithm

The token bucket algorithm is the standard for API rate limiting. Each client has a bucket that holds a maximum number of tokens (the burst capacity). Tokens are added at a fixed rate (the refill rate). Each request consumes one token. When the bucket is empty, the request is rejected.

This allows burst traffic (a client can spend accumulated tokens quickly) while enforcing an average rate over time. A limit of "100 requests per minute with a burst of 20" means a client can make 20 requests instantly, then is held to the refill rate of 100 per minute thereafter.
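The mechanics can be sketched in a few lines of plain Java (illustrative only — Bucket4j implements this with lock-free compare-and-swap and distributed backends, and the class name here is made up for the sketch):

```java
// Minimal single-node token bucket, to illustrate the algorithm only.
class TokenBucket {
    private final long capacity;         // burst size: max tokens the bucket holds
    private final double refillPerNano;  // steady-state rate, in tokens per nanosecond
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long refillTokens, java.time.Duration refillPeriod) {
        this.capacity = capacity;
        this.refillPerNano = (double) refillTokens / refillPeriod.toNanos();
        this.tokens = capacity;          // a fresh bucket starts full
        this.lastRefillNanos = System.nanoTime();
    }

    synchronized boolean tryConsume() {
        long now = System.nanoTime();
        // add the tokens earned since the last call, capped at burst capacity
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens < 1) return false;    // empty bucket: reject the request
        tokens -= 1;
        return true;
    }
}
```

With capacity 20 and a refill of 100 per minute, 20 calls succeed immediately and the 21st is rejected until roughly 600 ms of refill has accrued.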

Bucket4j implements the token bucket algorithm for Java with Redis, Hazelcast, and in-memory backends; the examples below use the Redisson-based Redis integration:

<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-redis</artifactId>
    <version>8.10.1</version>
</dependency>
<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.27.2</version>
</dependency>

Per-API-key rate limiting with Redis

Redis stores the bucket state — all application instances share the same rate limit counters, enabling correct distributed rate limiting:

@Configuration
public class RateLimitConfig {

    @Bean
    public ProxyManager<String> proxyManager(RedissonClient redissonClient) {
        // Bucket4j's Redisson integration builds on the low-level command
        // executor, so the client is cast to its Redisson implementation
        CommandAsyncExecutor commandExecutor =
            ((Redisson) redissonClient).getCommandExecutor();
        return Bucket4jRedisson.casBasedBuilder(commandExecutor)
            // evict idle bucket state so Redis doesn't accumulate dead keys
            .expirationAfterWrite(ExpirationAfterWriteStrategy
                .basedOnTimeForRefillingBucketUpToMax(Duration.ofMinutes(2)))
            .keyMapper(Mapper.STRING)
            .build();
    }

    @Bean
    public BucketConfiguration apiKeyBucketConfiguration() {
        return BucketConfiguration.builder()
            .addLimit(Bandwidth.builder()
                .capacity(1000)                        // max 1000 tokens (burst)
                .refillGreedy(1000, Duration.ofMinutes(1))  // refill 1000/minute
                .build())
            .addLimit(Bandwidth.builder()
                .capacity(100)                         // inner limit: 100 tokens
                .refillGreedy(100, Duration.ofSeconds(10))  // refill 100 per 10s
                .build())
            .build();
    }
}

Two bandwidth limits compose — both must pass. The outer limit (1000/minute) prevents sustained overconsumption. The inner limit (100/10 seconds) prevents burst attacks that would consume the full minute allowance in seconds.

@Component
public class RateLimitingFilter extends OncePerRequestFilter {

    private final ProxyManager<String> proxyManager;
    private final BucketConfiguration bucketConfiguration;

    public RateLimitingFilter(ProxyManager<String> proxyManager,
            BucketConfiguration bucketConfiguration) {
        this.proxyManager = proxyManager;
        this.bucketConfiguration = bucketConfiguration;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
            HttpServletResponse response, FilterChain chain)
            throws ServletException, IOException {

        String rateLimitKey = extractRateLimitKey(request);
        Bucket bucket = proxyManager.builder()
            .build(rateLimitKey, () -> bucketConfiguration);

        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        if (probe.isConsumed()) {
            addRateLimitHeaders(response, probe);
            chain.doFilter(request, response);
        } else {
            sendRateLimitExceeded(response, probe);
        }
    }

    private String extractRateLimitKey(HttpServletRequest request) {
        // Priority 1: authenticated API key
        String apiKey = request.getHeader("X-API-Key");
        if (apiKey != null) return "apikey:" + apiKey;

        // Priority 2: authenticated user from JWT
        Authentication auth = SecurityContextHolder.getContext().getAuthentication();
        if (auth instanceof JwtAuthenticationToken jwtAuth) {
            return "user:" + jwtAuth.getToken().getSubject();
        }

        // Priority 3: IP address (for unauthenticated endpoints)
        return "ip:" + extractClientIp(request);
    }

    private String extractClientIp(HttpServletRequest request) {
        String forwardedFor = request.getHeader("X-Forwarded-For");
        if (forwardedFor != null) {
            return forwardedFor.split(",")[0].trim();
        }
        return request.getRemoteAddr();
    }

    private void addRateLimitHeaders(HttpServletResponse response,
            ConsumptionProbe probe) {
        response.addHeader("X-RateLimit-Remaining",
            String.valueOf(probe.getRemainingTokens()));
        response.addHeader("X-RateLimit-Reset",
            String.valueOf(Instant.now().plusNanos(probe.getNanosToWaitForRefill())
                .getEpochSecond()));
    }

    private void sendRateLimitExceeded(HttpServletResponse response,
            ConsumptionProbe probe) throws IOException {
        long retryAfterSeconds = TimeUnit.NANOSECONDS.toSeconds(
            probe.getNanosToWaitForRefill()) + 1;

        response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        response.addHeader("Retry-After", String.valueOf(retryAfterSeconds));
        response.addHeader("X-RateLimit-Remaining", "0");
        response.addHeader("X-RateLimit-Reset",
            String.valueOf(Instant.now().plusSeconds(retryAfterSeconds).getEpochSecond()));

        response.getWriter().write("""
            {"errors": [{"code": "rate_limit_exceeded",
                         "message": "Too many requests. Retry after %d seconds.",
                         "retryAfter": %d}]}
            """.formatted(retryAfterSeconds, retryAfterSeconds));
    }

    @Override
    protected boolean shouldNotFilter(HttpServletRequest request) {
        String path = request.getRequestURI();
        return path.startsWith("/actuator/") || path.startsWith("/api/health");
    }
}

proxyManager.builder().build(key, configSupplier) creates or retrieves the bucket for the given key. If the key doesn't exist in Redis, Bucket4j creates a new bucket with the provided configuration. Subsequent requests for the same key use the existing bucket — state is shared across all instances.

X-Forwarded-For parsing takes only the first IP — proxies append their own addresses, so the leftmost value is the original client. Be aware that the leftmost entry is also the one an attacker can supply: only trust this header when the app sits behind a proxy you control, and have that proxy strip or overwrite any client-sent value.
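When the number of trusted proxies in front of the app is known, a safer variant indexes from the right instead of trusting the leftmost entry (the class name and trusted-hop count here are illustrative assumptions):

```java
// Each trusted proxy appends the address of the peer it received the request
// from, so the rightmost entries are proxy-written and trustworthy while the
// leftmost entry may be attacker-supplied. With T trusted proxies in the
// chain, T-1 of them appear in the header (the last one is remoteAddr), so
// the real client sits at index length - T (clamped for short headers).
final class ForwardedForParser {
    static String clientIpFrom(String forwardedFor, int trustedProxies) {
        String[] hops = forwardedFor.split(",");
        int clientIndex = Math.max(0, hops.length - trustedProxies);
        return hops[clientIndex].trim();
    }
}
```

A client that prepends a forged entry only pushes the real addresses further right, where this lookup still finds them.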

Per-endpoint rate limiting

Different endpoints warrant different limits. An expensive search endpoint should have tighter limits than a simple read:

@Component
public class EndpointRateLimiter {

    private final ProxyManager<String> proxyManager;

    public EndpointRateLimiter(ProxyManager<String> proxyManager) {
        this.proxyManager = proxyManager;
    }

    private BucketConfiguration configFor(String endpointKey) {
        return switch (endpointKey) {
            case "search" -> BucketConfiguration.builder()
                .addLimit(Bandwidth.builder()
                    .capacity(10)
                    .refillGreedy(10, Duration.ofMinutes(1))
                    .build())
                .build();

            case "export" -> BucketConfiguration.builder()
                .addLimit(Bandwidth.builder()
                    .capacity(2)
                    .refillGreedy(2, Duration.ofHours(1))
                    .build())
                .build();

            default -> BucketConfiguration.builder()
                .addLimit(Bandwidth.builder()
                    .capacity(1000)
                    .refillGreedy(1000, Duration.ofMinutes(1))
                    .build())
                .build();
        };
    }

    public boolean isAllowed(String userId, String endpointKey) {
        String bucketKey = userId + ":" + endpointKey;
        Bucket bucket = proxyManager.builder()
            .build(bucketKey, () -> configFor(endpointKey));
        return bucket.tryConsume(1);
    }
}

Use in controllers via @PreAuthorize with a bean method. Note that when isAllowed returns false, Spring Security throws AccessDeniedException, which renders as 403 by default — register an AccessDeniedHandler that emits 429 if you want the correct status:

@GetMapping("/search")
@PreAuthorize("@endpointRateLimiter.isAllowed(authentication.name, 'search')")
public SearchResponse search(@RequestParam String q) {
    return searchService.search(q);
}

Or as a method-level annotation for cleaner controller code:

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@PreAuthorize("@endpointRateLimiter.isAllowed(authentication.name, 'search')")
public @interface SearchRateLimit {}

@GetMapping("/search")
@SearchRateLimit
public SearchResponse search(@RequestParam String q) { ... }

Differentiated limits by tier

Paid plans typically get higher rate limits than free tier. Store the limit tier in the JWT claims or load it from the user's subscription:

private BucketConfiguration configForUser(String userId) {
    // Load the user's subscription tier — the LoadingCache defined below owns
    // the loader function, so a plain get() avoids a per-request DB lookup
    SubscriptionTier tier = subscriptionCache.get(userId);

    return switch (tier) {
        case FREE       -> freeConfig();      // 100/minute
        case STARTER    -> starterConfig();   // 500/minute
        case PROFESSIONAL -> proConfig();     // 2000/minute
        case ENTERPRISE -> enterpriseConfig(); // unlimited or very high
    };
}

Cache the subscription tier — hitting the database on every request would add more load than the rate limiter saves. Caffeine with a short TTL (5 minutes here) balances consistency with performance:

@Bean
public LoadingCache<String, SubscriptionTier> subscriptionCache(
        SubscriptionRepository repo) {
    return Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofMinutes(5))
        .build(userId -> repo.findTierByUserId(userId)
            .orElse(SubscriptionTier.FREE));
}

Rack-attack equivalent patterns

rack-attack provides three primitives: throttle, blocklist, and safelist. Here's the Spring Boot equivalent for each:

Blocklist — permanently block known bad actors:

@Component
public class IpBlocklistFilter extends OncePerRequestFilter {

    private final Set<String> blockedIps;  // loaded from Redis or database

    @Override
    protected void doFilterInternal(HttpServletRequest request,
            HttpServletResponse response, FilterChain chain)
            throws ServletException, IOException {
        String ip = extractClientIp(request);  // same helper as in RateLimitingFilter
        if (blockedIps.contains(ip)) {
            response.setStatus(HttpStatus.FORBIDDEN.value());
            return;
        }
        chain.doFilter(request, response);
    }
}

Safelist — skip rate limiting for trusted sources:

private boolean isSafelisted(HttpServletRequest request) {
    String ip = extractClientIp(request);
    // Internal services, monitoring, health checks
    return ip.startsWith("10.") || ip.startsWith("192.168.") ||
           "health-monitor".equals(request.getHeader("X-Client-ID"));
}

@Override
protected void doFilterInternal(...) {
    if (isSafelisted(request)) {
        chain.doFilter(request, response);
        return;
    }
    // ... rate limiting logic
}
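String-prefix checks cover /8 and /16 ranges but can't express something like 172.16.0.0/12. A small IPv4 CIDR test handles arbitrary masks with only the standard library (a sketch; the class name is made up):

```java
final class Ipv4Cidr {
    // True when the IPv4 address falls inside the CIDR range — including
    // masks a string prefix can't express, such as 172.16.0.0/12.
    static boolean contains(String cidr, String ip) {
        String[] parts = cidr.split("/");
        int prefixLength = Integer.parseInt(parts[1]);
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (toBits(ip) & mask) == (toBits(parts[0]) & mask);
    }

    private static int toBits(String ip) {
        try {
            // a literal IPv4 address never triggers a DNS lookup here
            byte[] a = java.net.InetAddress.getByName(ip).getAddress();
            return ((a[0] & 0xFF) << 24) | ((a[1] & 0xFF) << 16)
                 | ((a[2] & 0xFF) << 8)  |  (a[3] & 0xFF);
        } catch (java.net.UnknownHostException e) {
            throw new IllegalArgumentException("not an IP address: " + ip, e);
        }
    }
}
```

The safelist check then becomes Ipv4Cidr.contains("10.0.0.0/8", ip) and friends, with the ranges in configuration rather than hard-coded.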

Throttle login attempts — tighter limits on authentication:

@PostMapping("/auth/login")
public ResponseEntity<TokenResponse> login(@RequestBody LoginRequest request,
        HttpServletRequest httpRequest) {

    String ip = extractClientIp(httpRequest);
    String loginKey = "login:" + ip;

    Bucket loginBucket = proxyManager.builder()
        .build(loginKey, () -> BucketConfiguration.builder()
            .addLimit(Bandwidth.builder()
                .capacity(5)
                .refillGreedy(5, Duration.ofMinutes(15))
                .build())
            .build());

    if (!loginBucket.tryConsume(1)) {
        return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
            .header("Retry-After", "900")
            .body(null);
    }

    // ... authentication logic
}

5 login attempts per IP per 15 minutes. Failed attempts don't refund the token — an attacker brute-forcing passwords draws down the same budget as legitimate attempts. Per-IP throttling alone won't stop credential stuffing from a rotating pool of IPs, though; a second bucket keyed by the target username closes that gap.

Monitoring rate limits

Rate limit events are a security signal worth monitoring. One Micrometer subtlety: tags are part of a meter's identity, so a Counter bean registered once up front cannot carry per-event tags. Resolve the counter from the MeterRegistry when the event fires, and prefer templated route patterns over raw URIs so tag cardinality stays bounded:

// In the filter, when a rate limit is exceeded
// (meterRegistry is an injected MeterRegistry field):
private void recordRateLimitExceeded(String rateLimitKey,
        HttpServletRequest request) {
    String keyType = rateLimitKey.startsWith("apikey:") ? "api_key"
                   : rateLimitKey.startsWith("user:") ? "user" : "ip";
    meterRegistry.counter("api.rate_limit.exceeded",
            "key_type", keyType,
            "endpoint", request.getRequestURI())
        .increment();
}

Alert on:

  • Sudden spike in api.rate_limit.exceeded for a specific IP — potential attack
  • Single API key consistently hitting limits — client bug or unauthorized sharing
  • Login throttle events — brute force attempt in progress

IP-level rate limit events followed by a pattern change (new IPs, different endpoints) suggest an attacker probing limits and adapting — a signature worth investigating.
