Spring Boot API Rate Limiting — rack-attack Equivalent in Java

by Eric Hanson, Backend Developer at Clean Systems Consulting

What to rate limit and why

Rate limiting serves three distinct purposes that require different configurations:

Abuse prevention: stop a single client from consuming disproportionate resources. An API key making 10,000 requests per second is either a bug or an attack. Limit it to 1,000 per minute.

Fair usage enforcement: ensure all customers get reasonable access. One customer's heavy usage shouldn't degrade performance for others. Per-customer limits with burst allowance cover this.

Infrastructure protection: prevent the API from being overwhelmed regardless of intent. Global limits on expensive endpoints (full-text search, export, bulk operations) protect the database even when many legitimate users hit them simultaneously.

Each purpose may require different limits, different keys (IP vs API key vs user ID vs endpoint), and different responses (429 with retry-after vs graceful degradation).

Bucket4j — the token bucket algorithm

The token bucket algorithm is the standard for API rate limiting. Each client has a bucket that holds a maximum number of tokens (the burst capacity). Tokens are added at a fixed rate (the refill rate). Each request consumes one token. When the bucket is empty, the request is rejected.

This allows burst traffic (a client can spend accumulated tokens quickly) while enforcing an average rate over time. A limit of "100 requests per minute with a burst of 20" means a client can make 20 requests instantly, then is held to the refill rate of 100 per minute thereafter.
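The mechanics can be sketched in a few lines of plain Java (illustrative only — Bucket4j implements this with lock-free compare-and-swap and distributed backends, and the class name here is made up for the sketch):

```java
// Minimal single-node token bucket, to illustrate the algorithm only.
class TokenBucket {
    private final long capacity;         // burst size: max tokens the bucket holds
    private final double refillPerNano;  // steady-state rate, in tokens per nanosecond
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, long refillTokens, java.time.Duration refillPeriod) {
        this.capacity = capacity;
        this.refillPerNano = (double) refillTokens / refillPeriod.toNanos();
        this.tokens = capacity;          // a fresh bucket starts full
        this.lastRefillNanos = System.nanoTime();
    }

    synchronized boolean tryConsume() {
        long now = System.nanoTime();
        // add the tokens earned since the last call, capped at burst capacity
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) * refillPerNano);
        lastRefillNanos = now;
        if (tokens < 1) return false;    // empty bucket: reject the request
        tokens -= 1;
        return true;
    }
}
```

With capacity 20 and a refill of 100 per minute, 20 calls succeed immediately and the 21st is rejected until roughly 600 ms of refill has accrued.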

Bucket4j implements the token bucket algorithm for Java with Redis, Hazelcast, and in-memory backends; the examples below use the Redisson-based Redis integration:

<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-redis</artifactId>
    <version>8.10.1</version>
</dependency>
<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.27.2</version>
</dependency>

Per-API-key rate limiting with Redis

Redis stores the bucket state — all application instances share the same rate limit counters, enabling correct distributed rate limiting:

@Configuration
public class RateLimitConfig {

    @Bean
    public ProxyManager<String> proxyManager(RedissonClient redissonClient) {
        // Bucket4j's Redisson integration builds on the low-level command
        // executor, so the client is cast to its Redisson implementation
        CommandAsyncExecutor commandExecutor =
            ((Redisson) redissonClient).getCommandExecutor();
        return Bucket4jRedisson.casBasedBuilder(commandExecutor)
            // evict idle bucket state so Redis doesn't accumulate dead keys
            .expirationAfterWrite(ExpirationAfterWriteStrategy
                .basedOnTimeForRefillingBucketUpToMax(Duration.ofMinutes(2)))
            .keyMapper(Mapper.STRING)
            .build();
    }

    @Bean
    public BucketConfiguration apiKeyBucketConfiguration() {
        return BucketConfiguration.builder()
            .addLimit(Bandwidth.builder()
                .capacity(1000)                        // max 1000 tokens (burst)
                .refillGreedy(1000, Duration.ofMinutes(1))  // refill 1000/minute
                .build())
            .addLimit(Bandwidth.builder()
                .capacity(100)                         // inner limit: 100 tokens
                .refillGreedy(100, Duration.ofSeconds(10))  // refill 100 per 10s
                .build())
            .build();
    }
}

Two bandwidth limits compose — both must pass. The outer limit (1000/minute) prevents sustained overconsumption. The inner limit (100/10 seconds) prevents burst attacks that would consume the full minute allowance in seconds.

@Component
public class RateLimitingFilter extends OncePerRequestFilter {

    private final ProxyManager<String> proxyManager;
    private final BucketConfiguration bucketConfiguration;

    public RateLimitingFilter(ProxyManager<String> proxyManager,
            BucketConfiguration bucketConfiguration) {
        this.proxyManager = proxyManager;
        this.bucketConfiguration = bucketConfiguration;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request,
            HttpServletResponse response, FilterChain chain)
            throws ServletException, IOException {

        String rateLimitKey = extractRateLimitKey(request);
        Bucket bucket = proxyManager.builder()
            .build(rateLimitKey, () -> bucketConfiguration);

        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);

        if (probe.isConsumed()) {
            addRateLimitHeaders(response, probe);
            chain.doFilter(request, response);
        } else {
            sendRateLimitExceeded(response, probe);
        }
    }

    private String extractRateLimitKey(HttpServletRequest request) {
        // Priority 1: authenticated API key
        String apiKey = request.getHeader("X-API-Key");
        if (apiKey != null) return "apikey:" + apiKey;

        // Priority 2: authenticated user from JWT
        Authentication auth = SecurityContextHolder.getContext().getAuthentication();
        if (auth instanceof JwtAuthenticationToken jwtAuth) {
            return "user:" + jwtAuth.getToken().getSubject();
        }

        // Priority 3: IP address (for unauthenticated endpoints)
        return "ip:" + extractClientIp(request);
    }

    private String extractClientIp(HttpServletRequest request) {
        String forwardedFor = request.getHeader("X-Forwarded-For");
        if (forwardedFor != null) {
            return forwardedFor.split(",")[0].trim();
        }
        return request.getRemoteAddr();
    }

    private void addRateLimitHeaders(HttpServletResponse response,
            ConsumptionProbe probe) {
        response.addHeader("X-RateLimit-Remaining",
            String.valueOf(probe.getRemainingTokens()));
        response.addHeader("X-RateLimit-Reset",
            String.valueOf(Instant.now().plusNanos(probe.getNanosToWaitForRefill())
                .getEpochSecond()));
    }

    private void sendRateLimitExceeded(HttpServletResponse response,
            ConsumptionProbe probe) throws IOException {
        long retryAfterSeconds = TimeUnit.NANOSECONDS.toSeconds(
            probe.getNanosToWaitForRefill()) + 1;

        response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        response.addHeader("Retry-After", String.valueOf(retryAfterSeconds));
        response.addHeader("X-RateLimit-Remaining", "0");
        response.addHeader("X-RateLimit-Reset",
            String.valueOf(Instant.now().plusSeconds(retryAfterSeconds).getEpochSecond()));

        response.getWriter().write("""
            {"errors": [{"code": "rate_limit_exceeded",
                         "message": "Too many requests. Retry after %d seconds.",
                         "retryAfter": %d}]}
            """.formatted(retryAfterSeconds, retryAfterSeconds));
    }

    @Override
    protected boolean shouldNotFilter(HttpServletRequest request) {
        String path = request.getRequestURI();
        return path.startsWith("/actuator/") || path.startsWith("/api/health");
    }
}

proxyManager.builder().build(key, configSupplier) creates or retrieves the bucket for the given key. If the key doesn't exist in Redis, Bucket4j creates a new bucket with the provided configuration. Subsequent requests for the same key use the existing bucket — state is shared across all instances.

X-Forwarded-For parsing takes only the first IP — proxies append their own addresses, so the leftmost value is the original client. Be aware that the leftmost entry is also the one an attacker can supply: only trust this header when the app sits behind a proxy you control, and have that proxy strip or overwrite any client-sent value.
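When the number of trusted proxies in front of the app is known, a safer variant indexes from the right instead of trusting the leftmost entry (the class name and trusted-hop count here are illustrative assumptions):

```java
// Each trusted proxy appends the address of the peer it received the request
// from, so the rightmost entries are proxy-written and trustworthy while the
// leftmost entry may be attacker-supplied. With T trusted proxies in the
// chain, T-1 of them appear in the header (the last one is remoteAddr), so
// the real client sits at index length - T (clamped for short headers).
final class ForwardedForParser {
    static String clientIpFrom(String forwardedFor, int trustedProxies) {
        String[] hops = forwardedFor.split(",");
        int clientIndex = Math.max(0, hops.length - trustedProxies);
        return hops[clientIndex].trim();
    }
}
```

A client that prepends a forged entry only pushes the real addresses further right, where this lookup still finds them.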

Per-endpoint rate limiting

Different endpoints warrant different limits. An expensive search endpoint should have tighter limits than a simple read:

@Component
public class EndpointRateLimiter {

    private final ProxyManager<String> proxyManager;

    public EndpointRateLimiter(ProxyManager<String> proxyManager) {
        this.proxyManager = proxyManager;
    }

    private BucketConfiguration configFor(String endpointKey) {
        return switch (endpointKey) {
            case "search" -> BucketConfiguration.builder()
                .addLimit(Bandwidth.builder()
                    .capacity(10)
                    .refillGreedy(10, Duration.ofMinutes(1))
                    .build())
                .build();

            case "export" -> BucketConfiguration.builder()
                .addLimit(Bandwidth.builder()
                    .capacity(2)
                    .refillGreedy(2, Duration.ofHours(1))
                    .build())
                .build();

            default -> BucketConfiguration.builder()
                .addLimit(Bandwidth.builder()
                    .capacity(1000)
                    .refillGreedy(1000, Duration.ofMinutes(1))
                    .build())
                .build();
        };
    }

    public boolean isAllowed(String userId, String endpointKey) {
        String bucketKey = userId + ":" + endpointKey;
        Bucket bucket = proxyManager.builder()
            .build(bucketKey, () -> configFor(endpointKey));
        return bucket.tryConsume(1);
    }
}

Use in controllers via @PreAuthorize with a bean method. Note that when isAllowed returns false, Spring Security throws AccessDeniedException, which renders as 403 by default — register an AccessDeniedHandler that emits 429 if you want the correct status:

@GetMapping("/search")
@PreAuthorize("@endpointRateLimiter.isAllowed(authentication.name, 'search')")
public SearchResponse search(@RequestParam String q) {
    return searchService.search(q);
}

Or as a method-level annotation for cleaner controller code:

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@PreAuthorize("@endpointRateLimiter.isAllowed(authentication.name, 'search')")
public @interface SearchRateLimit {}

@GetMapping("/search")
@SearchRateLimit
public SearchResponse search(@RequestParam String q) { ... }

Differentiated limits by tier

Paid plans typically get higher rate limits than free tier. Store the limit tier in the JWT claims or load it from the user's subscription:

private BucketConfiguration configForUser(String userId) {
    // Load the user's subscription tier — the LoadingCache defined below owns
    // the loader function, so a plain get() avoids a per-request DB lookup
    SubscriptionTier tier = subscriptionCache.get(userId);

    return switch (tier) {
        case FREE       -> freeConfig();      // 100/minute
        case STARTER    -> starterConfig();   // 500/minute
        case PROFESSIONAL -> proConfig();     // 2000/minute
        case ENTERPRISE -> enterpriseConfig(); // unlimited or very high
    };
}

Cache the subscription tier — hitting the database on every request would add more load than the rate limiter saves. Caffeine with a short TTL (5 minutes here) balances consistency with performance:

@Bean
public LoadingCache<String, SubscriptionTier> subscriptionCache(
        SubscriptionRepository repo) {
    return Caffeine.newBuilder()
        .maximumSize(10_000)
        .expireAfterWrite(Duration.ofMinutes(5))
        .build(userId -> repo.findTierByUserId(userId)
            .orElse(SubscriptionTier.FREE));
}

Rack-attack equivalent patterns

rack-attack provides three primitives: throttle, blocklist, and safelist. Here's the Spring Boot equivalent for each:

Blocklist — permanently block known bad actors:

@Component
public class IpBlocklistFilter extends OncePerRequestFilter {

    private final Set<String> blockedIps;  // loaded from Redis or database

    @Override
    protected void doFilterInternal(HttpServletRequest request,
            HttpServletResponse response, FilterChain chain)
            throws ServletException, IOException {
        String ip = extractClientIp(request);  // same helper as in RateLimitingFilter
        if (blockedIps.contains(ip)) {
            response.setStatus(HttpStatus.FORBIDDEN.value());
            return;
        }
        chain.doFilter(request, response);
    }
}

Safelist — skip rate limiting for trusted sources:

private boolean isSafelisted(HttpServletRequest request) {
    String ip = extractClientIp(request);
    // Internal services, monitoring, health checks
    return ip.startsWith("10.") || ip.startsWith("192.168.") ||
           "health-monitor".equals(request.getHeader("X-Client-ID"));
}

@Override
protected void doFilterInternal(...) {
    if (isSafelisted(request)) {
        chain.doFilter(request, response);
        return;
    }
    // ... rate limiting logic
}
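String-prefix checks cover /8 and /16 ranges but can't express something like 172.16.0.0/12. A small IPv4 CIDR test handles arbitrary masks with only the standard library (a sketch; the class name is made up):

```java
final class Ipv4Cidr {
    // True when the IPv4 address falls inside the CIDR range — including
    // masks a string prefix can't express, such as 172.16.0.0/12.
    static boolean contains(String cidr, String ip) {
        String[] parts = cidr.split("/");
        int prefixLength = Integer.parseInt(parts[1]);
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (toBits(ip) & mask) == (toBits(parts[0]) & mask);
    }

    private static int toBits(String ip) {
        try {
            // a literal IPv4 address never triggers a DNS lookup here
            byte[] a = java.net.InetAddress.getByName(ip).getAddress();
            return ((a[0] & 0xFF) << 24) | ((a[1] & 0xFF) << 16)
                 | ((a[2] & 0xFF) << 8)  |  (a[3] & 0xFF);
        } catch (java.net.UnknownHostException e) {
            throw new IllegalArgumentException("not an IP address: " + ip, e);
        }
    }
}
```

The safelist check then becomes Ipv4Cidr.contains("10.0.0.0/8", ip) and friends, with the ranges in configuration rather than hard-coded.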

Throttle login attempts — tighter limits on authentication:

@PostMapping("/auth/login")
public ResponseEntity<TokenResponse> login(@RequestBody LoginRequest request,
        HttpServletRequest httpRequest) {

    String ip = extractClientIp(httpRequest);
    String loginKey = "login:" + ip;

    Bucket loginBucket = proxyManager.builder()
        .build(loginKey, () -> BucketConfiguration.builder()
            .addLimit(Bandwidth.builder()
                .capacity(5)
                .refillGreedy(5, Duration.ofMinutes(15))
                .build())
            .build());

    if (!loginBucket.tryConsume(1)) {
        return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
            .header("Retry-After", "900")
            .body(null);
    }

    // ... authentication logic
}

5 login attempts per IP per 15 minutes. Failed attempts don't refund the token — an attacker brute-forcing passwords draws down the same budget as legitimate attempts. Per-IP throttling alone won't stop credential stuffing from a rotating pool of IPs, though; a second bucket keyed by the target username closes that gap.

Monitoring rate limits

Rate limit events are a security signal worth monitoring. One Micrometer subtlety: tags are part of a meter's identity, so a Counter bean registered once up front cannot carry per-event tags. Resolve the counter from the MeterRegistry when the event fires, and prefer templated route patterns over raw URIs so tag cardinality stays bounded:

// In the filter, when a rate limit is exceeded
// (meterRegistry is an injected MeterRegistry field):
private void recordRateLimitExceeded(String rateLimitKey,
        HttpServletRequest request) {
    String keyType = rateLimitKey.startsWith("apikey:") ? "api_key"
                   : rateLimitKey.startsWith("user:") ? "user" : "ip";
    meterRegistry.counter("api.rate_limit.exceeded",
            "key_type", keyType,
            "endpoint", request.getRequestURI())
        .increment();
}

Alert on:

  • Sudden spike in api.rate_limit.exceeded for a specific IP — potential attack
  • Single API key consistently hitting limits — client bug or unauthorized sharing
  • Login throttle events — brute force attempt in progress

IP-level rate limit events followed by a pattern change (new IPs, different endpoints) suggest an attacker probing limits and adapting — a signature worth investigating.
