Spring Boot API Rate Limiting — rack-attack Equivalent in Java
by Eric Hanson, Backend Developer at Clean Systems Consulting
What to rate limit and why
Rate limiting serves three distinct purposes that require different configurations:
Abuse prevention: stop a single client from consuming disproportionate resources. An API key making 10,000 requests per second is either a bug or an attack. Limit it to 1,000 per minute.
Fair usage enforcement: ensure all customers get reasonable access. One customer's heavy usage shouldn't degrade performance for others. Per-customer limits with burst allowance cover this.
Infrastructure protection: prevent the API from being overwhelmed regardless of intent. Global limits on expensive endpoints (full-text search, export, bulk operations) protect the database even when many legitimate users hit them simultaneously.
Each purpose may require different limits, different keys (IP vs API key vs user ID vs endpoint), and different responses (429 with retry-after vs graceful degradation).
Bucket4j — the token bucket algorithm
The token bucket algorithm is the standard for API rate limiting. Each client has a bucket that holds a maximum number of tokens (the burst capacity). Tokens are added at a fixed rate (the refill rate). Each request consumes one token. When the bucket is empty, the request is rejected.
This allows burst traffic (a client can spend accumulated tokens quickly) while still enforcing an average rate over time. A limit of "100 requests per minute with a burst of 20" means a client can make 20 requests instantly, then continue at the refill rate of 100 per minute.
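As a mental model, here is a minimal single-node sketch of the algorithm in plain Java (illustrative only; Bucket4j, introduced below, does this atomically and with pluggable storage):

```java
/** Minimal single-node token bucket: capacity = burst, refillPerSecond = sustained rate. */
class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;                 // start full: the burst is available immediately
        this.lastRefillNanos = System.nanoTime();
    }

    synchronized boolean tryConsume() {
        long now = System.nanoTime();
        double elapsedSeconds = (now - lastRefillNanos) / 1e9;
        // Refill continuously at the sustained rate, never beyond capacity
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillNanos = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}
```

A bucket of capacity 20 refilled at 100/60 tokens per second is exactly the "100 per minute, burst of 20" policy described above.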
Bucket4j implements the token bucket algorithm for Java with Redis, Hazelcast, and in-memory backends:
<dependency>
    <groupId>com.bucket4j</groupId>
    <artifactId>bucket4j-redis</artifactId>
    <version>8.10.1</version>
</dependency>
<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson</artifactId>
    <version>3.27.2</version>
</dependency>
bucket4j-redis supplies the distributed ProxyManager integration; Redisson is the Redis client underneath it. (There is also a community Spring Boot starter under the com.giffing.bucket4j.spring.boot.starter groupId, versioned independently of Bucket4j itself; the code below wires the filter manually, so the starter is optional.)
Per-API-key rate limiting with Redis
Redis stores the bucket state — all application instances share the same rate limit counters, enabling correct distributed rate limiting:
@Configuration
public class RateLimitConfig {

    @Bean
    public ProxyManager<String> proxyManager(RedissonClient redissonClient) {
        // Bucket4j's Redisson integration builds on Redisson's command executor,
        // which the public RedissonClient interface does not expose directly
        CommandAsyncExecutor commandExecutor = ((Redisson) redissonClient).getCommandExecutor();
        return Bucket4jRedisson.casBasedBuilder(commandExecutor).build();
    }
@Bean
public BucketConfiguration apiKeyBucketConfiguration() {
return BucketConfiguration.builder()
.addLimit(Bandwidth.builder()
.capacity(1000) // max 1000 tokens (burst)
.refillGreedy(1000, Duration.ofMinutes(1)) // refill 1000/minute
.build())
.addLimit(Bandwidth.builder()
.capacity(100) // inner limit: 100 tokens
.refillGreedy(100, Duration.ofSeconds(10)) // refill 100 per 10s
.build())
.build();
}
}
Two bandwidth limits compose — both must pass. The outer limit (1000/minute) prevents sustained overconsumption. The inner limit (100/10 seconds) prevents burst attacks that would consume the full minute allowance in seconds.
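The composition rule can be shown with a self-contained sketch (plain Java, not the Bucket4j API): a request is admitted only when every limit has a token available, and a token is taken from each on admission.

```java
/** Illustrative only: two composed limits where both must pass. Refill runs on its own schedule. */
class ComposedLimit {
    private long outer; // e.g. 1000 per minute
    private long inner; // e.g. 100 per 10 seconds

    ComposedLimit(long outerCapacity, long innerCapacity) {
        this.outer = outerCapacity;
        this.inner = innerCapacity;
    }

    /** Admit only if BOTH limits have a token; consume from both together. */
    synchronized boolean tryConsume() {
        if (outer >= 1 && inner >= 1) {
            outer--;
            inner--;
            return true;
        }
        return false;
    }

    // Refill hooks would run on their own schedules (every minute / every 10 seconds);
    // reduced to manual calls here to keep the composition semantics in focus
    synchronized void refillOuter(long capacity) { outer = capacity; }
    synchronized void refillInner(long capacity) { inner = capacity; }
}
```

With 1000/100 capacities, a burst of 150 requests is cut off at 100 by the inner limit even though the outer limit still has budget.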
@Component
public class RateLimitingFilter extends OncePerRequestFilter {

    private final ProxyManager<String> proxyManager;
    private final BucketConfiguration bucketConfiguration;

    public RateLimitingFilter(ProxyManager<String> proxyManager,
                              BucketConfiguration bucketConfiguration) {
        this.proxyManager = proxyManager;
        this.bucketConfiguration = bucketConfiguration;
    }
@Override
protected void doFilterInternal(HttpServletRequest request,
HttpServletResponse response, FilterChain chain)
throws ServletException, IOException {
String rateLimitKey = extractRateLimitKey(request);
Bucket bucket = proxyManager.builder()
.build(rateLimitKey, () -> bucketConfiguration);
ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
if (probe.isConsumed()) {
addRateLimitHeaders(response, probe);
chain.doFilter(request, response);
} else {
sendRateLimitExceeded(response, probe);
}
}
private String extractRateLimitKey(HttpServletRequest request) {
// Priority 1: authenticated API key
String apiKey = request.getHeader("X-API-Key");
if (apiKey != null) return "apikey:" + apiKey;
// Priority 2: authenticated user from JWT
Authentication auth = SecurityContextHolder.getContext().getAuthentication();
if (auth instanceof JwtAuthenticationToken jwtAuth) {
return "user:" + jwtAuth.getToken().getSubject();
}
// Priority 3: IP address (for unauthenticated endpoints)
return "ip:" + extractClientIp(request);
}
private String extractClientIp(HttpServletRequest request) {
String forwardedFor = request.getHeader("X-Forwarded-For");
if (forwardedFor != null) {
return forwardedFor.split(",")[0].trim();
}
return request.getRemoteAddr();
}
private void addRateLimitHeaders(HttpServletResponse response,
ConsumptionProbe probe) {
response.addHeader("X-RateLimit-Remaining",
String.valueOf(probe.getRemainingTokens()));
response.addHeader("X-RateLimit-Reset",
String.valueOf(Instant.now().plusNanos(probe.getNanosToWaitForRefill())
.getEpochSecond()));
}
private void sendRateLimitExceeded(HttpServletResponse response,
ConsumptionProbe probe) throws IOException {
long retryAfterSeconds = TimeUnit.NANOSECONDS.toSeconds(
probe.getNanosToWaitForRefill()) + 1;
response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
response.setContentType(MediaType.APPLICATION_JSON_VALUE);
response.addHeader("Retry-After", String.valueOf(retryAfterSeconds));
response.addHeader("X-RateLimit-Remaining", "0");
response.addHeader("X-RateLimit-Reset",
String.valueOf(Instant.now().plusSeconds(retryAfterSeconds).getEpochSecond()));
response.getWriter().write("""
{"errors": [{"code": "rate_limit_exceeded",
"message": "Too many requests. Retry after %d seconds.",
"retryAfter": %d}]}
""".formatted(retryAfterSeconds, retryAfterSeconds));
}
@Override
protected boolean shouldNotFilter(HttpServletRequest request) {
String path = request.getRequestURI();
return path.startsWith("/actuator/") || path.startsWith("/api/health");
}
}
proxyManager.builder().build(key, configSupplier) creates or retrieves the bucket for the given key. If the key doesn't exist in Redis, Bucket4j creates a new bucket with the provided configuration. Subsequent requests for the same key use the existing bucket — state is shared across all instances.
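The create-or-retrieve behavior is essentially computeIfAbsent over the key space. A local-only analogue (illustrative; the real ProxyManager keeps this state in Redis, so the guarantee holds across instances):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Local-only analogue of ProxyManager's create-or-retrieve keyed buckets. */
class LocalBucketRegistry<B> {
    private final Map<String, B> buckets = new ConcurrentHashMap<>();

    /** The first caller for a key creates the bucket; every later caller reuses it. */
    B getOrCreate(String key, Supplier<B> factory) {
        return buckets.computeIfAbsent(key, k -> factory.get());
    }
}
```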
X-Forwarded-For parsing takes only the first entry; proxies append their own IPs as the request passes through, so the leftmost value is the original client only when the header is set or sanitized by an edge proxy you control. The header is otherwise client-supplied: a client that can reach the app directly can forge the first entry and evade (or poison) IP-keyed limits.
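When you know how many proxies sit in front of the app, a stricter approach counts from the right instead of trusting the leftmost entry, since only the rightmost entries were appended by infrastructure you control. A hedged sketch (the trustedProxyCount parameter is illustrative, not part of any framework):

```java
import java.util.Arrays;
import java.util.List;

class ClientIpResolver {
    private final int trustedProxyCount; // how many proxies you run in front of the app

    ClientIpResolver(int trustedProxyCount) {
        this.trustedProxyCount = trustedProxyCount;
    }

    /**
     * Each trusted proxy appends the address of the peer it received the request
     * from, so with N trusted proxies the real client IP is the Nth entry from
     * the right. Anything further left is client-supplied and untrustworthy.
     */
    String resolve(String forwardedFor, String remoteAddr) {
        if (forwardedFor == null || forwardedFor.isBlank()) {
            return remoteAddr;
        }
        List<String> hops = Arrays.stream(forwardedFor.split(","))
                .map(String::trim)
                .toList();
        int idx = hops.size() - trustedProxyCount;
        if (idx < 0 || idx >= hops.size()) {
            return remoteAddr; // unexpected chain shape: fall back to the socket address
        }
        return hops.get(idx);
    }
}
```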
Per-endpoint rate limiting
Different endpoints warrant different limits. An expensive search endpoint should have tighter limits than a simple read:
@Component
public class EndpointRateLimiter {

    private final ProxyManager<String> proxyManager;

    public EndpointRateLimiter(ProxyManager<String> proxyManager) {
        this.proxyManager = proxyManager;
    }
private BucketConfiguration configFor(String endpointKey) {
return switch (endpointKey) {
case "search" -> BucketConfiguration.builder()
.addLimit(Bandwidth.builder()
.capacity(10)
.refillGreedy(10, Duration.ofMinutes(1))
.build())
.build();
case "export" -> BucketConfiguration.builder()
.addLimit(Bandwidth.builder()
.capacity(2)
.refillGreedy(2, Duration.ofHours(1))
.build())
.build();
default -> BucketConfiguration.builder()
.addLimit(Bandwidth.builder()
.capacity(1000)
.refillGreedy(1000, Duration.ofMinutes(1))
.build())
.build();
};
}
public boolean isAllowed(String userId, String endpointKey) {
String bucketKey = userId + ":" + endpointKey;
Bucket bucket = proxyManager.builder()
.build(bucketKey, () -> configFor(endpointKey));
return bucket.tryConsume(1);
}
}
Use in controllers via @PreAuthorize with a bean method:
@GetMapping("/search")
@PreAuthorize("@endpointRateLimiter.isAllowed(authentication.name, 'search')")
public SearchResponse search(@RequestParam String q,
@AuthenticationPrincipal Jwt jwt) {
return searchService.search(q);
}
Or as a method-level annotation for cleaner controller code:
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@PreAuthorize("@endpointRateLimiter.isAllowed(authentication.name, 'search')")
public @interface SearchRateLimit {}
@GetMapping("/search")
@SearchRateLimit
public SearchResponse search(@RequestParam String q) { ... }
Differentiated limits by tier
Paid plans typically get higher rate limits than the free tier. Store the limit tier in the JWT claims or load it from the user's subscription:
private BucketConfiguration configForUser(String userId) {
    // The user's subscription tier is cached (see the LoadingCache bean below)
    // to avoid a per-request database lookup
    SubscriptionTier tier = subscriptionCache.get(userId);
return switch (tier) {
case FREE -> freeConfig(); // 100/minute
case STARTER -> starterConfig(); // 500/minute
case PROFESSIONAL -> proConfig(); // 2000/minute
case ENTERPRISE -> enterpriseConfig(); // unlimited or very high
};
}
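The freeConfig()/starterConfig() helpers are placeholders. One way to back them with data, using the illustrative numbers from the comments above:

```java
enum SubscriptionTier { FREE, STARTER, PROFESSIONAL, ENTERPRISE }

/** Capacity doubles as the burst allowance; the refill window is one minute. */
record TierLimits(long requestsPerMinute) {}

class TierLimitCatalog {
    static TierLimits limitsFor(SubscriptionTier tier) {
        return switch (tier) {
            case FREE -> new TierLimits(100);
            case STARTER -> new TierLimits(500);
            case PROFESSIONAL -> new TierLimits(2_000);
            // "Unlimited" is best modeled as a very high cap, not a skipped check,
            // so the infrastructure-protection limit still exists
            case ENTERPRISE -> new TierLimits(1_000_000);
        };
    }
}
```

Each config helper would then build a BucketConfiguration from the tier's numbers, keeping the pricing table in one place.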
Cache the subscription tier — loading it from the database on every request negates the performance benefit of rate limiting. Caffeine with a short TTL (5 minutes) balances consistency with performance:
@Bean
public LoadingCache<String, SubscriptionTier> subscriptionCache(
SubscriptionRepository repo) {
return Caffeine.newBuilder()
.maximumSize(10_000)
.expireAfterWrite(Duration.ofMinutes(5))
.build(userId -> repo.findTierByUserId(userId)
.orElse(SubscriptionTier.FREE));
}
Rack-attack equivalent patterns
rack-attack provides three primitives: throttle, blocklist, and safelist. Here's the Spring Boot equivalent for each:
Blocklist — permanently block known bad actors:
@Component
public class IpBlocklistFilter extends OncePerRequestFilter {
private final Set<String> blockedIps; // loaded from Redis or database
@Override
protected void doFilterInternal(HttpServletRequest request,
HttpServletResponse response, FilterChain chain)
throws ServletException, IOException {
String ip = extractClientIp(request);
if (blockedIps.contains(ip)) {
response.setStatus(HttpStatus.FORBIDDEN.value());
return;
}
chain.doFilter(request, response);
}
}
Safelist — skip rate limiting for trusted sources:
private boolean isSafelisted(HttpServletRequest request) {
String ip = extractClientIp(request);
// Internal services, monitoring, health checks
return ip.startsWith("10.") || ip.startsWith("192.168.") ||
"health-monitor".equals(request.getHeader("X-Client-ID"));
}
@Override
protected void doFilterInternal(...) {
if (isSafelisted(request)) {
chain.doFilter(request, response);
return;
}
// ... rate limiting logic
}
Throttle login attempts — tighter limits on authentication:
@PostMapping("/auth/login")
public ResponseEntity<TokenResponse> login(@RequestBody LoginRequest request,
HttpServletRequest httpRequest) {
String ip = extractClientIp(httpRequest);
String loginKey = "login:" + ip;
Bucket loginBucket = proxyManager.builder()
.build(loginKey, () -> BucketConfiguration.builder()
.addLimit(Bandwidth.builder()
.capacity(5)
.refillGreedy(5, Duration.ofMinutes(15))
.build())
.build());
if (!loginBucket.tryConsume(1)) {
return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
.header("Retry-After", "900")
.body(null);
}
// ... authentication logic
}
5 login attempts per IP per 15 minutes. Failed attempts don't refund the token — an attacker trying a brute force attack with different passwords consumes the same budget as valid attempts.
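An alternative is to budget failures only, so a user who types the right password is never throttled: authenticate first, and consume from the budget only when authentication fails. A sketch with a hypothetical authenticate hook (window reset omitted):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Illustrative failure-budget throttle: maxFailures failed attempts per key. */
class LoginFailureThrottle {
    private final Map<String, Integer> failures = new ConcurrentHashMap<>();
    private final int maxFailures;

    LoginFailureThrottle(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    /** Returns true if login succeeded; only failed attempts burn the budget. */
    boolean attempt(String key, Predicate<String> authenticate, String password) {
        if (failures.getOrDefault(key, 0) >= maxFailures) {
            return false; // locked out until the window resets (reset omitted)
        }
        if (authenticate.test(password)) {
            failures.remove(key); // success clears the failure count
            return true;
        }
        failures.merge(key, 1, Integer::sum);
        return false;
    }
}
```

The trade-off is that correct guesses cost an attacker nothing, which is why consuming before authentication, as in the code above, is the more conservative default.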
Monitoring rate limits
Rate limit events are a security signal worth monitoring:
@Component
public class RateLimitMetrics {

    private final MeterRegistry registry;

    public RateLimitMetrics(MeterRegistry registry) {
        this.registry = registry;
    }

    // Called from the filter when a rate limit is exceeded. Micrometer counters
    // are identified by name + tags, so increment through the registry;
    // Counter.increment() itself does not accept tags.
    public void recordExceeded(String rateLimitKey, String endpoint) {
        String keyType = rateLimitKey.startsWith("apikey:") ? "api_key"
                : rateLimitKey.startsWith("user:") ? "user" : "ip";
        // Prefer the route template (e.g. /users/{id}) over the raw URI here
        // to keep tag cardinality bounded
        registry.counter("api.rate_limit.exceeded",
                "key_type", keyType,
                "endpoint", endpoint)
            .increment();
    }
}
Alert on:
- A sudden spike in api.rate_limit.exceeded for a specific IP — potential attack
- A single API key consistently hitting limits — client bug or unauthorized sharing
- Login throttle events — brute force attempt in progress
Rate limit events at the IP level that precede a pattern change (new IPs, different endpoints) are an attack signature worth investigating.