Rate Limiting Is Not Just for Big Companies

by Eric Hanson, Backend Developer at Clean Systems Consulting

The Incident That Didn't Need to Happen

A developer at a partner company had a bug in their integration code. A loop that was supposed to call your API once per minute was calling it every 10 milliseconds. By the time anyone noticed, they'd sent 50,000 requests in eight minutes. Your service was handling ~100 req/sec instead of its normal 5. Database connection pool exhausted. Every other customer experiencing errors. Incident postmortem: "add rate limiting."

This is not a story about malicious actors or billion-dollar scale. It's a story about a mundane bug in client code that took down a service with no defenses against overload. Rate limiting would have contained the damage to the misbehaving client. Everyone else would have been unaffected.

What Rate Limiting Actually Protects Against

Client bugs: As above. Loops with missing backoff, retries without jitter, misconfigured polling intervals — these happen routinely in integration development. A client that accidentally calls your API at 1000x its intended rate should not be able to take down your service.

Runaway automation: A script someone wrote to bulk-process data, run without throttling, can easily generate legitimate-looking requests at volumes that exceed your capacity.

Intentional abuse: API scraping, credential stuffing (trying username/password combinations at scale), account enumeration. Rate limiting is not a complete defense against these, but it raises the cost significantly.

Cascading failures: When a downstream dependency is slow, retry logic can multiply request volume. Rate limiting at the entry point caps how much retry traffic can hit the system, preventing the retry storm from compounding the degradation.

The Algorithms

Fixed window: Count requests in a time bucket (e.g., per minute). Simple to implement and reason about. Has a boundary vulnerability: a client can send N requests at 11:59:59 and N more at 12:00:01 — 2N requests in two seconds while staying within the per-minute limit.
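The boundary issue is easy to see in code. Here is a minimal single-node sketch (the class name and explicit clock parameter are illustrative, not from any library):

```java
import java.util.HashMap;
import java.util.Map;

// Fixed-window counter: at most `limit` requests per client per window.
// The clock is passed in explicitly so the sketch stays deterministic.
class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    // clientId -> {window start, count}; a real version would evict idle clients
    private final Map<String, long[]> state = new HashMap<>();

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(String clientId, long nowMillis) {
        long windowStart = nowMillis - (nowMillis % windowMillis);
        long[] s = state.get(clientId);
        if (s == null || s[0] != windowStart) {
            s = new long[] { windowStart, 0 };  // a new window resets the count
            state.put(clientId, s);
        }
        if (s[1] >= limit) return false;        // over the per-window limit
        s[1]++;
        return true;
    }
}
```

With a limit of 2 per second, two requests at t=999 ms and two more at t=1000 ms all succeed: the 2N boundary burst described above.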

Sliding window log: Keep a log of timestamps for each client's requests. Count requests in the rolling window. Accurate but memory-intensive at scale.

Sliding window counter: Approximate the sliding window using two adjacent fixed windows. Accurate to within a small error margin, memory-efficient. This is the approach used by many production rate limiters.
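A sketch of the two-window approximation for a single client (the weighting formula is the standard one; the class itself is illustrative, and per-client scoping would wrap one instance per key):

```java
// Sliding-window counter: approximate the rolling window by weighting the
// previous fixed window's count by its remaining overlap with the rolling window.
class SlidingWindowCounter {
    private final int limit;
    private final long windowMillis;
    private long currentWindowStart = Long.MIN_VALUE;
    private long currentCount, previousCount;

    SlidingWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        long windowStart = nowMillis - (nowMillis % windowMillis);
        if (windowStart != currentWindowStart) {
            // Roll forward; if more than one full window passed, the previous count is 0.
            previousCount = (windowStart - currentWindowStart == windowMillis) ? currentCount : 0;
            currentCount = 0;
            currentWindowStart = windowStart;
        }
        // How much of the previous window still overlaps the rolling window.
        double overlap = 1.0 - (double) (nowMillis - windowStart) / windowMillis;
        double estimate = previousCount * overlap + currentCount;
        if (estimate >= limit) return false;
        currentCount++;
        return true;
    }
}
```

Only two counters per client are stored, versus one timestamp per request for the log variant.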

Token bucket: A bucket fills at a constant rate up to a maximum capacity. Each request consumes one token. If the bucket is empty, the request is rejected. Allows bursting up to the bucket capacity while enforcing an average rate over time. This is the model used by AWS API Gateway, Stripe, and most major APIs.
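A token bucket fits in a few lines. This sketch takes the clock as a parameter so refill is deterministic; the names are illustrative, not tied to any particular library:

```java
// Token bucket: refills at `refillPerSecond` up to `capacity`;
// each request consumes one token.
class TokenBucket {
    private final double capacity, refillPerSecond;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;          // start full so clients can burst immediately
        this.lastRefillMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastRefillMillis = nowMillis;
        if (tokens < 1.0) return false;  // bucket empty: reject
        tokens -= 1.0;
        return true;
    }
}
```

Capacity controls burst size; the refill rate controls the sustained average. Tune them independently.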

Leaky bucket: Requests enter a queue at any rate; they exit the queue at a fixed rate. Smooths traffic bursts into a constant output rate. Appropriate for scenarios where you need consistent output throughput, not just input rate limiting.
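A queue-flavored sketch, where `offer` either admits a request into the bounded queue or sheds it. The drain is simulated from timestamps rather than run on a background thread, purely to keep the illustration self-contained:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Leaky bucket as a queue: requests wait in a bounded queue and are
// released at a fixed rate, smoothing bursts into constant output.
class LeakyBucket {
    private final int queueCapacity;
    private final long drainIntervalMillis;  // one request leaves per interval
    private final Deque<Long> queue = new ArrayDeque<>();
    private long lastDrainMillis;

    LeakyBucket(int queueCapacity, long drainIntervalMillis, long nowMillis) {
        this.queueCapacity = queueCapacity;
        this.drainIntervalMillis = drainIntervalMillis;
        this.lastDrainMillis = nowMillis;
    }

    private void drain(long nowMillis) {
        long drained = (nowMillis - lastDrainMillis) / drainIntervalMillis;
        for (long i = 0; i < drained && !queue.isEmpty(); i++) queue.poll();
        lastDrainMillis += drained * drainIntervalMillis;
    }

    // Returns true if the request was accepted into the queue.
    synchronized boolean offer(long nowMillis) {
        drain(nowMillis);
        if (queue.size() >= queueCapacity) return false;  // queue full: shed it
        queue.add(nowMillis);
        return true;
    }
}
```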

Implementation: Where Rate Limiting Lives

At the application layer: Frameworks like Resilience4j (JVM) and rate-limiters in Express, FastAPI, etc. provide in-process rate limiting. The limitation: state lives per instance. In a horizontally scaled service, a client can bypass per-instance limits by hitting multiple instances. Requires a shared backend (Redis) for accurate enforcement across instances.

import io.github.resilience4j.ratelimiter.RateLimiter;
import io.github.resilience4j.ratelimiter.RateLimiterConfig;
import java.time.Duration;

// In-process limiter: 100 permits per 1-second refresh period.
RateLimiterConfig config = RateLimiterConfig.custom()
    .limitRefreshPeriod(Duration.ofSeconds(1))
    .limitForPeriod(100)             // 100 requests per second
    .timeoutDuration(Duration.ZERO)  // fail immediately if the limit is exceeded
    .build();
RateLimiter rateLimiter = RateLimiter.of("api-calls", config);

// Before handling each request:
if (!rateLimiter.acquirePermission()) {
    // RateLimitExceededException is application-defined; map it to HTTP 429.
    throw new RateLimitExceededException("Rate limit exceeded");
}

At the API gateway layer: AWS API Gateway, Kong, nginx with the limit_req module, Envoy — these handle rate limiting before requests reach your application. Ideal for per-client or per-endpoint limits. State is managed by the gateway infrastructure. This is the lowest-overhead option for simple cases.
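As one concrete example, nginx's limit_req module expresses a per-IP limit in a few lines of configuration (the zone name, rate, and burst values below are placeholders to adapt):

```nginx
# 100 req/sec per client IP, tracked in a 10 MB shared-memory zone.
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=100r/s;

server {
    location /api/ {
        # Allow short bursts of 20 above the rate; reject the rest.
        limit_req zone=per_ip burst=20 nodelay;
        limit_req_status 429;   # default is 503; 429 is the correct signal
    }
}
```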

Distributed rate limiting with Redis: For accurate cross-instance rate limiting in application code, use Redis with atomic operations. The Lua script pattern or Redis modules like redis-cell implement token bucket in Redis atomically.
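A sketch of that Lua pattern: a token bucket evaluated atomically on the Redis server (the key layout and argument order here are my own convention; redis-cell packages the same idea as a module):

```lua
-- KEYS[1] = per-client bucket key
-- ARGV: capacity, refill rate (tokens/sec), current time (sec), tokens requested
local state  = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local cap    = tonumber(ARGV[1])
local tokens = tonumber(state[1]) or cap
local ts     = tonumber(state[2]) or tonumber(ARGV[3])
-- Refill based on elapsed time, capped at capacity.
tokens = math.min(cap, tokens + math.max(0, tonumber(ARGV[3]) - ts) * tonumber(ARGV[2]))
local allowed = tokens >= tonumber(ARGV[4])
if allowed then tokens = tokens - tonumber(ARGV[4]) end
redis.call('HMSET', KEYS[1], 'tokens', tokens, 'ts', ARGV[3])
redis.call('EXPIRE', KEYS[1], 60)  -- let idle buckets expire
return allowed and 1 or 0
```

Because the script executes atomically, concurrent application instances cannot race on the read-modify-write of the bucket state.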

The Response That Matters

When a request is rate limited, respond with HTTP 429 (Too Many Requests) and include:

  • Retry-After: 60 (seconds until the client may retry)
  • X-RateLimit-Limit: 100 (the limit)
  • X-RateLimit-Remaining: 0 (remaining in current window)
  • X-RateLimit-Reset: 1714000000 (Unix timestamp when the window resets)

A client that receives a 429 with proper headers can implement correct backoff. A client that receives a 500 with no guidance will retry immediately, compounding the problem.
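On the client side, honoring those headers takes only a few lines. A sketch (class and constants are illustrative): use Retry-After when the server provides it, otherwise fall back to capped exponential backoff with full jitter:

```java
import java.util.concurrent.ThreadLocalRandom;

// Client-side backoff for 429 responses: honor Retry-After when present,
// otherwise use capped exponential backoff with full jitter.
class Backoff {
    private static final long BASE_MILLIS = 500;
    private static final long MAX_MILLIS = 30_000;

    // retryAfterSeconds: parsed Retry-After header, or null if absent.
    // attempt: 0-based retry attempt number.
    static long delayMillis(Integer retryAfterSeconds, int attempt) {
        if (retryAfterSeconds != null) {
            return retryAfterSeconds * 1000L;  // the server said exactly when to retry
        }
        long cap = Math.min(MAX_MILLIS, BASE_MILLIS << Math.min(attempt, 6));
        return ThreadLocalRandom.current().nextLong(cap + 1);  // full jitter in [0, cap]
    }
}
```

The jitter matters: without it, all throttled clients retry at the same instant and recreate the spike.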

What to Limit and How to Scope It

Per-client (API key or IP address) rate limiting is the baseline. Consider also:

  • Per-endpoint limits for expensive operations (report generation, bulk exports)
  • Global limits on anonymous traffic to protect against scraping
  • Differentiated limits by subscription tier if you have paying customers with different entitlements

The Practical Takeaway

If your API is in production without rate limiting, add it this week — not this quarter. Start with per-IP and per-API-key limits at your API gateway layer. Choose limits based on your current p99 capacity with headroom: if you handle 1,000 req/sec today and want to protect against overload, a per-client limit of 100 req/sec leaves room for 10 well-behaved clients before you're at capacity. Instrument 429 responses in your metrics dashboard so rate-limiting events are visible.
