Building a Redis-Backed Rate Limiter: 5 Algorithms, One Config File

Zhenyu Wen
#Redis #Python #FastAPI #System Design #Rate Limiting

Rate limiting is one of those topics that looks simple on the surface — "just count requests" — until you start asking harder questions. How do you handle bursts? How do you avoid boundary exploits? How do you keep the check atomic when multiple processes hit Redis at the same time? I wanted to build a proper demo that answers all of these, so I implemented all five mainstream algorithms in a single FastAPI service backed by Redis, switchable via a config file.

The Setup

The project is a FastAPI app with a global rate-limit middleware. On startup it reads a YAML config, picks an algorithm, and constructs a limiter instance backed by a shared Redis connection. Every request passes through the middleware, which calls limiter.is_allowed("global") and either lets it through or returns a 429 with X-RateLimit-* headers.

rate-limiter-demo/
├── pyproject.toml
├── rate_limit_config.yaml     ← swap algorithm + params here
├── config.py                  ← Pydantic models for the YAML
├── middleware.py              ← FastAPI middleware
├── main.py                    ← startup, routes, /debug/state
└── limiter/
    ├── base.py                ← RateLimiter ABC + LimitResult dataclass
    ├── factory.py             ← create_limiter(rule, redis)
    ├── token_bucket.py
    ├── leaking_bucket.py
    ├── fixed_window.py
    ├── sliding_window_log.py
    └── sliding_window_counter.py

The config file controls everything. To switch algorithms you change one line and restart:

rules:
  global:
    algorithm: token_bucket   # ← change this

    token_bucket:
      capacity: 10
      refill_rate: 5          # tokens per second

    leaking_bucket:
      queue_capacity: 10
      leak_rate: 5

    fixed_window:
      max_requests: 10
      window_size: 60

    sliding_window_log:
      max_requests: 10
      window_size: 60

    sliding_window_counter:
      max_requests: 10
      window_size: 60

All five algorithm blocks live in the file simultaneously — only the active one is used. This makes it easy to compare behaviour without juggling separate configs.

The Five Algorithms

1. Token Bucket

The most flexible algorithm. A bucket holds up to capacity tokens, refilled at refill_rate tokens per second. Each request consumes one token. If the bucket is empty, the request is rejected.

The key insight is that you don't need a background job to refill tokens — you compute the refill lazily on each request by multiplying elapsed time by the refill rate.

Redis storage: a single Hash at rl:global:tb

tokens       9.4          ← current token count (float)
last_refill  1743050412   ← unix timestamp of last calculation

This is the only algorithm that genuinely allows bursts: if no requests come in for two seconds, you get two seconds' worth of tokens back, up to capacity.
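The lazy refill is just arithmetic, so it can be sketched as a pure function. This is an in-process illustration with made-up numbers, not the demo's actual code (the real check runs this logic inside Redis):

```python
def refill(tokens: float, last_refill: float, now: float,
           capacity: float, rate: float) -> float:
    """Lazy refill: add elapsed-time * rate tokens, clamped to capacity."""
    elapsed = now - last_refill
    return min(capacity, tokens + elapsed * rate)

# Two idle seconds at 5 tokens/s fully restores a 10-token bucket
refill(0.0, 100.0, 102.0, capacity=10.0, rate=5.0)
```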

2. Leaking Bucket

Think of it as a FIFO queue with a hole in the bottom. Requests fill the queue; the queue drains at a constant leak_rate regardless of input. If the queue is full, the request is dropped.

The effect is the opposite of Token Bucket: bursts are absorbed up to queue capacity, but the output rate is smooth and constant. This models what happens when you have a downstream system that can only handle a fixed throughput.

Redis storage: a single Hash at rl:global:lb

queue_size   3.2          ← current virtual queue occupancy (float)
last_leak    1743050412   ← unix timestamp of last drain calculation

Like Token Bucket, the drain is computed lazily — no background worker needed.
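The lazy drain mirrors the token bucket's refill. A minimal in-process sketch (a dict stands in for the Redis Hash; not the demo's actual code):

```python
def leak(queue_size: float, last_leak: float, now: float,
         leak_rate: float) -> float:
    """Lazy drain: remove elapsed-time * leak_rate from the virtual queue."""
    elapsed = now - last_leak
    return max(0.0, queue_size - elapsed * leak_rate)

def lb_allow(state: dict, now: float, capacity: int, leak_rate: float) -> bool:
    """Drain first, then enqueue the request if there is room."""
    state["queue_size"] = leak(state["queue_size"], state["last_leak"],
                               now, leak_rate)
    state["last_leak"] = now
    if state["queue_size"] + 1 > capacity:
        return False
    state["queue_size"] += 1
    return True
```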

3. Fixed Window

The simplest approach. Divide time into fixed-size windows (e.g. every 60 seconds from epoch). Maintain a counter per window. Allow up to max_requests per window; reject the rest.

Redis storage: a String counter at rl:global:fw:<window_id>

rl:global:fw:29050840  →  "7"    (TTL: 120s)

window_id = floor(now / window_size). Old windows expire automatically via Redis TTL, so there's nothing to clean up.

The well-known weakness is the boundary burst: a client can fire max_requests at 11:59:59 and another max_requests at 12:00:00, effectively doubling the rate over a two-second window. The sliding window algorithms below fix this.
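A dict-backed stand-in for the Redis counter makes the boundary exploit easy to reproduce (illustrative only; the real implementation uses INCR with a TTL):

```python
def fixed_window_allow(counters: dict, now: float,
                       window_size: int, max_requests: int) -> bool:
    """Stand-in for INCR on rl:global:fw:<window_id>."""
    window_id = int(now // window_size)
    counters[window_id] = counters.get(window_id, 0) + 1
    return counters[window_id] <= max_requests

counters = {}
# 10 requests just before the boundary all pass...
burst_1 = [fixed_window_allow(counters, 59.9, 60, 10) for _ in range(10)]
# ...and 10 more just after it pass too: 20 requests in ~0.2 seconds
burst_2 = [fixed_window_allow(counters, 60.1, 60, 10) for _ in range(10)]
```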

4. Sliding Window Log

The most precise algorithm. Every request is logged as an entry in a Redis Sorted Set, with the timestamp as the score. To check a new request: remove all entries older than now - window_size, count what remains, and allow if below max_requests.

Redis storage: a Sorted Set at rl:global:swl

member                           score (timestamp)
"1743050412.3-a3f9..."           1743050412.3
"1743050398.1-b7c2..."           1743050398.1
...

The member value includes a UUID suffix so two requests arriving at the exact same millisecond don't collide (Sorted Set members must be unique; scores don't have to be).

This is perfectly accurate — no boundary artifacts — but it's the most memory-intensive option because it stores a record for every request in the window.
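An in-memory sketch of the same evict-count-append sequence (the comments note the Sorted Set command each step corresponds to; this is an illustration, not the demo's code):

```python
import uuid

def swl_allow(log: list, now: float, window_size: int,
              max_requests: int) -> bool:
    """Evict expired entries, count the rest, append if under the limit."""
    cutoff = now - window_size
    log[:] = [(ts, m) for ts, m in log if ts > cutoff]  # ZREMRANGEBYSCORE
    if len(log) >= max_requests:                         # ZCARD
        return False
    log.append((now, f"{now}-{uuid.uuid4().hex}"))       # ZADD (unique member)
    return True
```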

5. Sliding Window Counter

A clever approximation that gets most of the accuracy of Sliding Window Log at a fraction of the memory cost. It uses two adjacent fixed-window counters and weights the previous one:

estimated = prev_count × (1 - elapsed_in_current_window / window_size)
          + current_count

If you're 30 seconds into a 60-second window, you assume the previous window's traffic was uniformly distributed and count 50% of it.
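The estimate as a one-line function, with that worked example:

```python
def swc_estimate(prev_count: int, curr_count: int,
                 elapsed: float, window_size: int) -> float:
    """Weight the previous window by how much of it the sliding window still overlaps."""
    return prev_count * (1 - elapsed / window_size) + curr_count

# 30s into a 60s window: half the previous window's 8 requests still count
swc_estimate(prev_count=8, curr_count=5, elapsed=30, window_size=60)  # 9.0
```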

Redis storage: two String counters

rl:global:swc:29050840   →  "5"    ← current window  (TTL: 120s)
rl:global:swc:29050839   →  "8"    ← previous window (TTL: 120s)

The approximation breaks down for very spiky traffic (the uniform-distribution assumption fails), but for most real-world workloads it's indistinguishable from the log-based approach.

The Atomicity Problem — and Why Lua Wins

Three of the five algorithms (Token Bucket, Leaking Bucket, Sliding Window Log) require a read → compute → write sequence that must be atomic. Token Bucket is the clearest example:

1. Read current tokens from Redis
2. Compute how many tokens have refilled since last_refill
3. Consume one token if available
4. Write the new token count back

Without atomicity, two concurrent requests can both read tokens = 1, both decide "yes, allowed", and both write tokens = 0 — a double-spend.
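The double-spend is easy to stage deterministically by interleaving the two reads by hand (a plain dict stands in for Redis):

```python
store = {"tokens": 1}

# Both "processes" read before either writes back
read_a = store["tokens"]   # A sees 1 token
read_b = store["tokens"]   # B sees 1 token

allowed_a = read_a >= 1    # A: allowed
allowed_b = read_b >= 1    # B: allowed too, so one token is spent twice
store["tokens"] = read_a - 1
store["tokens"] = read_b - 1
```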

I implemented the critical section as a Lua script registered with Redis:

_LUA_TOKEN_BUCKET = """
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])

local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens      = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now

local elapsed  = now - last_refill
local refilled = math.min(capacity, tokens + elapsed * rate)

local allowed = 0
if refilled >= 1 then
    refilled = refilled - 1
    allowed  = 1
end

redis.call('HSET', key, 'tokens', refilled, 'last_refill', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 10)

return {allowed, math.floor(refilled)}
"""

Redis is single-threaded and runs Lua scripts without interruption, so the entire read-compute-write happens as one indivisible operation — no lock needed, no round-trip overhead.

Why not pure Python?

You could do this in Python using Redis's WATCH command (optimistic locking):

from redis.exceptions import WatchError

async with redis.pipeline() as pipe:
    while True:
        try:
            await pipe.watch(key)            # abort the transaction if key changes
            data = await pipe.hgetall(key)   # immediate-mode read while watching
            # ... compute refill in Python ...
            pipe.multi()                     # switch to buffered transaction mode
            pipe.hset(key, mapping={"tokens": new_tokens, "last_refill": now})
            await pipe.execute()
            break
        except WatchError:
            continue  # key changed between WATCH and EXEC, so retry

Or use a distributed lock (redis.lock(...)). Both work, but both cost more:

|               | Lua    | Python + WATCH          | Python + lock     |
|---------------|--------|-------------------------|-------------------|
| Round trips   | 1      | 2+ (retry on conflict)  | 3+                |
| Latency       | lowest | higher under contention | highest           |
| Failure modes | none   | retry storm             | deadlock on crash |

For a demo the WATCH approach is fine. For production, Lua's single round-trip and zero retry overhead make it the clear winner.

Note that Fixed Window and Sliding Window Counter don't need Lua. INCR is already atomic in Redis, and Sliding Window Counter's reads are advisory — a slight overcount is acceptable since it's already an approximation.

Inspecting Redis State

One of the more useful additions was a /debug/state endpoint that shows the raw Redis keys for the active algorithm:

curl http://localhost:8000/debug/state
{
  "algorithm": "token_bucket",
  "redis_keys": {
    "rl:global:tb": {
      "tokens": "8.200000",
      "last_refill": "1743050412.341"
    }
  }
}

Switch to sliding_window_log and you see the Sorted Set members. Switch to sliding_window_counter and you see two integer keys. This made it easy to verify that each algorithm was actually writing what I expected to Redis.

There's also a POST /debug/reset that deletes all rl:global* keys — handy for resetting state between experiments without restarting the server.

What's Next

This demo covers global rate limiting only. The natural next step is per-user or per-IP limiting, which just means replacing "global" with a user ID or IP extracted from the request. The limiter code doesn't change — only the identifier passed to is_allowed().

Beyond that: rate limiting in a distributed system (multiple app servers, one Redis) is exactly where this design shines, since all state lives in Redis and every server sees the same counters. The Lua scripts remain correct under concurrent load from any number of processes.

If you want to run it yourself:

# requires Redis on localhost:6379
cd rate-limiter-demo
uv run python main.py

# hit the protected endpoint
curl -i http://localhost:8000/api/resource

# inspect raw Redis state
curl http://localhost:8000/debug/state

# reset and try a different algorithm
curl -X POST http://localhost:8000/debug/reset
# edit rate_limit_config.yaml, restart

The full source is in the rate-limiter-demo folder of my system-design demos repo.