I/D/E · Patterns

Token Bucket

Summary

A rate limiting algorithm that allows controlled bursts of traffic while enforcing an average rate limit over time

TL;DR

Token bucket is a rate limiting algorithm where tokens accumulate in a bucket at a fixed rate up to a maximum capacity. Each request consumes a token. If tokens are available, the request proceeds; if not, it’s rejected. This allows bursts up to bucket capacity while enforcing an average rate equal to the refill rate.

Visual Overview

Token Bucket Algorithm
TOKEN BUCKET MECHANICS

                                                    
  Bucket State: [●●●●●] (5/5 tokens)               
                                                    
  Parameters:                                       
  - Capacity: 5 tokens (max burst)                  
  - Refill: 1 token/second (average rate)           
                                                    
  Request 1: [●●●●○]  ALLOWED (4 remaining)        
  Request 2: [●●●○○]  ALLOWED (3 remaining)        
  Request 3: [●●○○○]  ALLOWED (2 remaining)        
  Request 4: [●○○○○]  ALLOWED (1 remaining)        
  Request 5: [○○○○○]  ALLOWED (0 remaining)        
  Request 6: [○○○○○]  REJECTED (no tokens!)        
                                                    
  After 3 seconds: [●●●○○] (3 tokens refilled)      
                                                    


TWO KEY PARAMETERS

                                                    
  CAPACITY (bucket size)                            
   Maximum burst size                             
   How many requests can fire instantly           
                                                    
  REFILL RATE (tokens per second)                   
   Steady-state throughput                        
   Average rate over time                         
                                                    
  Example: capacity=100, refill=10/sec              
  - Burst: 100 requests instantly                   
  - Sustained: 10 requests/second average           
                                                    

Core Explanation

What is Token Bucket?

Real-World Analogy: Think of a parking garage with a limited number of spaces. Cars (requests) arrive and need a parking spot (token). The garage has 50 spots (capacity). Cars leave at a steady rate of 1 per minute (refill rate).

  • If spots are available, cars park immediately
  • If the garage is full, cars must wait or go elsewhere
  • After rush hour, spots gradually become available again
  • A burst of arrivals is fine—up to 50 cars—but then they must slow down

This is why token bucket is perfect for APIs: it allows legitimate burst traffic (page loads, app startup) while preventing sustained overload.

How It Works

The algorithm maintains two pieces of state per rate-limited key:

  1. tokens: Current number of available tokens (float)
  2. last_refill: Timestamp of last refill calculation
Token Bucket State Machine

                                                    
  On Request Arrival:                               
     
   1. Calculate elapsed time since last_refill    
   2. Add tokens: elapsed × refill_rate           
   3. Cap at capacity (no overflow)               
   4. Update last_refill = now                    
     
                                                   
     
   Check: tokens >= 1?                            
       YES  tokens -= 1, return ALLOW            
       NO   return REJECT (429 Too Many)         
     
                                                    
  State per key: ~16 bytes (float + timestamp)      
  Time complexity: O(1) per request                 
                                                    

The Algorithm

on request:
  now = current_time()
  elapsed = now - last_refill
  tokens_to_add = elapsed * refill_rate
  tokens = min(capacity, tokens + tokens_to_add)
  last_refill = now

  if tokens >= 1:
    tokens -= 1
    return ALLOW
  else:
    return REJECT

Key Properties

PropertyValueNotes
Time ComplexityO(1) per requestSimple arithmetic
Space ComplexityO(1) per key~16 bytes (tokens + timestamp)
Burst ToleranceUp to capacityAllows full burst immediately
Average Raterefill_rateLong-term steady state

Token Bucket vs Leaky Bucket

Token Bucket vs Leaky Bucket
TOKEN BUCKET

  Requests  Check Tokens  Allow/Reject            
                                                    
  [●●●●●] full bucket                              
   (5 rapid requests)                         
  [○○○○○] empty  REJECT until refill              
                                                    
   Allows bursts up to capacity                    
   Good for bursty but legitimate traffic          
   Simple to implement                             


LEAKY BUCKET

  Requests  Queue  Drain at fixed rate            
                                                    
   (5 rapid requests)                         
  [●●●●●] queue fills                              
      (drains one at a time at fixed rate)        
out  ●out  ●out...                         
                                                    
   Smooths output to constant rate                 
   Good for shaping traffic to downstream          
  ✕ Adds latency (queueing)                         


KEY DIFFERENCE

  Token Bucket: "Can you handle this burst?"        
     Yes/No immediately (no queueing)              
                                                    
  Leaky Bucket: "Process requests at steady rate"   
     Queues bursts, drains smoothly                
                                                    
  Use Token Bucket for: API rate limiting           
  Use Leaky Bucket for: Network traffic shaping     

Real Systems Using Token Bucket

SystemImplementationConfigurationUse Case
AWS API GatewayNative token bucketConfigurable per stageAPI management
NGINXngx_http_limit_req_moduleburst parameter controls capacityWeb server rate limiting
Envoy ProxyLocal/global rate limitingConfigurable via xDSService mesh
Google Cloud EndpointsToken bucket styleQuota configurationAPI quotas
Kong GatewayRate limiting pluginConfigurable policyAPI gateway

Note: Implementations may vary slightly. Verify current behavior in official documentation.

Configuration Patterns

Common Token Bucket Configurations
API ENDPOINT PATTERNS (Illustrative)

  Endpoint Type      Capacity  Refill     Notes 

  Standard API       100       10/sec     Burst 
  Expensive Query    10        1/sec      Heavy 
  Login/Auth         5         1/min      Slow  
  Webhook Receiver   500       100/sec    Spiky 


TIERED RATE LIMITS

  Tier        Bucket Size  Refill Rate  Monthly 

  Free        Small        Low          Trial   
  Developer   Medium       Medium       Basic   
  Business    Large        High         Standard
  Enterprise  Custom       Custom       Custom  

(Exact limits vary by provider—consult documentation)

When to Use Token Bucket

✓ Perfect Use Cases

Token Bucket Use Cases
PUBLIC APIS WITH BURSTY CLIENTS
Scenario: REST API called by mobile apps
Requirement: Allow burst on app open, enforce average rate
Configuration: capacity=50, refill=10/sec
Trade-off: Allows short bursts, blocks sustained overload

CDN/EDGE RATE LIMITING
Scenario: Edge nodes rate-limit per IP
Requirement: Fast O(1) check, minimal state
Configuration: capacity=100, refill=20/sec per IP
Trade-off: Larger capacity = more tolerance for legitimate spikes

TIERED API ACCESS
Scenario: Free vs paid API tiers
Requirement: Different burst/rate per subscription
Configuration: Free(10,1), Pro(100,10), Enterprise(1000,100)
Trade-off: Need to track tier per API key

INTERNAL SERVICE PROTECTION
Scenario: Database connection rate limiting
Requirement: Prevent connection storms
Configuration: capacity=20, refill=5/sec per service
Trade-off: May queue legitimate requests during spikes

✕ When NOT to Use

When Token Bucket May Not Fit
STRICT BILLING/QUOTA ENFORCEMENT
Problem: Token bucket allows bursts that may exceed quota
Example: "1000 requests/hour"  user burns 100 instantly
Alternative: Sliding window log for exact counting
         (counter variant is also approximate)
When OK: If slight overage is acceptable

SMOOTH OUTPUT REQUIRED
Problem: Token bucket doesn't smooth output rate
Example: Database can handle 10 req/sec, not 100 in 1 second
Alternative: Leaky bucket (queues and drains steadily)
When OK: If downstream handles bursts fine

GLOBAL DISTRIBUTED RATE LIMITING
Problem: Local buckets can allow N × limit globally
Example: 10 regions × 100 burst = 1000 concurrent burst
Alternative: Centralized limiter or eventual sync
When OK: If approximate limiting is acceptable

Interview Application

Common Interview Question

Q: “How would you implement rate limiting for a public API? Walk me through token bucket and when you’d choose it over alternatives.”

Strong Answer:

“I’d implement rate limiting using a token bucket algorithm for most API use cases. Here’s my reasoning:

Why Token Bucket:

  1. Burst tolerance: Real clients are bursty—mobile apps make 10 requests on launch, then go quiet. Token bucket handles this gracefully.
  2. O(1) per request: Just arithmetic—no scanning, no sorting. Critical for high-throughput APIs.
  3. Simple state: Two values per key (tokens, timestamp). Easy to store in Redis.

The Algorithm:

  • Each request checks: do I have tokens? If yes, consume one and proceed. If no, reject with 429.
  • Tokens refill at a steady rate (e.g., 10/second) up to capacity (e.g., 100).
  • Capacity controls burst size; refill rate controls sustained throughput.

Distributed Implementation:

  • Store {tokens, last_refill} in Redis per API key
  • Use Lua script for atomic check-and-decrement
  • Handle Redis failures by failing open (allow request)—better to risk some overload than block everyone

When I’d choose alternatives:

  • Sliding window: When I need exact counting for billing (no burst overage)
  • Leaky bucket: When I need to smooth output to a downstream service
  • Fixed window: Never—boundary burst problem makes it unreliable

Headers I’d return:

  • X-RateLimit-Limit: Bucket capacity
  • X-RateLimit-Remaining: Current tokens
  • X-RateLimit-Reset: When bucket refills to capacity
  • Retry-After: Seconds until tokens available (on 429)”

Follow-up: What’s the boundary burst problem and how does token bucket handle it?

“The boundary burst problem affects fixed window rate limiting. If the window resets at the top of each minute, a client can make 100 requests at 0:59 and 100 more at 1:00—effectively 200 requests in 2 seconds while technically staying within ‘100/minute’.

Token bucket avoids this because it doesn’t have discrete windows. Tokens accumulate continuously. If you burn all tokens at 0:59, you don’t magically get more at 1:00—you wait for refill. The maximum burst is always capped at capacity, regardless of timing.”

Follow-up: How do you handle token bucket in a multi-region deployment?

“There are three approaches with different trade-offs:

  1. Sticky routing: Route each user to one region. Simple but adds latency for users far from their assigned region.

  2. Synchronized global state: Single Redis cluster, all regions check it. Accurate but adds cross-region latency to every request.

  3. Local with eventual sync: Each region has local bucket, periodically sync totals. Fast but user might get N × limit globally.

For most APIs, I’d choose option 3—slightly exceeding limits is acceptable compared to adding 100ms+ latency to every request. For billing-critical endpoints, option 2 or sticky routing.”

Code Example

Token Bucket Rate Limiter (Python)

import time
from dataclasses import dataclass
from typing import Tuple, Dict

@dataclass
class BucketState:
    """State for a single rate-limited key."""
    tokens: float
    last_refill: float

class TokenBucketRateLimiter:
    """
    In-memory token bucket rate limiter.

    For production, replace self.buckets with Redis storage.
    """

    def __init__(self, capacity: int, refill_rate: float):
        """
        Args:
            capacity: Maximum tokens (burst size)
            refill_rate: Tokens added per second
        """
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.buckets: Dict[str, BucketState] = {}

    def is_allowed(self, key: str) -> Tuple[bool, dict]:
        """
        Check if request is allowed and consume a token.

        Args:
            key: Rate limit key (user_id, ip, api_key)

        Returns:
            (allowed: bool, info: dict with remaining, reset_at)
        """
        now = time.time()

        # Get or create bucket state
        if key not in self.buckets:
            self.buckets[key] = BucketState(
                tokens=self.capacity,
                last_refill=now
            )

        bucket = self.buckets[key]

        # Calculate tokens to add since last refill
        elapsed = now - bucket.last_refill
        tokens_to_add = elapsed * self.refill_rate

        # Refill bucket (cap at capacity)
        bucket.tokens = min(self.capacity, bucket.tokens + tokens_to_add)
        bucket.last_refill = now

        # Check if we have tokens
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            allowed = True
        else:
            allowed = False

        # Calculate when bucket will be full again
        tokens_needed = self.capacity - bucket.tokens
        seconds_to_full = tokens_needed / self.refill_rate
        reset_at = int(now + seconds_to_full)

        return allowed, {
            "remaining": int(bucket.tokens),
            "limit": self.capacity,
            "reset_at": reset_at
        }


# Usage example
if __name__ == "__main__":
    # 10 requests/second with burst of 5
    limiter = TokenBucketRateLimiter(
        capacity=5,        # Allow burst of 5
        refill_rate=1      # Refill 1 token/second
    )

    user = "user_123"

    # Simulate burst of 7 requests
    print("Burst of 7 requests:")
    for i in range(7):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"  Request {i+1}: {status} (remaining: {info['remaining']})")

    # Wait 3 seconds for refill
    print("\nWaiting 3 seconds...")
    time.sleep(3)

    # Try again
    print("\nAfter waiting:")
    for i in range(4):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"  Request {i+1}: {status} (remaining: {info['remaining']})")

Token Bucket with Redis (Production)

import time
import redis

class RedisTokenBucket:
    """
    Distributed token bucket using Redis.

    Uses Lua script for atomic check-and-decrement.
    """

    LUA_SCRIPT = """
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    -- Get current state (or initialize)
    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    -- Calculate refill
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    -- Try to consume token
    local allowed = 0
    if tokens >= 1 then
        tokens = tokens - 1
        allowed = 1
    end

    -- Update state
    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)  -- Expire after 1 hour idle

    -- Return: allowed, remaining, reset_at
    local reset_at = now + (capacity - tokens) / refill_rate
    return {allowed, math.floor(tokens), math.floor(reset_at)}
    """

    def __init__(self, redis_client: redis.Redis, capacity: int, refill_rate: float):
        self.redis = redis_client
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.script = self.redis.register_script(self.LUA_SCRIPT)

    def is_allowed(self, key: str) -> tuple[bool, dict]:
        """Check and consume token atomically."""
        now = time.time()
        result = self.script(
            keys=[f"ratelimit:{key}"],
            args=[self.capacity, self.refill_rate, now]
        )

        return result[0] == 1, {
            "remaining": result[1],
            "limit": self.capacity,
            "reset_at": result[2]
        }

Express.js Middleware

const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL);

// Token bucket middleware factory
function tokenBucket({ capacity, refillRate, keyGenerator }) {
  const luaScript = `
    local key = KEYS[1]
    local capacity = tonumber(ARGV[1])
    local refill_rate = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(bucket[1]) or capacity
    local last_refill = tonumber(bucket[2]) or now

    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * refill_rate)

    local allowed = 0
    if tokens >= 1 then
      tokens = tokens - 1
      allowed = 1
    end

    redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
    redis.call('EXPIRE', key, 3600)

    return {allowed, math.floor(tokens), math.floor(now + (capacity - tokens) / refill_rate)}
  `;

  return async (req, res, next) => {
    const key = keyGenerator(req);
    const now = Date.now() / 1000;

    try {
      const [allowed, remaining, resetAt] = await redis.eval(
        luaScript,
        1,
        `ratelimit:${key}`,
        capacity,
        refillRate,
        now
      );

      // Set rate limit headers
      res.set({
        'X-RateLimit-Limit': capacity,
        'X-RateLimit-Remaining': remaining,
        'X-RateLimit-Reset': resetAt,
      });

      if (allowed) {
        next();
      } else {
        res.set('Retry-After', Math.ceil(1 / refillRate));
        res.status(429).json({
          error: 'Too Many Requests',
          retryAfter: Math.ceil(1 / refillRate),
        });
      }
    } catch (err) {
      // Fail open on Redis errors
      console.error('Rate limiter error:', err);
      next();
    }
  };
}

// Usage
app.use('/api/', tokenBucket({
  capacity: 100,      // Burst of 100
  refillRate: 10,     // 10 requests/second sustained
  keyGenerator: (req) => req.user?.id || req.ip,
}));

See It In Action:

Related Concepts:

Quick Self-Check

  • Can explain token bucket in 60 seconds?
  • Understand the difference between capacity and refill rate?
  • Know why token bucket allows bursts while leaky bucket smooths output?
  • Can implement atomic token bucket in Redis with Lua?
  • Understand the trade-off with fixed window (boundary burst problem)?
  • Know when to choose sliding window over token bucket?

Production signal

Why this concept matters

Interview 60% of rate limiting discussions
Production API gateways, network QoS
Performance O(1) per request
Scale Minimal state per key