I/D/E · Patterns

Sliding Window

Summary

A rate limiting algorithm that smoothly enforces request limits by weighting the previous time window, preventing boundary burst problems

TL;DR

Sliding window counter is a rate limiting algorithm that fixes the boundary-burst problem of fixed windows. It weights the previous window’s count based on how far into the current window you are, providing smoother rate enforcement with minimal memory overhead (just two counters per key). Unlike token bucket, it doesn’t allow bursts—it enforces strict rate limits.

Visual Overview

Sliding Window Counter
THE BOUNDARY BURST PROBLEM

  Fixed Window Rate Limit: 100 requests per minute  
                                                    
  Window 1 (2:00-2:59):    Window 2 (3:00-3:59):   
  [.......●●●●●●●●●●]     [●●●●●●●●●●.......]      
       100 at 2:55              100 at 3:00        
                                                  
                        
               200 requests in 10 seconds!          
                                                    
  Both are "100/minute"—but the system sees 2x!    


SLIDING WINDOW SOLUTION

  At 3:15 (15 seconds = 25% into current window):   
                                                    
  Previous window (2:00-2:59): 80 requests          
  Current window (3:00-3:59): 30 requests so far    
  Elapsed into current window: 25%                  
                                                    
         
   Weighted count = prev × (1 - elapsed) + curr   
                 = 80 × 0.75 + 30                 
                 = 60 + 30                        
                 = 90                             
         
                                                    
  90 < 100 limit  ALLOWED                          
  At 110: 80 × 0.75 + 31 = 91  ALLOWED            
  At 120: 80 × 0.75 + 41 = 101  REJECTED          
                                                    

Core Explanation

What is Sliding Window?

Real-World Analogy: Imagine a highway toll booth that tracks cars per hour. A fixed window system would reset at exactly each hour—so a convoy could race through at 11:59 and again at 12:00. A sliding window instead asks: “In the last 60 minutes from right now, how many cars passed?” It smoothly weights recent and older traffic.

Like looking backward through a sliding 1-hour window that moves with you through time, instead of fixed calendar hours.

How It Works

The algorithm maintains two counters per rate-limited key:

  1. prev_count: Requests in the previous complete window
  2. curr_count: Requests in the current ongoing window
Sliding Window State
SLIDING WINDOW STATE MACHINE

                                                    
  On Request Arrival:                               
     
   1. Determine current window index              
      window = floor(now / window_size)           
                                                  
   2. If new window, rotate counters              
      prev_count = curr_count                     
      curr_count = 0                              
                                                  
   3. Calculate elapsed fraction                  
      elapsed = (now % window_size) / window_size 
     
                                                   
     
   4. Calculate weighted count                    
      weighted = prev × (1 - elapsed) + curr      
                                                  
   5. Check: weighted < limit?                    
      YES  curr_count++, return ALLOW            
      NO   return REJECT (429)                   
     
                                                    
  State per key: ~24 bytes (2 counters + window_id) 
  Time complexity: O(1) per request                 
                                                    

The Algorithm

on request:
  window = floor(now / window_size)
  elapsed = (now % window_size) / window_size

  if window != current_window:
    # Rotate to new window
    prev_count = curr_count
    curr_count = 0
    current_window = window

  weighted_count = prev_count * (1 - elapsed) + curr_count

  if weighted_count < limit:
    curr_count += 1
    return ALLOW
  else:
    return REJECT

Key Properties

PropertyValueNotes
Time ComplexityO(1) per requestSimple arithmetic
Space ComplexityO(2) per keyTwo counters + window ID
Burst PreventionYesNo sudden spikes allowed
AccuracyApproximateWeighted estimate, not exact

Why Weighted Counting Works

The weighting formula estimates how many requests occurred in the “last window_size seconds” as if we were looking backward from now:

Weighted Count Visualization
VISUALIZING THE SLIDING WINDOW

                                                    
  Real time:     
                                                    
  Previous Window        Current Window             
  [----------------]     [----------------]         
        2:00-2:59              3:00-3:59            
                                                    
  At 3:15 (25% into current window):                
                                                    
                      60 seconds                  
                    [==============]                
                                                  
                  2:15           3:15               
                                                  
              75% of prev     25% of curr           
                                                    
  We estimate: 75% of prev's requests likely        
  occurred in the part that overlaps our window.    
  Plus 100% of current window so far.               
                                                    

Sliding Window vs Fixed Window vs Token Bucket

Algorithm Comparison
COMPARISON TABLE

 Aspect        Fixed Window  Sliding Win   Token Bucket 

 Memory/key    1 counter     2 counters    2 values     
 Boundary      2× burst      Prevented     N/A          
 Bursts        At boundary   No bursts     Allowed      
 Accuracy      Exact/window  Approximate   Exact/bucket 
 Complexity    Simplest      Simple        Simple       
 Best for      Never use     Strict limit  Bursty OK    


WHEN EACH MAKES SENSE

                                                    
  Fixed Window:  Avoid (boundary burst problem)    
                                                    
  Sliding Window:  Billing, quotas, SLAs           
   "Exactly 1000 requests per hour"               
   No acceptable overage                          
                                                    
  Token Bucket:  APIs with bursty clients          
   "100 burst, 10/sec sustained"                  
   Legitimate burst traffic expected              
                                                    

Real Systems Using Sliding Window

SystemImplementationConfigurationUse Case
Stripe APISliding window stylePer-endpoint limitsPayment processing
Cloudflare Rate LimitingConfigurable algorithmsRule-basedEdge protection
Redis Rate Limiterredis-cell moduleConfigurableGeneral rate limiting
EnvoyToken bucket + sliding windowFilter chainService mesh
HAProxystick-tablesConnection rate limitingLoad balancer

Note: Many systems offer configurable algorithms. Verify current behavior in documentation.

Configuration Patterns

Common Sliding Window Configurations
BILLING/QUOTA ENFORCEMENT

  Use Case: API usage billing                       
  Window: 1 hour                                    
  Limit: Based on subscription tier                 
  Why sliding: Prevents gaming at window boundaries 


SLA ENFORCEMENT

  Use Case: Partner API contracts                   
  Window: 1 minute                                  
  Limit: Contracted rate                            
  Why sliding: Consistent experience, no spikes    


ABUSE PREVENTION

  Use Case: Login attempts, password resets         
  Window: 15 minutes                                
  Limit: 5 attempts                                 
  Why sliding: Prevents rapid-fire attacks          

When to Use Sliding Window

✓ Perfect Use Cases

Sliding Window Use Cases
BILLING/METERING
Scenario: API calls billed per 1000 requests
Requirement: Cannot exceed purchased quota
Configuration: window=1h, limit=1000/tier
Trade-off: Slightly more state than fixed window

SLA ENFORCEMENT
Scenario: Partner contract guarantees X req/sec
Requirement: Smooth, predictable rate limiting
Configuration: window=1min, limit=600
Trade-off: No burst tolerance for legitimate spikes

SECURITY RATE LIMITS
Scenario: Login, password reset, verification codes
Requirement: Prevent rapid-fire attacks
Configuration: window=15min, limit=5
Trade-off: Legitimate users with typos may hit limit

FAIR USAGE POLICIES
Scenario: Shared resource (storage, compute)
Requirement: No single user dominates
Configuration: window=1h, limit=proportional_share
Trade-off: Bursty but legitimate workloads throttled

✕ When NOT to Use

When Sliding Window May Not Fit
BURSTY BUT LEGITIMATE TRAFFIC
Problem: Mobile apps burst on launch, then go idle
Example: 50 API calls in 1 second on app open
Alternative: Token bucket (allows burst up to capacity)
When OK: If burst is never acceptable

REAL-TIME BIDDING/TRADING
Problem: Milliseconds matter, weighted calc adds latency
Example: High-frequency trading, ad bidding
Alternative: Local in-memory counters, approximate
When OK: If microsecond latency isn't critical

LOW-VOLUME ENDPOINTS
Problem: Overhead not worth it for low traffic
Example: Admin endpoints with 10 requests/day
Alternative: Simple fixed window or none
When OK: High-value endpoints worth protecting

EXACT COUNTING REQUIRED
Problem: Sliding window is an approximation
Example: Regulatory requirement for exact counts
Alternative: Sliding window log (stores each timestamp)
When OK: Approximation within 5% is acceptable

Interview Application

Common Interview Question

Q: “Explain the boundary burst problem in rate limiting and how sliding window solves it.”

Strong Answer:

“The boundary burst problem occurs with fixed window rate limiting. If your window is 1 minute and resets at :00, a client can make 100 requests at 0:59 and 100 more at 1:00—effectively 200 requests in 2 seconds while technically respecting ‘100/minute’.

Sliding window counter solves this by weighting the previous window:

The Formula:

weighted = prev_window × (1 - elapsed_fraction) + curr_window

If we’re 25% into the current window, we count 75% of the previous window’s requests plus 100% of current. This approximates ‘requests in the last 60 seconds from right now.’

Example: At 1:15 (25% into minute 1):

  • Previous window (minute 0): 80 requests
  • Current window (minute 1): 25 requests so far
  • Weighted: 80 × 0.75 + 25 = 60 + 25 = 85
  • If limit is 100: ALLOWED

Trade-offs:

  • Pro: Prevents boundary gaming, smooth enforcement
  • Con: Approximate (not exact count), no burst tolerance

When I’d use it:

  • Billing/quotas where overage isn’t acceptable
  • SLA enforcement with partner contracts

When I’d use token bucket instead:

  • APIs where legitimate burst traffic is expected (mobile app startup)”

Follow-up: How would you implement sliding window for distributed rate limiting?

“For distributed sliding window, I’d use Redis with atomic operations:

  1. Key structure: ratelimit:{user}:{window} stores count per window
  2. Atomic increment: Use INCR with EXPIRE
  3. Two-key lookup: Fetch current and previous window counts

Lua script approach (atomic):

-- Get both windows atomically
local prev = redis.call('GET', prev_key) or 0
local curr = redis.call('GET', curr_key) or 0
local weighted = prev * (1 - elapsed) + curr
if weighted < limit then
  redis.call('INCR', curr_key)
  redis.call('EXPIRE', curr_key, window_size * 2)
  return 1
end
return 0

Considerations:

  • Redis failures: Fail open (allow) or fail closed (reject) based on use case
  • Multi-region: Local Redis per region with eventual sync, accept slight inaccuracy
  • Clock skew: Use Redis server time, not client time”

Follow-up: What’s the sliding window log variant?

Sliding window log stores the timestamp of each request instead of just a count. To check the limit, you count timestamps within the last window_size seconds.

Pros: Exact counting, no approximation Cons: O(N) space per key, O(log N) per request with cleanup

When to use: Regulatory requirements for exact counts, very low-volume endpoints where precision matters.

For high-throughput APIs, the counter variant’s approximation (typically within 1-5% of actual) is acceptable and much more efficient.”

Code Example

Sliding Window Counter (Python)

import time
from typing import Tuple, Dict
from dataclasses import dataclass

@dataclass
class WindowState:
    """State for sliding window rate limiter."""
    prev_count: int = 0
    curr_count: int = 0
    current_window: int = 0

class SlidingWindowRateLimiter:
    """
    Sliding window counter rate limiter.

    Prevents boundary burst problem by weighting previous window.
    """

    def __init__(self, limit: int, window_seconds: int):
        """
        Args:
            limit: Maximum requests per window
            window_seconds: Window size in seconds
        """
        self.limit = limit
        self.window_seconds = window_seconds
        self.states: Dict[str, WindowState] = {}

    def is_allowed(self, key: str) -> Tuple[bool, dict]:
        """
        Check if request is allowed under sliding window limit.

        Args:
            key: Rate limit key (user_id, ip, api_key)

        Returns:
            (allowed: bool, info: dict)
        """
        now = time.time()
        window = int(now // self.window_seconds)
        elapsed = (now % self.window_seconds) / self.window_seconds

        # Get or create state
        if key not in self.states:
            self.states[key] = WindowState()

        state = self.states[key]

        # Rotate windows if needed
        if window != state.current_window:
            if window == state.current_window + 1:
                # Normal rotation to next window
                state.prev_count = state.curr_count
            else:
                # Skipped windows (no activity), prev is 0
                state.prev_count = 0
            state.curr_count = 0
            state.current_window = window

        # Calculate weighted count
        weighted = state.prev_count * (1 - elapsed) + state.curr_count

        if weighted < self.limit:
            state.curr_count += 1
            allowed = True
            remaining = int(self.limit - weighted - 1)
        else:
            allowed = False
            remaining = 0

        return allowed, {
            "remaining": max(0, remaining),
            "limit": self.limit,
            "window_seconds": self.window_seconds,
            "weighted_count": round(weighted, 2)
        }


# Usage example
if __name__ == "__main__":
    # 5 requests per 10 seconds
    limiter = SlidingWindowRateLimiter(
        limit=5,
        window_seconds=10
    )

    user = "user_123"

    print("Testing sliding window rate limiter (5 req / 10 sec):\n")

    # Simulate requests
    for i in range(8):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"Request {i+1}: {status}")
        print(f"  Weighted count: {info['weighted_count']}")
        print(f"  Remaining: {info['remaining']}\n")

    # Wait for partial window rotation
    print("Waiting 5 seconds (half a window)...\n")
    time.sleep(5)

    # More requests - previous window still partially counted
    for i in range(3):
        allowed, info = limiter.is_allowed(user)
        status = "✓ ALLOWED" if allowed else "✗ REJECTED"
        print(f"Request {i+9}: {status}")
        print(f"  Weighted count: {info['weighted_count']}")
        print(f"  Remaining: {info['remaining']}\n")

Sliding Window with Redis (Production)

import time
import redis

class RedisSlidingWindow:
    """
    Distributed sliding window rate limiter using Redis.

    Uses two keys per client: previous and current window counts.
    """

    LUA_SCRIPT = """
    local curr_key = KEYS[1]
    local prev_key = KEYS[2]
    local limit = tonumber(ARGV[1])
    local window_size = tonumber(ARGV[2])
    local elapsed = tonumber(ARGV[3])

    -- Get counts (default 0)
    local prev_count = tonumber(redis.call('GET', prev_key) or 0)
    local curr_count = tonumber(redis.call('GET', curr_key) or 0)

    -- Calculate weighted count
    local weighted = prev_count * (1 - elapsed) + curr_count

    if weighted < limit then
        -- Increment current window
        redis.call('INCR', curr_key)
        -- Set expiry (2 windows to ensure prev is available)
        redis.call('EXPIRE', curr_key, window_size * 2)
        return {1, math.floor(limit - weighted - 1), weighted}
    else
        return {0, 0, weighted}
    end
    """

    def __init__(self, redis_client: redis.Redis, limit: int, window_seconds: int):
        self.redis = redis_client
        self.limit = limit
        self.window_seconds = window_seconds
        self.script = self.redis.register_script(self.LUA_SCRIPT)

    def is_allowed(self, key: str) -> tuple[bool, dict]:
        """Check and increment atomically in Redis."""
        now = time.time()
        window = int(now // self.window_seconds)
        elapsed = (now % self.window_seconds) / self.window_seconds

        curr_key = f"ratelimit:{key}:{window}"
        prev_key = f"ratelimit:{key}:{window - 1}"

        result = self.script(
            keys=[curr_key, prev_key],
            args=[self.limit, self.window_seconds, elapsed]
        )

        return result[0] == 1, {
            "remaining": result[1],
            "limit": self.limit,
            "weighted": round(result[2], 2)
        }

Express.js Middleware

const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL);

function slidingWindowLimiter({ limit, windowSeconds, keyGenerator }) {
  const luaScript = `
    local curr_key = KEYS[1]
    local prev_key = KEYS[2]
    local limit = tonumber(ARGV[1])
    local window_size = tonumber(ARGV[2])
    local elapsed = tonumber(ARGV[3])

    local prev_count = tonumber(redis.call('GET', prev_key) or 0)
    local curr_count = tonumber(redis.call('GET', curr_key) or 0)

    local weighted = prev_count * (1 - elapsed) + curr_count

    if weighted < limit then
      redis.call('INCR', curr_key)
      redis.call('EXPIRE', curr_key, window_size * 2)
      return {1, math.floor(limit - weighted - 1)}
    else
      return {0, 0}
    end
  `;

  return async (req, res, next) => {
    const key = keyGenerator(req);
    const now = Date.now() / 1000;
    const window = Math.floor(now / windowSeconds);
    const elapsed = (now % windowSeconds) / windowSeconds;

    const currKey = `ratelimit:${key}:${window}`;
    const prevKey = `ratelimit:${key}:${window - 1}`;

    try {
      const [allowed, remaining] = await redis.eval(
        luaScript,
        2,
        currKey,
        prevKey,
        limit,
        windowSeconds,
        elapsed
      );

      res.set({
        'X-RateLimit-Limit': limit,
        'X-RateLimit-Remaining': remaining,
        'X-RateLimit-Window': windowSeconds,
      });

      if (allowed) {
        next();
      } else {
        res.status(429).json({
          error: 'Rate limit exceeded',
          retryAfter: Math.ceil(windowSeconds * (1 - elapsed)),
        });
      }
    } catch (err) {
      console.error('Rate limiter error:', err);
      next(); // Fail open
    }
  };
}

// Usage: 100 requests per minute, smooth enforcement
app.use('/api/', slidingWindowLimiter({
  limit: 100,
  windowSeconds: 60,
  keyGenerator: (req) => req.user?.id || req.ip,
}));

See It In Action:

Related Concepts:

Quick Self-Check

  • Can explain the boundary burst problem in 30 seconds?
  • Understand the weighted count formula and why it works?
  • Know the trade-off between sliding window and token bucket?
  • Can implement sliding window counter with Redis?
  • Understand when approximation is acceptable vs when you need exact counts?
  • Know why sliding window is preferred for billing/quotas?

Production signal

Why this concept matters

Interview 55% of rate limiting discussions
Production API rate limiters, CDNs
Performance O(1) per request
Scale 2 counters per key