I/D/E · Patterns

Quorum

Summary

The minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful, ensuring consistency despite failures

TL;DR

A quorum is the minimum number of nodes required to perform an operation in a distributed system. Typically quorum = (N/2) + 1, where N is the total number of replicas, ensuring a majority agreement while tolerating minority failures. This fundamental technique enables tunable consistency in systems like Cassandra, DynamoDB, and distributed consensus protocols.

Visual Overview

Quorum Reads and Writes
QUORUM READS & WRITES (N=5 replicas, Quorum=3)

  Write Operation: SET key="value" with W=3          
                                                     
  Client sends write to 5 replicas:                  
                       
   R1    R2    R3    R4    R5              
                       
                                                
  Success Success Success Timeout  Failed            
                                                     
  W=3 acknowledgments received                      
  Write is SUCCESSFUL (quorum reached)               
                                                     
  Read Operation: GET key with R=3                   
                       
   R1    R2    R3    R4    R5              
                       
  v=42    v=42    v=42    (down)  (down)             
                                                  
                                                     
  R=3 responses received                            
  Return value=42 (majority agrees)                  
                                                     
  Guarantees:                                        
  - If R + W > N: Reads see latest write             
  - If R=3, W=3, N=5: 3+3 > 5  (strong consistency) 
  - Tolerate failures: Up to N-W for writes          
                       Up to N-R for reads           


QUORUM INTERSECTION (Why R+W>N guarantees consistency)

 N=5 replicas, W=3, R=3 
 
 Write quorum (3 nodes): 
 [R1] [R2] [R3] R4 R5 
    
 
 Read quorum (3 nodes): 
 R1 [R2] [R3] [R4] R5 
    
 
 Overlap: R2, R3 (at least one node in both) 
  
 Read MUST see the latest write! 
 (because at least one read replica has latest) 
 
 Mathematical proof: 
 R + W > N 
 3 + 3 > 5  
 Overlap size: R + W - N = 3 + 3 - 5 = 1 
 (At least 1 node must be in both quorums) 


EVENTUAL CONSISTENCY (R+W ≤ N)

 N=5 replicas, W=1, R=1 (no overlap guarantee) 
 
 Write quorum (1 node): 
 [R1] R2 R3 R4 R5 
  
 
 Read quorum (1 node): 
 R1 R2 R3 [R4] R5 
  
 
 Overlap: NONE! (read might miss the write) 
  
 Eventual consistency only 
 (replicas sync eventually via anti-entropy) 
 
 Trade-off: 
 + Faster (W=1, R=1 vs W=3, R=3) 
 + Higher availability (tolerate more failures) 
 - Might read stale data 

Core Explanation

What is a Quorum?

A quorum is the minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful. The classic quorum formula is:

Quorum Formula
Quorum = floor(N/2) + 1

Where N = total number of replicas

Examples:
N=3  Quorum = 2 (majority)
N=5  Quorum = 3 (majority)
N=7  Quorum = 4 (majority)

Why Majority?

A majority quorum ensures that any two quorums must overlap by at least one node, preventing split-brain scenarios and ensuring consistency.

Quorum Variants

1. Read Quorum (R) & Write Quorum (W)

Tunable Quorum Configuration
Tunable quorums allow trading off consistency vs availability:

Configuration Examples (N=5):

Strong Consistency:
R=3, W=3 (R+W > N)

- Reads always see latest write
- Requires 3 nodes for any operation
- Availability: Tolerate 2 failures

Eventual Consistency:
R=1, W=1 (R+W ≤ N)

- Reads may see stale data
- Fastest operations
- Availability: Tolerate 4 failures (only need 1 node)

Write-Heavy Optimization:
R=4, W=2 (R+W > N)

- Fast writes (only 2 acks needed)
- Slower reads (need 4 responses)
- Good for write-heavy workloads

Read-Heavy Optimization:
R=1, W=5 (R+W > N)

- Fast reads (only 1 response needed)
- Slower writes (all nodes must ack)
- Good for read-heavy workloads

2. Strict Quorum vs Sloppy Quorum

Strict vs Sloppy Quorum
STRICT QUORUM:

  N=3 replicas: [A, B, C]               
                                        
  Write must go to A, B, or C only      
  If 2 nodes down  write FAILS        
                                        
  Guarantees: Consistent membership     
  Trade-off: Lower availability         


SLOPPY QUORUM (Hinted Handoff):

 Preferred nodes: [A, B, C] 
 Fallback nodes: [D, E] 
 
 If A, B down: 
 Write to C, D, E (sloppy quorum)  
 Hint: "This belongs to A" 
 
 When A recovers: 
 D and E send hinted data to A 
 
 Guarantees: High availability 
 Trade-off: Temporary inconsistency 
 
 Used by: Cassandra, Riak, DynamoDB 

3. Quorum with Versioning

Quorum with Version Vectors
Handle concurrent writes with version vectors:

Scenario: Concurrent writes during network partition

 Partition A writes: value="X" version=1 
 Partition B writes: value="Y" version=1 
 
 When partition heals, quorum read finds: 
 - 2 nodes with value="X" version=1 
 - 3 nodes with value="Y" version=1 
 
 Resolution options: 
 1. Last-Write-Wins: Use timestamp 
 2. Conflict detection: Return both to app 
 3. Vector clocks: Track causality 

Why R + W > N Ensures Consistency

Mathematical Proof:

R + W > N Proof
Given:
- N = total replicas
- R = read quorum size
- W = write quorum size

If R + W > N, then:

- Write touches W nodes
- Read touches R nodes
- Overlap = R + W - N > 0

Example: N=5, R=3, W=3

- Write touches 3 nodes (any 3 of 5)
- Read touches 3 nodes (any 3 of 5)
- Overlap = 3 + 3 - 5 = 1 node minimum

Result: Read quorum MUST include at least one node
from the write quorum
 Read will see the latest write 

Visual Proof:

Quorum Overlap Examples
All possible write/read quorum combinations (N=5, W=3, R=3):

Write Quorum Read Quorum Overlap
[1,2,3] [1,2,3] 3 nodes
[1,2,3] [1,2,4] 2 nodes
[1,2,3] [1,4,5] 1 node  (minimum)
[1,2,3] [3,4,5] 1 node 
[1,2,3] [2,4,5] 1 node 

Every combination has at least 1 node overlap!
 Consistency guaranteed

Failure Tolerance

Quorum Failure Tolerance
Quorum system can tolerate failures:

For N replicas with quorum Q:

- Max failures = N - Q
- Quorum Q = floor(N/2) + 1

Examples:
N=3, Q=2  Tolerate 1 failure
N=5, Q=3  Tolerate 2 failures
N=7, Q=4  Tolerate 3 failures

For operations to succeed:

- Writes: Need W nodes alive
- Reads: Need R nodes alive

With W=3, R=3, N=5:

- Can tolerate 2 node failures (need 3 alive)
- If 3 nodes fail  system unavailable

With W=2, R=2, N=5 (eventual consistency):

- Can tolerate 3 node failures (need 2 alive)
- Higher availability, but no consistency guarantee

Real Systems Using Quorums

SystemQuorum ModelDefault ConfigurationTunable?Use Case
CassandraR/W quorumsLOCAL_QUORUMYesMulti-datacenter database
DynamoDBR/W quorumsEventually consistent (R=1)YesKey-value store
RiakR/W quorumsR=2, W=2, N=3YesDistributed database
RaftMajority quorum(N/2)+1 fixedNoConsensus protocol
PaxosMajority quorum(N/2)+1 fixedNoConsensus protocol
ZookeeperMajority quorum(N/2)+1 fixedNoCoordination service

Case Study: Cassandra Consistency Levels

Cassandra Quorum Consistency Levels
ONE:

- W=1, R=1
- Fastest, lowest consistency
- Use: Logging, metrics

QUORUM:

- W=⌈(N/2)+1⌉, R=⌈(N/2)+1⌉
- Strong consistency (R+W > N)
- Use: Critical user data

ALL:

- W=N, R=N
- Strongest consistency, lowest availability
- Use: Very critical operations

LOCAL_QUORUM:

- Quorum within local datacenter only
- Use: Multi-datacenter with low latency

Configuration Example:
CREATE KEYSPACE my_keyspace
WITH replication = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3, // N=3 replicas in DC1
'datacenter2': 3 // N=3 replicas in DC2
};

// Write with quorum
INSERT INTO users (id, name) VALUES (1, 'Alice')
USING CONSISTENCY QUORUM;

// Read with quorum
SELECT * FROM users WHERE id=1
USING CONSISTENCY QUORUM;

Case Study: DynamoDB Quorum

DynamoDB Quorum Configuration
DynamoDB Configuration (N=3 replicas across AZs):

Eventually Consistent Read (default):

- R=1 (read from any replica)
- Fastest, cheapest
- May return stale data (<1s lag typically)

Strongly Consistent Read:

- R=2 (quorum read, R+W > N where W=2)
- Always returns latest data
- 2x cost of eventually consistent

Write:

- W=2 (write to quorum)
- Acknowledged when 2/3 replicas confirm
- Third replica updated asynchronously

Failure Handling:

- If 1 replica down: Quorum still works (2/3 available)
- If 2 replicas down: Writes fail, reads might fail
- Automatic recovery: Failed replica catches up via anti-entropy

API Usage:
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

// Eventually consistent read (R=1)
await dynamodb.get({
TableName: 'Users',
Key: { userId: 123 },
ConsistentRead: false // R=1, fast, might be stale
}).promise();

// Strongly consistent read (R=2, quorum)
await dynamodb.get({
TableName: 'Users',
Key: { userId: 123 },
ConsistentRead: true // R=2, slower, always latest
}).promise();

When to Use Quorums

✓ Perfect Use Cases

Multi-Region Databases

Multi-Region Quorum Use Case
Scenario: Global e-commerce platform
Requirement: Data replicated across 5 regions, need consistency
Configuration: N=5, R=3, W=3
Benefit: Tolerate 2 region failures while maintaining consistency

High Availability with Consistency

High Availability Quorum Use Case
Scenario: User authentication system
Requirement: Must be available during failures, but need consistent reads
Configuration: N=3, R=2, W=2
Benefit: Tolerate 1 node failure, read-your-writes consistency

Tunable Consistency

Tunable Consistency Use Case
Scenario: Social media application
Requirement: Different consistency for different data
Configuration:
- User posts: R=1, W=1 (eventual consistency OK)
- User credentials: R=3, W=3 (strong consistency required)
Benefit: Optimize each data type independently

✕ When NOT to Use (or Use Carefully)

Single Datacenter with Low Latency Requirements

When Not To Use: Low Latency
Problem: Quorum reads/writes add latency (wait for multiple nodes)
Alternative: Leader-based replication with async followers
Example: Real-time gaming leaderboard

Strict Serializable Isolation Needed

When Not To Use: Serializable Isolation
Problem: Quorums provide eventual or read-your-writes consistency, not serializability
Alternative: Distributed transactions with 2PC or consensus
Example: Bank transfers requiring ACID transactions

Very Small Clusters

When Not To Use: Small Clusters
Problem: N=1 or N=2 cannot form meaningful quorums
Alternative: N=3 minimum for quorum-based systems
Reason: N=2 cannot tolerate any failures with majority quorum

Interview Application

Common Interview Question

Q: “Design a distributed key-value store that can tolerate node failures while ensuring reads return the latest written value. How would you use quorums?”

Strong Answer:

“I’d design a quorum-based system with tunable consistency:

System Architecture:

  • Replication: N=5 replicas across 5 servers (or availability zones)
  • Quorum Configuration: R=3, W=3 (strong consistency)
  • Partitioning: Consistent hashing for key distribution

Why R=3, W=3:

  • R + W = 6 > N = 5, ensuring quorum overlap
  • Any read quorum (3 nodes) MUST intersect with any write quorum (3 nodes)
  • Guarantees: Reads always see latest write
  • Fault tolerance: Tolerate 2 node failures (need 3 alive)

Write Flow:

  1. Client sends write(key, value) to coordinator
  2. Coordinator sends to all 5 replicas
  3. Wait for W=3 acknowledgments
  4. Return success to client
  5. Remaining 2 replicas update asynchronously

Read Flow:

  1. Client sends read(key) to coordinator
  2. Coordinator sends to all 5 replicas
  3. Wait for R=3 responses
  4. Compare versions (using vector clocks or timestamps)
  5. Return latest version to client
  6. Repair stale replicas in background (read repair)

Handling Conflicts:

  • Use vector clocks to detect concurrent writes
  • If conflict detected: Return all versions to client (like DynamoDB)
  • Client resolves conflict (e.g., merge shopping carts)

Optimization for Reads:

  • For read-heavy workload: R=1, W=5
  • Faster reads (wait for 1 response)
  • Slower writes (all nodes must ack)

Optimization for Writes:

  • For write-heavy workload: R=4, W=2
  • Faster writes (wait for 2 acks)
  • Slower reads (wait for 4 responses)

Trade-offs:

  • Latency: Quorum operations slower than single-node (wait for multiple responses)
  • Consistency: Strong with R+W>N, eventual with R+W≤N
  • Availability: Can tolerate N-Q failures where Q is quorum size

Real-World Example: This design is similar to Amazon DynamoDB and Apache Cassandra”

Follow-up: What if network partitions the cluster?

“With quorum-based systems during network partition:

Scenario: 5 nodes split into groups: [3 nodes] and [2 nodes]

Majority Partition (3 nodes):

  • Can form quorum (W=3, R=3) ✓
  • Accepts reads and writes
  • Remains available

Minority Partition (2 nodes):

  • Cannot form quorum ✗
  • Rejects writes (can’t get W=3)
  • Rejects reads (can’t get R=3)
  • Sacrifices availability for consistency

Alternative: Sloppy Quorum (Cassandra-style):

  • Minority partition uses hinted handoff
  • Writes to fallback nodes with hints
  • Higher availability, temporary inconsistency
  • When partition heals, hints are replayed

This demonstrates the CAP theorem trade-off:

  • Strict quorum: Consistent, Partition-tolerant (CP)
  • Sloppy quorum: Available, Partition-tolerant (AP)“

Code Example

Quorum-Based Read/Write Implementation

import time
import random
from typing import List, Dict, Any, Tuple
from dataclasses import dataclass
from collections import Counter

@dataclass
class VersionedValue:
    """Value with version for conflict detection"""
    value: Any
    version: int
    timestamp: float

class QuorumStore:
    """
    Simple quorum-based distributed key-value store
    """
    def __init__(self, num_replicas: int = 5, read_quorum: int = 3,
                 write_quorum: int = 3):
        self.num_replicas = num_replicas
        self.read_quorum = read_quorum
        self.write_quorum = write_quorum

        # Simulate replicas (in production, these are separate servers)
        self.replicas: List[Dict[str, VersionedValue]] = [
            {} for _ in range(num_replicas)
        ]

        # Simulate replica availability (for fault injection)
        self.replica_available = [True] * num_replicas

        # Validate quorum configuration
        if read_quorum + write_quorum <= num_replicas:
            print("WARNING: R+W ≤ N, eventual consistency only!")
        else:
            print(f"R+W > N ({read_quorum}+{write_quorum} > {num_replicas}), "
                  f"strong consistency guaranteed")

    def write(self, key: str, value: Any) -> bool:
        """
        Write to quorum of replicas
        Returns True if write quorum achieved
        """
        print(f"\n=== WRITE: key={key}, value={value} ===")

        # Create versioned value
        versioned_value = VersionedValue(
            value=value,
            version=int(time.time() * 1000),  # Use timestamp as version
            timestamp=time.time()
        )

        # Send write to all replicas
        acks = 0
        successful_replicas = []

        for i, replica in enumerate(self.replicas):
            if self.replica_available[i]:
                # Simulate network delay
                time.sleep(random.uniform(0.001, 0.01))

                # Write to replica
                replica[key] = versioned_value
                acks += 1
                successful_replicas.append(i)

                print(f"  Replica {i}: ACK (total: {acks}/{self.write_quorum})")

                # Check if write quorum achieved
                if acks >= self.write_quorum:
                    print(f"✓ Write quorum achieved ({acks} >= {self.write_quorum})")

                    # Continue writing to remaining replicas asynchronously
                    # (in production, this would be background task)
                    return True
            else:
                print(f"  Replica {i}: UNAVAILABLE")

        print(f"✗ Write quorum NOT achieved ({acks} < {self.write_quorum})")
        return False

    def read(self, key: str) -> Tuple[Any, bool]:
        """
        Read from quorum of replicas
        Returns (value, success) tuple
        """
        print(f"\n=== READ: key={key} ===")

        # Send read to all replicas
        responses: List[VersionedValue] = []
        responding_replicas = []

        for i, replica in enumerate(self.replicas):
            if self.replica_available[i]:
                # Simulate network delay
                time.sleep(random.uniform(0.001, 0.01))

                if key in replica:
                    responses.append(replica[key])
                    responding_replicas.append(i)
                    print(f"  Replica {i}: value={replica[key].value}, "
                          f"version={replica[key].version}")
                else:
                    print(f"  Replica {i}: key not found")

                # Check if read quorum achieved
                if len(responses) >= self.read_quorum:
                    print(f"✓ Read quorum achieved "
                          f"({len(responses)} >= {self.read_quorum})")
                    break
            else:
                print(f"  Replica {i}: UNAVAILABLE")

        if len(responses) < self.read_quorum:
            print(f"✗ Read quorum NOT achieved "
                  f"({len(responses)} < {self.read_quorum})")
            return None, False

        # Return value with highest version (latest write)
        latest = max(responses, key=lambda v: v.version)
        print(f"→ Returning latest value: {latest.value} "
              f"(version {latest.version})")

        # Read repair: Update stale replicas in background
        self._read_repair(key, latest, responding_replicas)

        return latest.value, True

    def _read_repair(self, key: str, latest: VersionedValue,
                    responding_replicas: List[int]):
        """
        Update stale replicas with latest value (background operation)
        """
        stale_replicas = []
        for i in responding_replicas:
            if self.replicas[i].get(key, None) != latest:
                stale_replicas.append(i)

        if stale_replicas:
            print(f"  Read repair: Updating stale replicas {stale_replicas}")
            for i in stale_replicas:
                self.replicas[i][key] = latest

    def simulate_failure(self, replica_id: int):
        """Simulate replica failure"""
        self.replica_available[replica_id] = False
        print(f"\n[FAILURE] Replica {replica_id} is now UNAVAILABLE")

    def simulate_recovery(self, replica_id: int):
        """Simulate replica recovery"""
        self.replica_available[replica_id] = True
        print(f"\n[RECOVERY] Replica {replica_id} is now AVAILABLE")

# Usage Examples
if __name__ == '__main__':
    # Example 1: Strong consistency (R+W > N)
    print("=" * 60)
    print("Example 1: Strong Consistency (N=5, R=3, W=3)")
    print("=" * 60)

    store = QuorumStore(num_replicas=5, read_quorum=3, write_quorum=3)

    # Write
    store.write("user:123:name", "Alice")

    # Read (should see the write)
    value, success = store.read("user:123:name")
    assert value == "Alice"

    # Example 2: Fault tolerance
    print("\n" + "=" * 60)
    print("Example 2: Fault Tolerance (2 replicas fail)")
    print("=" * 60)

    store.simulate_failure(3)
    store.simulate_failure(4)

    # Write should still succeed (need 3, have 3)
    store.write("user:456:name", "Bob")

    # Read should still succeed
    value, success = store.read("user:456:name")
    assert value == "Bob"

    # Example 3: Quorum failure (too many nodes down)
    print("\n" + "=" * 60)
    print("Example 3: Quorum Failure (3 replicas fail)")
    print("=" * 60)

    store.simulate_failure(2)  # Now 3 replicas down

    # Write should fail (need 3, have 2)
    success = store.write("user:789:name", "Charlie")
    assert not success

    # Example 4: Eventual consistency (R+W ≤ N)
    print("\n" + "=" * 60)
    print("Example 4: Eventual Consistency (N=5, R=1, W=1)")
    print("=" * 60)

    store_eventual = QuorumStore(num_replicas=5, read_quorum=1, write_quorum=1)

    # Write to 1 replica only
    store_eventual.write("counter", 42)

    # Read might return stale data (if reads from different replica)
    # In this simulation, read repair will fix it eventually
    value, success = store_eventual.read("counter")

Quorum with Conflict Detection

from typing import Dict, Set
import hashlib

@dataclass
class VectorClock:
    """Vector clock for detecting concurrent writes"""
    clocks: Dict[int, int]  # replica_id -> counter

    def increment(self, replica_id: int):
        """Increment counter for this replica"""
        self.clocks[replica_id] = self.clocks.get(replica_id, 0) + 1

    def merge(self, other: 'VectorClock'):
        """Merge with another vector clock (take max of each)"""
        for replica_id, count in other.clocks.items():
            self.clocks[replica_id] = max(
                self.clocks.get(replica_id, 0),
                count
            )

    def is_concurrent(self, other: 'VectorClock') -> bool:
        """Check if two writes are concurrent (neither causally before)"""
        self_before = False
        other_before = False

        all_replicas = set(self.clocks.keys()) | set(other.clocks.keys())

        for replica_id in all_replicas:
            self_count = self.clocks.get(replica_id, 0)
            other_count = other.clocks.get(replica_id, 0)

            if self_count < other_count:
                other_before = True
            elif self_count > other_count:
                self_before = True

        # Concurrent if both are partially before each other
        return self_before and other_before

@dataclass
class VersionedValueWithVector:
    """Value with vector clock for conflict detection"""
    value: Any
    vector_clock: VectorClock

class QuorumStoreWithConflicts(QuorumStore):
    """Quorum store that detects and handles concurrent writes"""

    def read_with_conflicts(self, key: str) -> Tuple[List[Any], bool]:
        """
        Read from quorum, detecting conflicts
        Returns (list of values, success)
        - Single value if no conflict
        - Multiple values if concurrent writes detected
        """
        print(f"\n=== READ WITH CONFLICT DETECTION: key={key} ===")

        responses: List[VersionedValueWithVector] = []

        for i, replica in enumerate(self.replicas):
            if self.replica_available[i] and key in replica:
                responses.append(replica[key])

                if len(responses) >= self.read_quorum:
                    break

        if len(responses) < self.read_quorum:
            return [], False

        # Detect conflicts using vector clocks
        conflicts = []
        resolved = []

        for i, v1 in enumerate(responses):
            is_conflict = False
            for j, v2 in enumerate(responses):
                if i != j and v1.vector_clock.is_concurrent(v2.vector_clock):
                    is_conflict = True
                    break

            if is_conflict:
                conflicts.append(v1.value)

        if conflicts:
            print(f"⚠ CONFLICT DETECTED: {len(conflicts)} concurrent writes")
            return conflicts, True
        else:
            # No conflict, return latest value
            latest = max(responses,
                        key=lambda v: sum(v.vector_clock.clocks.values()))
            return [latest.value], True

See It In Action:

Prerequisites:

Related Concepts:

Used In Systems:

  • Cassandra: Tunable quorum consistency
  • DynamoDB: Read/write quorums with strong vs eventual consistency
  • Riak: Quorum-based with sloppy quorums

Explained In Detail:

  • Distributed Systems Deep Dive - Quorum implementation details

Quick Self-Check

  • Can explain quorum in 60 seconds?
  • Understand why R+W>N ensures consistency?
  • Know how to calculate fault tolerance from quorum size?
  • Can explain difference between strict and sloppy quorums?
  • Understand trade-offs between different R/W configurations?
  • Can design a quorum system for given requirements?

Production signal

Why this concept matters

Interview 70% of distributed systems interviews
Production Cassandra, DynamoDB
Performance Tunable consistency
Scale Fault tolerance