CAP Theorem | Concepts

TL;DR

The CAP theorem states that during a network partition, a distributed system must choose between consistency (all nodes see the same data) and availability (every request gets a response). You cannot have both simultaneously when the network splits. This isn’t a limitation to overcome—it’s a fundamental property of distributed systems proven mathematically.

Visual Overview

CAP Theorem Triangle

THE CAP TRIANGLE
┌────────────────────────────────────────────────────┐
│                                                    │
│                   Consistency                      │
│                       (C)                          │
│                      /   \                        │
│                     /     \                       │
│                    /  CA   \                      │
│                   /  (myth) \                     │
│                  /           \                    │
│                 /     CP      \                   │
│                /   (banking)   \                  │
│               /                 \                 │
│              /        AP         \                │
│             /    (social media)   \               │
│            /                       \              │
│     Availability ─────────── Partition Tolerance   │
│          (A)                        (P)            │
│                                                    │
│  During partition: Pick C or A (P is mandatory)    │
└────────────────────────────────────────────────────┘

WHAT EACH MEANS
┌────────────────────────────────────────────────────┐
│  C - Consistency                                   │
│      Every read receives the most recent write     │
│      All nodes see the same data at same time      │
│                                                    │
│  A - Availability                                  │
│      Every request receives a response             │
│      (not guaranteed to be the latest data)        │
│                                                    │
│  P - Partition Tolerance                           │
│      System continues to operate despite           │
│      network failures between nodes                │
└────────────────────────────────────────────────────┘

Core Explanation

What is the CAP Theorem?

Real-World Analogy: Imagine you’re running a chain of coffee shops with a shared inventory system. You sell the last bag of a special blend at Shop A.

With strong consistency: Shop B can’t sell that blend until they confirm Shop A’s sale—but if the network is down between shops, Shop B is stuck (unavailable).
With high availability: Shop B continues taking orders, potentially selling a bag they don’t have—the system stays up but might be wrong (inconsistent).

You can’t have both during a network problem. That’s CAP.

The Key Insight: Partitions Happen

The theorem only constrains behavior during network partitions. When the network is healthy, CAP doesn’t force any trade-offs—systems can provide strong consistency and high availability simultaneously. The critical insight:

Partition tolerance is not optional. Networks will fail. The real question is: when they do, do you sacrifice C or A?

Normal Operation vs Partition

NORMAL OPERATION (No Partition)
┌────────────────────────────────────────────────────┐
│                                                    │
│  Client writes x=1                                 │
│         ↓                                          │
│     ┌───────┐         ┌───────┐                    │
│     │Node A │ ══════> │Node B │                    │
│     │ x=1   │ healthy │ x=1   │                    │
│     └───────┘ network └───────┘                    │
│         ↓                                          │
│  Both nodes have x=1 ✓                             │
│  Reads from either return x=1 ✓                    │
│                                                    │
│  → CAP doesn't constrain trade-offs here           │
│    (no partition means no forced choice)           │
└────────────────────────────────────────────────────┘

NETWORK PARTITION OCCURS
┌────────────────────────────────────────────────────┐
│                                                    │
│  Client writes x=2 to Node A                       │
│         ↓                                          │
│     ┌───────┐    ╳    ┌───────┐                    │
│     │Node A │ ══════> │Node B │                    │
│     │ x=2   │ BROKEN! │ x=1   │                    │
│     └───────┘         └───────┘                    │
│                                                    │
│  Node A has x=2, Node B has x=1 (stale)            │
│  Client reads from Node B... what should happen?   │
│                                                    │
│  → NOW you must choose C or A                      │
└────────────────────────────────────────────────────┘

The Impossible Choice

CP vs AP During Partition

CHOOSE CONSISTENCY (CP)
┌────────────────────────────────────────────────────┐
│  Client reads from Node B during partition         │
│                                                    │
│  Node B: "I can't confirm I have latest data"      │
│          "I'll refuse to answer"                   │
│                                                    │
│     ┌───────┐    ╳    ┌───────┐                    │
│     │Node A │         │Node B │ ──► ERROR          │
│     │ x=2   │         │ x=1   │    "Unavailable"   │
│     └───────┘         └───────┘                    │
│                                                    │
│  ✓ Client never sees wrong data                    │
│  ✗ Client gets no data (system unavailable)        │
│                                                    │
│  Examples: ZooKeeper, etcd, Consul, HBase          │
└────────────────────────────────────────────────────┘

CHOOSE AVAILABILITY (AP)
┌────────────────────────────────────────────────────┐
│  Client reads from Node B during partition         │
│                                                    │
│  Node B: "I might not have latest data"            │
│          "But I'll give you what I have"           │
│                                                    │
│     ┌───────┐    ╳    ┌───────┐                    │
│     │Node A │         │Node B │ ──► x=1 (stale!)   │
│     │ x=2   │         │ x=1   │                    │
│     └───────┘         └───────┘                    │
│                                                    │
│  ✓ Client always gets a response                   │
│  ✗ Response might be outdated                      │
│                                                    │
│  Examples: Cassandra, DynamoDB, CouchDB            │
└────────────────────────────────────────────────────┘

Making the Decision

The right choice depends on your domain. Ask: “What’s worse for my users—seeing old data or seeing nothing?”

Domain	Worse Outcome	CAP Choice
Bank balance	Showing wrong balance → overdraft	CP
Inventory count	Overselling → angry customers	CP
Social media feed	Slight delay in seeing posts	AP
Shopping cart	Cart contents briefly out of sync	AP
Distributed lock	Two processes think they hold lock	CP
DNS lookup	Slight delay in DNS propagation	AP

Real Systems and Their CAP Behaviors

These classifications describe default or typical configurations. Most systems are tunable—the same system can behave as CP or AP depending on consistency level settings.

System	Typical Behavior	Configuration Notes	Use Case
ZooKeeper	CP-leaning	Requires majority for writes; minority partitions read-only	Distributed coordination
etcd	CP-leaning	Raft consensus requires majority	Kubernetes config store
Consul	Tunable	Default CP; can configure for AP with stale reads	Service discovery
Cassandra	Tunable	CL=ONE is AP; CL=QUORUM is CP-leaning	Multi-datacenter database
DynamoDB	Tunable	Eventually consistent reads (AP); strongly consistent reads (CP)	High-availability storage
CockroachDB	CP-leaning	Serializable by default; unavailable without quorum	Distributed SQL
MongoDB	Tunable	Depends on write/read concern settings	Document database
Riak	AP-leaning	Sibling values, application-level conflict resolution	Key-value store

Interview note: Avoid saying “X is a CP system” without qualification. Real systems offer configuration knobs. Focus on describing behavior under specific configurations.

Case Study: Partition Scenario

5-Node Cluster During Partition

5-NODE CLUSTER PARTITIONS INTO [3] AND [2]
┌────────────────────────────────────────────────────┐
│                                                    │
│   MAJORITY (3 nodes)    │    MINORITY (2 nodes)    │
│   ┌───┐ ┌───┐ ┌───┐    ╳    ┌───┐ ┌───┐            │
│   │ A │ │ B │ │ C │    ╳    │ D │ │ E │            │
│   └───┘ └───┘ └───┘    ╳    └───┘ └───┘            │
│                        ╳                           │
│                                                    │
│   CP SYSTEM (e.g., etcd):                          │
│   ├─ Majority: Accepts reads AND writes ✓          │
│   └─ Minority: Read-only or unavailable ✗          │
│                                                    │
│   AP SYSTEM (e.g., Cassandra with CL=ONE):         │
│   ├─ Majority: Accepts reads AND writes ✓          │
│   └─ Minority: Accepts reads AND writes ✓          │
│   └─ Warning: Writes may conflict!                 │
│                                                    │
│   After partition heals:                           │
│   ├─ CP: Minority syncs from majority              │
│   └─ AP: Conflict resolution needed                │
└────────────────────────────────────────────────────┘

When to Use Each

✓ Choose CP (Consistency) When

CP Use Cases

FINANCIAL TRANSACTIONS
Scenario: Bank transfer between accounts
Requirement: Cannot show wrong balance, cannot double-spend
Choice: CP - Reject transaction if partition prevents verification
Trade-off: "Transfer failed, try again" better than "You're overdrawn"

INVENTORY SYSTEMS
Scenario: E-commerce with limited stock
Requirement: Cannot sell more than available
Choice: CP - Refuse purchase if inventory state uncertain
Trade-off: Lost sale better than overselling

DISTRIBUTED LOCKS
Scenario: Only one process should run a job
Requirement: Cannot have two lock holders
Choice: CP - If unsure about lock state, assume not held
Trade-off: Job waits rather than running twice

LEADER ELECTION
Scenario: Exactly one leader needed
Requirement: Split-brain is catastrophic
Choice: CP - No leader during partition better than two leaders
Trade-off: System pauses vs corrupted state

✓ Choose AP (Availability) When

AP Use Cases

SOCIAL MEDIA FEEDS
Scenario: Displaying user's timeline
Requirement: Always show something, even if slightly outdated
Choice: AP - Show cached feed, sync when possible
Trade-off: Seeing post 30 seconds late is fine

SHOPPING CARTS
Scenario: User adding items to cart
Requirement: Never lose an add-to-cart action (at-least-once + idempotent merge = effectively-once at checkout)
Choice: AP - Accept writes, merge conflicts later
Trade-off: Cart might show duplicates briefly (resolved at checkout)

CACHING LAYERS
Scenario: CDN serving static content
Requirement: Low latency more important than freshness
Choice: AP - Serve from cache even if origin unreachable
Trade-off: Stale content for TTL duration

ANALYTICS / METRICS
Scenario: Collecting user behavior data
Requirement: Don't lose events, exact order less critical
Choice: AP - Buffer locally, sync when connected
Trade-off: Dashboards may be a few seconds behind

Interview Application

Common Interview Question

Q: “Explain the CAP theorem. How would you design a distributed database that needs to handle network partitions?”

Strong Answer:

“The CAP theorem, proven by Gilbert and Lynch in 2002, states that a distributed system cannot simultaneously provide all three guarantees during a network partition:

Consistency: Every read receives the most recent write

Availability: Every request receives a response

Partition Tolerance: System operates despite network failures

Key insight: Partition tolerance isn’t optional in distributed systems—networks will fail. So the real choice is between C and A when a partition occurs.

For my database design, I’d first ask: ‘What’s the cost of inconsistency vs unavailability?’

For a banking system: I’d choose CP. Wrong balances can cause overdrafts, legal issues. During a partition, the minority side becomes unavailable for writes. I’d implement this with consensus protocols like Raft, requiring majority quorum for commits.

For a social media feed: I’d choose AP. Users expect the app to work. Seeing a post 30 seconds late is acceptable. During partition, all nodes accept writes. After healing, I’d use CRDTs or last-write-wins to merge conflicts.

Important nuance: CAP is about the partition moment. Most of the time there’s no partition, and you can have all three. Also, different parts of a system can make different choices—payments might be CP while user profiles are AP.”

Follow-up: What is PACELC and how does it extend CAP?

“PACELC extends CAP to consider behavior when there’s no partition. It says:

PAC: During Partition, choose Availability or Consistency

ELC: Else (no partition), choose Latency or Consistency

Even without partitions, strong consistency requires coordination (more latency). So systems like DynamoDB offer eventual consistency for lower latency (EL) or strong consistency with higher latency (EC).

Example: Cassandra is PA/EL—it chooses availability during partitions and low latency normally. Spanner is PC/EC—it always chooses consistency, accepting latency for synchronous replication.”

Code Example

Demonstrating CP vs AP Behavior

"""
Simulating CAP theorem trade-offs in a distributed key-value store.
"""

from enum import Enum
from dataclasses import dataclass
from typing import Optional
import time

class CAPMode(Enum):
    CP = "consistency"  # Prioritize consistency over availability
    AP = "availability"  # Prioritize availability over consistency


@dataclass
class Node:
    name: str
    data: dict
    is_connected: bool = True  # Simulates network partition


class DistributedStore:
    # ... omitted: keep concept snippets short

    # Now partition node-a and node-b, connect node-c
    nodes[0].is_connected = False
    nodes[1].is_connected = False
    nodes[2].is_connected = True

    # Read from node-c (which missed the write)
    result = ap_store.read("balance", "node-c")
    print(f"Read from node-c (missed write): {result}")
    # AP returns stale/missing data - but it responds!

Common Misconceptions

CAP Misconceptions

MISCONCEPTION 1: "Pick any 2 of 3"
Reality: P is not optional. Networks fail. Real choice is C vs A during partition.

MISCONCEPTION 2: "CAP applies all the time"
Reality: Only during partitions. Normal operation can have all three.

MISCONCEPTION 3: "Systems are purely CP or AP"
Reality: Systems can be tunable, or use CP for some operations and AP for others.

MISCONCEPTION 4: "Eventual consistency = no consistency"
Reality: Eventual consistency is a form of consistency - data converges over time.

MISCONCEPTION 5: "CP means always consistent"
Reality: CP means sacrificing availability for consistency DURING PARTITION.

See It In Action:

CAP Theorem Explainer - Visual walkthrough of partition scenarios

Related Concepts:

Eventual Consistency - The AP consistency model
Quorum - Tuning the C-A trade-off
Consensus - Strong consistency protocols

Quick Self-Check

Can explain CAP theorem in 60 seconds?
Understand why partition tolerance is non-optional?
Can give examples of CP vs AP system choices?
Know what happens to each side of a partition in CP vs AP?
Understand that CAP only applies during partitions?
Can explain why the same system might choose CP for some operations and AP for others?