Replication | Concepts

TL;DR

Replication is copying data across multiple nodes to survive failures, distribute load, and reduce latency. The core trade-off: synchronous replication guarantees consistency but adds latency; asynchronous replication is fast but risks data loss. Choose your strategy based on whether correctness or availability matters more.

Visual Overview

Replication Overview

WHY REPLICATE?
┌─────────────────────────────────────────────────┐
│                                                 │
│  Single Node:                                   │
│  ┌──────┐                                       │
│  │  DB  │ ← All eggs in one basket              │
│  └──────┘   Node fails = data unavailable       │
│                                                 │
│  With Replication:                              │
│  ┌──────┐   ┌──────┐   ┌──────┐                 │
│  │ DB 1 │   │ DB 2 │   │ DB 3 │                 │
│  │ copy │ = │ copy │ = │ copy │                 │
│  └──────┘   └──────┘   └──────┘                 │
│     ↑          ↑          ↑                     │
│  Primary    Replica    Replica                  │
│                                                 │
│  Node fails? Others have the data!              │
│                                                 │
└─────────────────────────────────────────────────┘

REPLICATION STRATEGIES
┌─────────────────────────────────────────────────┐
│                                                 │
│  Single-Leader (Primary-Replica)                │
│       Writes                                    │
│         ↓                                       │
│     [Leader] ───→ [Follower 1]                  │
│         │    └──→ [Follower 2]                  │
│         ↓                                       │
│       Reads distributed                         │
│                                                 │
│  Multi-Leader                                   │
│     [Leader 1] ←───→ [Leader 2]                 │
│         ↓                ↓                      │
│     [Follower]      [Follower]                  │
│                                                 │
│  Leaderless                                     │
│     [Node 1] ←→ [Node 2] ←→ [Node 3]            │
│     All nodes can accept writes                 │
│                                                 │
└─────────────────────────────────────────────────┘

THE REPLICATION TRADE-OFF
┌─────────────────────────────────────────────────┐
│                                                 │
│  Synchronous:                                   │
│  Client → Leader → Followers → ACK → Client     │
│            └────── Wait for all ──────┘         │
│  ✓ No data loss                                 │
│  ✗ High latency (slowest replica)               │
│                                                 │
│  Asynchronous:                                  │
│  Client → Leader → ACK → Client                 │
│            └──────→ Followers (background)      │
│  ✓ Low latency                                  │
│  ✗ Data loss if leader crashes                  │
│                                                 │
│  Semi-Synchronous:                              │
│  Wait for 1 replica (not all)                   │
│  Balance: Some durability, some performance     │
│                                                 │
└─────────────────────────────────────────────────┘

Why Replicate?

Motivation	Single Node	With Replication
Availability	Node fails = down	Failover to replica
Read Scaling	One node handles all	Distribute reads
Latency	Single location	Replicas near users
Durability	Disk fails = data loss	Multiple copies survive

Replication Strategies

1. Single-Leader (Primary-Replica)

One node (leader) accepts all writes. Followers replicate from leader.

         Writes
            ↓
        [Leader]
        /   |   \
       ↓    ↓    ↓
    [F1]  [F2]  [F3]  ← Followers
      ↑     ↑     ↑
         Reads

Used by: PostgreSQL, MySQL, MongoDB, Redis

Pros: Simple, strong consistency possible Cons: Leader is bottleneck, failover complexity

2. Multi-Leader

Multiple nodes accept writes. Replicate changes between leaders.

    [Leader DC1] ←────→ [Leader DC2]
        |                   |
    [Follower]          [Follower]

Used by: CockroachDB (multi-region), collaborative editing

Pros: Local writes in each region Cons: Conflict resolution required

3. Leaderless

All nodes accept reads and writes. Quorum determines success.

    Client
   /   |   \
  ↓    ↓    ↓
[N1]  [N2]  [N3]

Used by: Cassandra, DynamoDB, Riak

Pros: No failover needed, high availability Cons: Eventually consistent, conflict resolution

Synchronous vs Asynchronous

Synchronous Replication

Leader waits for replica acknowledgment before confirming write.

Client → Leader: WRITE x=1
Leader → Follower: Replicate x=1
Follower → Leader: ACK
Leader → Client: SUCCESS

Guarantee: If write succeeds, data is on multiple nodes Cost: Latency = leader + slowest replica

Asynchronous Replication

Leader confirms immediately, replicates in background.

Client → Leader: WRITE x=1
Leader → Client: SUCCESS
Leader → Follower: Replicate x=1 (background)

Guarantee: None—leader crash can lose committed writes Benefit: Low latency

Semi-Synchronous

Wait for at least one replica (not all).

W = 2 (write quorum)
Wait for leader + 1 follower
Remaining followers: async

Used by: MySQL semi-sync, PostgreSQL synchronous_commit

Replication Lag

The delay between write on leader and visibility on replica.

Time →
Leader:   Write x=1 │ Write x=2 │ Write x=3
Replica:  ──────────│ x=1 ──────│ x=2 ──────│ x=3
                    ↑           ↑           ↑
               Lag: 50ms    Lag: 50ms    Lag: 50ms

Lag Causes Problems

Read-your-writes violation:

User: POST update profile
User: GET profile (routed to stale replica)
User: Sees old profile! Confused.

Solutions:

Read own writes from leader
Track write timestamp, ensure replica is caught up
Session affinity to same replica

Failover

When leader fails, promote a follower.

Before:  [Leader*] → [F1] → [F2]
                       ↓
After:   [Leader*]   [F1*] → [F2]
         (dead)    (new leader)

Challenges:

Detection: Is leader dead or just slow?
Election: Which follower becomes leader?
Reconfiguration: Redirect clients to new leader
Data loss: Async replication may lose recent writes

Split-Brain

Network partition creates two leaders:

[Old Leader] ──X── [New Leader + Followers]
     ↑                    ↑
  Client A             Client B

Both accept writes → data divergence!

Prevention: Quorum-based leader election (Raft, Paxos)

Replication Mechanisms

Mechanism	How It Works	Used By
Statement-based	Replicate SQL statements	MySQL (legacy)
WAL shipping	Send write-ahead log	PostgreSQL
Logical (row-based)	Send changed rows	MySQL binlog
Trigger-based	Application-level capture	Custom solutions

Statement-Based Problems

-- Leader executes:
INSERT INTO logs VALUES (NOW(), 'event');

-- Replica executes same statement:
INSERT INTO logs VALUES (NOW(), 'event');
-- NOW() returns different time! Data diverges.

Issues: Non-deterministic functions, auto-increment IDs

WAL Shipping

Send the physical log bytes:

Block 427: [old_bytes] → [new_bytes]
Block 893: [old_bytes] → [new_bytes]

Pros: Exact byte-for-byte replication Cons: Tied to storage engine version

Logical (Row-Based)

{
  "table": "users",
  "op": "UPDATE",
  "before": {"id": 1, "name": "old"},
  "after": {"id": 1, "name": "new"}
}

Pros: Version-independent, readable, enables CDC Cons: More verbose than WAL

Production Systems

System	Strategy	Sync Mode	Lag Typical
PostgreSQL	Single-leader	Configurable	sub-second async
MySQL	Single-leader	Semi-sync	sub-second
MongoDB	Replica set	Configurable	sub-second
Cassandra	Leaderless	Quorum	10-100ms
Kafka	Single-leader per partition	ISR	under 100ms

See It In Action:

Eventual Consistency - ~80 second animated visual explanation of async replication

Prerequisites:

Distributed Systems Basics - Foundation concepts

Related Concepts:

Leader-Follower Replication - Single-leader deep dive
Consensus - Leader election algorithms
Eventual Consistency - Async replication guarantees
Write-Ahead Log - How WAL shipping works

Used In Systems:

PostgreSQL, MySQL, MongoDB (database replication)
Kafka (partition replication with ISR)
Redis (leader-follower, Redis Cluster)
etcd, Consul (Raft-based replication)

Next Recommended: Leader-Follower Replication - Deep dive into the most common pattern