TL;DR
Replication is copying data across multiple nodes to survive failures, distribute load, and reduce latency. The core trade-off: synchronous replication guarantees consistency but adds latency; asynchronous replication is fast but risks data loss. Choose your strategy based on whether correctness or availability matters more.
Visual Overview
WHY REPLICATE? ┌─────────────────────────────────────────────────┐ │ │ │ Single Node: │ │ ┌──────┐ │ │ │ DB │ ← All eggs in one basket │ │ └──────┘ Node fails = data unavailable │ │ │ │ With Replication: │ │ ┌──────┐ ┌──────┐ ┌──────┐ │ │ │ DB 1 │ │ DB 2 │ │ DB 3 │ │ │ │ copy │ = │ copy │ = │ copy │ │ │ └──────┘ └──────┘ └──────┘ │ │ ↑ ↑ ↑ │ │ Primary Replica Replica │ │ │ │ Node fails? Others have the data! │ │ │ └─────────────────────────────────────────────────┘ REPLICATION STRATEGIES ┌─────────────────────────────────────────────────┐ │ │ │ Single-Leader (Primary-Replica) │ │ Writes │ │ ↓ │ │ [Leader] ───→ [Follower 1] │ │ │ └──→ [Follower 2] │ │ ↓ │ │ Reads distributed │ │ │ │ Multi-Leader │ │ [Leader 1] ←───→ [Leader 2] │ │ ↓ ↓ │ │ [Follower] [Follower] │ │ │ │ Leaderless │ │ [Node 1] ←→ [Node 2] ←→ [Node 3] │ │ All nodes can accept writes │ │ │ └─────────────────────────────────────────────────┘ THE REPLICATION TRADE-OFF ┌─────────────────────────────────────────────────┐ │ │ │ Synchronous: │ │ Client → Leader → Followers → ACK → Client │ │ └────── Wait for all ──────┘ │ │ ✓ No data loss │ │ ✗ High latency (slowest replica) │ │ │ │ Asynchronous: │ │ Client → Leader → ACK → Client │ │ └──────→ Followers (background) │ │ ✓ Low latency │ │ ✗ Data loss if leader crashes │ │ │ │ Semi-Synchronous: │ │ Wait for 1 replica (not all) │ │ Balance: Some durability, some performance │ │ │ └─────────────────────────────────────────────────┘
Why Replicate?
| Motivation | Single Node | With Replication |
|---|---|---|
| Availability | Node fails = down | Failover to replica |
| Read Scaling | One node handles all | Distribute reads |
| Latency | Single location | Replicas near users |
| Durability | Disk fails = data loss | Multiple copies survive |
Replication Strategies
1. Single-Leader (Primary-Replica)
One node (leader) accepts all writes. Followers replicate from leader.
Writes
↓
[Leader]
/ | \
↓ ↓ ↓
[F1] [F2] [F3] ← Followers
↑ ↑ ↑
Reads
Used by: PostgreSQL, MySQL, MongoDB, Redis
Pros: Simple, strong consistency possible Cons: Leader is bottleneck, failover complexity
2. Multi-Leader
Multiple nodes accept writes. Replicate changes between leaders.
[Leader DC1] ←────→ [Leader DC2]
| |
[Follower] [Follower]
Used by: CockroachDB (multi-region), collaborative editing
Pros: Local writes in each region Cons: Conflict resolution required
3. Leaderless
All nodes accept reads and writes. Quorum determines success.
Client
/ | \
↓ ↓ ↓
[N1] [N2] [N3]
Used by: Cassandra, DynamoDB, Riak
Pros: No failover needed, high availability Cons: Eventually consistent, conflict resolution
Synchronous vs Asynchronous
Synchronous Replication
Leader waits for replica acknowledgment before confirming write.
Client → Leader: WRITE x=1
Leader → Follower: Replicate x=1
Follower → Leader: ACK
Leader → Client: SUCCESS
Guarantee: If write succeeds, data is on multiple nodes Cost: Latency = leader + slowest replica
Asynchronous Replication
Leader confirms immediately, replicates in background.
Client → Leader: WRITE x=1
Leader → Client: SUCCESS
Leader → Follower: Replicate x=1 (background)
Guarantee: None—leader crash can lose committed writes Benefit: Low latency
Semi-Synchronous
Wait for at least one replica (not all).
W = 2 (write quorum)
Wait for leader + 1 follower
Remaining followers: async
Used by: MySQL semi-sync, PostgreSQL synchronous_commit
Replication Lag
The delay between write on leader and visibility on replica.
Time →
Leader: Write x=1 │ Write x=2 │ Write x=3
Replica: ──────────│ x=1 ──────│ x=2 ──────│ x=3
↑ ↑ ↑
Lag: 50ms Lag: 50ms Lag: 50ms
Lag Causes Problems
Read-your-writes violation:
User: POST update profile
User: GET profile (routed to stale replica)
User: Sees old profile! Confused.
Solutions:
- Read own writes from leader
- Track write timestamp, ensure replica is caught up
- Session affinity to same replica
Failover
When leader fails, promote a follower.
Before: [Leader*] → [F1] → [F2]
↓
After: [Leader*] [F1*] → [F2]
(dead) (new leader)
Challenges:
- Detection: Is leader dead or just slow?
- Election: Which follower becomes leader?
- Reconfiguration: Redirect clients to new leader
- Data loss: Async replication may lose recent writes
Split-Brain
Network partition creates two leaders:
[Old Leader] ──X── [New Leader + Followers]
↑ ↑
Client A Client B
Both accept writes → data divergence!
Prevention: Quorum-based leader election (Raft, Paxos)
Replication Mechanisms
| Mechanism | How It Works | Used By |
|---|---|---|
| Statement-based | Replicate SQL statements | MySQL (legacy) |
| WAL shipping | Send write-ahead log | PostgreSQL |
| Logical (row-based) | Send changed rows | MySQL binlog |
| Trigger-based | Application-level capture | Custom solutions |
Statement-Based Problems
-- Leader executes:
INSERT INTO logs VALUES (NOW(), 'event');
-- Replica executes same statement:
INSERT INTO logs VALUES (NOW(), 'event');
-- NOW() returns different time! Data diverges.
Issues: Non-deterministic functions, auto-increment IDs
WAL Shipping
Send the physical log bytes:
Block 427: [old_bytes] → [new_bytes]
Block 893: [old_bytes] → [new_bytes]
Pros: Exact byte-for-byte replication Cons: Tied to storage engine version
Logical (Row-Based)
{
"table": "users",
"op": "UPDATE",
"before": {"id": 1, "name": "old"},
"after": {"id": 1, "name": "new"}
}
Pros: Version-independent, readable, enables CDC Cons: More verbose than WAL
Production Systems
| System | Strategy | Sync Mode | Lag Typical |
|---|---|---|---|
| PostgreSQL | Single-leader | Configurable | sub-second async |
| MySQL | Single-leader | Semi-sync | sub-second |
| MongoDB | Replica set | Configurable | sub-second |
| Cassandra | Leaderless | Quorum | 10-100ms |
| Kafka | Single-leader per partition | ISR | under 100ms |
Related Content
See It In Action:
- Eventual Consistency - ~80 second animated visual explanation of async replication
Prerequisites:
- Distributed Systems Basics - Foundation concepts
Related Concepts:
- Leader-Follower Replication - Single-leader deep dive
- Consensus - Leader election algorithms
- Eventual Consistency - Async replication guarantees
- Write-Ahead Log - How WAL shipping works
Used In Systems:
- PostgreSQL, MySQL, MongoDB (database replication)
- Kafka (partition replication with ISR)
- Redis (leader-follower, Redis Cluster)
- etcd, Consul (Raft-based replication)
Next Recommended: Leader-Follower Replication - Deep dive into the most common pattern
Production signal