TL;DR
Producer acknowledgments (acks) control when a Kafka producer treats a message as successfully written. The options are acks=0 (no confirmation), acks=1 (the leader confirms), and acks=all (every in-sync replica confirms), each trading latency for durability. The setting is central to balancing performance against data safety in message brokers.
Visual Overview
ACKS = 0 (Fire and Forget)
┌────────────────────────────────────────────────┐
│ Producer → Message → Kafka Leader              │
│           ↓ (don't wait)                       │
│           Immediate return ✓                   │
│ Latency: <1ms                                  │
│                                                │
│ Risk: Message may be lost if:                  │
│ - Network failure before reaching leader       │
│ - Leader crashes before writing to disk        │
│ - Leader crashes before replication            │
│                                                │
│ Use case: Metrics, logs (lossy OK)             │
└────────────────────────────────────────────────┘

ACKS = 1 (Leader Acknowledgment)
┌────────────────────────────────────────────────┐
│ Producer → Message → Kafka Leader              │
│           ↓                                    │
│           Write to log ✓                       │
│           ↓                                    │
│           Send ACK → Producer                  │
│ Latency: 5-10ms                                │
│                                                │
│ Meanwhile (async):                             │
│   Leader → Replicate → Follower 1              │
│   Leader → Replicate → Follower 2              │
│                                                │
│ Risk: Message lost if leader crashes           │
│       before replication completes             │
│                                                │
│ Use case: Most production workloads            │
└────────────────────────────────────────────────┘

ACKS = ALL (Full ISR)
┌────────────────────────────────────────────────┐
│ Producer → Message → Kafka Leader              │
│           ↓                                    │
│           Write to log                         │
│           ↓                                    │
│           Replicate to all ISR replicas        │
│           ↓                                    │
│           Follower 1: Written ✓                │
│           Follower 2: Written ✓                │
│           ↓                                    │
│           Send ACK → Producer                  │
│ Latency: 10-50ms (network + replication)       │
│                                                │
│ Risk: Lost only if all ISR replicas fail       │
│       simultaneously                           │
│                                                │
│ Use case: Financial transactions, orders       │
└────────────────────────────────────────────────┘

TIMELINE COMPARISON:
┌────────────────────────────────────────────────┐
│ acks=0:                                        │
│   T0: Send                                     │
│   T1: Return (1ms) ✓                           │
│                                                │
│ acks=1:                                        │
│   T0: Send                                     │
│   T5: Leader writes                            │
│   T10: Return (10ms) ✓                         │
│                                                │
│ acks=all:                                      │
│   T0: Send                                     │
│   T5: Leader writes                            │
│   T15: Follower 1 writes                       │
│   T20: Follower 2 writes                       │
│   T25: Return (25ms) ✓                         │
└────────────────────────────────────────────────┘
Core Explanation
What are Producer Acknowledgments?
Producer acknowledgments (acks) control when a Kafka producer considers a write operation successful. This determines:
- When producer receives confirmation that message is safe
- How many replicas must persist the message
- Trade-off between latency and durability
Three Levels:
acks=0: No acknowledgment (fire-and-forget)
acks=1: Leader acknowledgment (Java client default before Kafka 3.0)
acks=all: Full ISR acknowledgment (safest; Java client default since Kafka 3.0, and the KafkaJS default)
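Clients spell these levels differently: KafkaJS and the wire protocol use the numbers 0, 1, and -1, while configuration files usually say "all". The sketch below shows one way to accept either spelling in application config. The helper name `normalizeAcks` is hypothetical and not part of any Kafka client.

```javascript
// Hypothetical helper: map a human-readable acks setting to the numeric
// value used by KafkaJS (-1 stands for "all").
function normalizeAcks(value) {
  const v = String(value).trim().toLowerCase();
  if (v === "all" || v === "-1") return -1;
  if (v === "0") return 0;
  if (v === "1") return 1;
  throw new Error(`Unsupported acks value: ${value}`);
}

console.log(normalizeAcks("all")); // -1
console.log(normalizeAcks(1)); // 1
```

Rejecting unknown values early avoids silently falling back to a weaker durability level than the one intended.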
acks=0: No Acknowledgment
Behavior:
Producer sends message, immediately considers it sent
Leader receives message (maybe)
No confirmation sent back

Result:
• Highest throughput (no waiting)
• Lowest latency (<1ms)
• Zero durability guarantee
When Message Can Be Lost:
1. Network failure before reaching broker
   Producer → [Network drops packet] → Leader (never arrives)
2. Leader crash before writing to disk
   Producer → Leader (in memory) → [Crash] ✗
3. Leader crash before replication
   Producer → Leader (written) → [Crash before replicating] ✗

Probability of loss: Relatively high (1-5%)
Configuration:
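// KafkaJS takes acks and compression per send() call, not in kafka.producer()
const producer = kafka.producer();
await producer.send({
  topic: "metrics", // example topic
  acks: 0, // No acknowledgment
  compression: CompressionTypes.GZIP, // from require("kafkajs"); often paired with acks=0
  messages: [{ value: "cpu=80" }],
});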
Use Cases:
✓ Log aggregation (OK to lose some logs)
✓ Metrics collection (OK to lose some data points)
✓ IoT sensor data (high volume, redundancy)
✓ Clickstream tracking (lossy acceptable)
✗ Financial transactions
✗ User-facing data (messages, posts)
✗ Critical business events
acks=1: Leader Acknowledgment
Behavior:
Producer sends message
Leader appends it to its local log (page cache; Kafka does not fsync every write by default)
Leader sends ACK to producer
Producer considers message sent ✓

Meanwhile (asynchronous):
Leader replicates to followers (background)

Result:
• Good throughput
• Moderate latency (5-10ms)
• Durability: Survives producer/network failure
• Risk: Lost if leader fails before replication
When Message Can Be Lost:
Scenario: Leader fails before replication

T0: Producer → Leader (message written to leader)
T1: Leader → ACK → Producer ✓
T2: Producer moves on
T3: Leader crashes ⚡ (before replicating)
T4: Follower promoted to new leader
T5: Message is GONE ✗ (was only on failed leader)

Probability: Low (1-2% during failures)
Window of vulnerability: ~500ms (replication lag)
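The timeline above can be replayed with a toy in-memory model. This is only a sketch: each replica is a plain array, `simulateLeaderCrash` is a made-up name, and no real broker behaves this simply. It contrasts acks=1 with acks=all for the same crash.

```javascript
// Toy model: each replica's log is just an array of messages.
function simulateLeaderCrash(acks) {
  const leader = [];
  const follower = [];

  // T0: the producer sends one message and the leader appends it.
  leader.push("order-456");

  // acks=all (-1): the leader waits for the follower before acknowledging.
  // acks=1: the leader acknowledges first; replication would come later.
  if (acks === -1) follower.push(...leader);
  const acknowledged = true; // T1: ACK reaches the producer

  // T3-T4: the leader crashes before any further replication,
  // and the follower is promoted to leader.
  const newLeaderLog = follower;
  return { acknowledged, survived: newLeaderLog.includes("order-456") };
}

console.log(simulateLeaderCrash(1)); // { acknowledged: true, survived: false }
console.log(simulateLeaderCrash(-1)); // { acknowledged: true, survived: true }
```

In both runs the producer received an ACK; only with acks=all does the acknowledged message exist on the replica that takes over.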
Configuration:
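const producer = kafka.producer({
  retry: { retries: 3 }, // Retry on failure
});
await producer.send({
  topic: "user-activity", // example topic
  acks: 1, // Leader acknowledgment (KafkaJS defaults to -1, i.e. all)
  timeout: 30000, // 30s timeout
  messages: [{ value: "click" }],
});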
Use Cases:
✓ Most production workloads (default choice)
✓ High-throughput messaging
✓ Real-time analytics
✓ Event streaming

Balance between performance and safety
acks=all: Full ISR Acknowledgment
Behavior:
Producer sends message
Leader writes to local log
Leader waits for ALL in-sync replicas (ISR) to acknowledge
All ISR replicas write to their logs
Leader sends ACK to producer
Producer considers message sent ✓

Result:
• Lower throughput
• Higher latency (10-50ms)
• Maximum durability
• Message replicated before acknowledgment
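Because followers fetch from the leader independently, the extra wait under acks=all is roughly set by the slowest in-sync follower rather than by the sum of all followers. The function below is a deliberately simplified model of the broker-side wait only: it ignores network round trips and batching, and both the name `ackLatencyMs` and the sample numbers are assumptions for illustration.

```javascript
// Simplified model of how long the broker holds a produce request.
function ackLatencyMs({ acks, leaderWriteMs, followerReplicationMs }) {
  if (acks === 0) return 0; // producer does not wait for the broker at all
  if (acks === 1) return leaderWriteMs; // only the leader's append counts
  // acks=all (-1): wait for the slowest in-sync follower
  return leaderWriteMs + Math.max(0, ...followerReplicationMs);
}

const timings = { leaderWriteMs: 5, followerReplicationMs: [10, 15] };
console.log(ackLatencyMs({ acks: 0, ...timings })); // 0
console.log(ackLatencyMs({ acks: 1, ...timings })); // 5
console.log(ackLatencyMs({ acks: -1, ...timings })); // 20
```

One practical consequence is that a single slow follower raises acks=all latency for the whole partition until it either catches up or drops out of the ISR.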
In-Sync Replicas (ISR):
ISR = Set of replicas that are "caught up" with leader

Example:
• Leader: Broker 1
• Followers: Broker 2 (in sync), Broker 3 (lagging)
• ISR = {Broker 1, Broker 2}

acks=all waits for: Broker 1 + Broker 2

If Broker 2 falls behind (network issue):
ISR = {Broker 1} (just leader)
acks=all waits for: Broker 1 only (no followers!)

This is why min.insync.replicas is critical!
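Brokers decide ISR membership by time: a follower that has not caught up with the leader within the window set by replica.lag.time.max.ms is dropped from the ISR. The sketch below mimics that rule with plain objects. The function `computeIsr` and the field `lastCaughtUpMs` are hypothetical names, and real brokers track more state than this.

```javascript
// Simplified ISR rule: a follower stays in the ISR if it was fully caught up
// with the leader at some point within the last maxLagMs milliseconds.
function computeIsr(leaderId, followers, nowMs, maxLagMs) {
  const inSync = followers
    .filter((f) => nowMs - f.lastCaughtUpMs <= maxLagMs)
    .map((f) => f.id);
  return [leaderId, ...inSync]; // the leader is always part of its own ISR
}

const followers = [
  { id: 2, lastCaughtUpMs: 59000 }, // caught up 1s ago
  { id: 3, lastCaughtUpMs: 10000 }, // caught up 50s ago (lagging)
];

console.log(computeIsr(1, followers, 60000, 30000)); // [ 1, 2 ]
```

With a 30-second window, Broker 3 falls out of the ISR, which matches the example above where acks=all waits only for Broker 1 and Broker 2.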
min.insync.replicas:
Configuration: Minimum ISR size required for writes

min.insync.replicas=2 (recommended for acks=all)
• Requires at least 2 replicas in ISR
• If ISR shrinks to 1, producer gets error
• Prevents data loss when only leader is alive

Note: acks=all always waits for every current ISR member. min.insync.replicas only sets how small the ISR may become before writes are rejected.

Example with 3 replicas:
┌────────────────────────────────────────────────┐
│ Normal: ISR = {Leader, Follower1, Follower2}   │
│   acks=all waits for all three replicas ✓      │
│                                                │
│ Follower1 fails: ISR = {Leader, Follower2}     │
│   acks=all waits for Leader + Follower2 ✓      │
│                                                │
│ Follower2 also fails: ISR = {Leader}           │
│   acks=all REJECTS writes ✗                    │
│   (ISR size 1 < min.insync.replicas 2)         │
└────────────────────────────────────────────────┘

Protection: An acknowledged message survives a leader failure, because it is on at least 2 replicas
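The accept-or-reject rule can be written down as a small function. This is a sketch of the rule as described here, not broker code: `checkWrite` is a hypothetical name, and the error string simply reuses the NOT_ENOUGH_REPLICAS error type that appears later in this article.

```javascript
// Decide whether a write is accepted and how many replicas must hold it
// before the leader acknowledges.
function checkWrite({ acks, isrSize, minInsyncReplicas }) {
  // min.insync.replicas is only enforced for acks=all (-1).
  if (acks === -1 && isrSize < minInsyncReplicas) {
    return { accepted: false, error: "NOT_ENOUGH_REPLICAS" };
  }
  // acks=all waits for the whole current ISR; acks=1 for the leader; acks=0 for nobody.
  return { accepted: true, replicasToWaitFor: acks === -1 ? isrSize : acks };
}

console.log(checkWrite({ acks: -1, isrSize: 3, minInsyncReplicas: 2 }));
// { accepted: true, replicasToWaitFor: 3 }
console.log(checkWrite({ acks: -1, isrSize: 1, minInsyncReplicas: 2 }));
// { accepted: false, error: 'NOT_ENOUGH_REPLICAS' }
console.log(checkWrite({ acks: 1, isrSize: 1, minInsyncReplicas: 2 }));
// { accepted: true, replicasToWaitFor: 1 }
```

The last call shows why the setting offers no protection on its own: a producer using acks=1 is still accepted when only the leader is in sync.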
Configuration:
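const producer = kafka.producer({
  retry: { retries: 5 },
});
await producer.send({
  topic: "orders", // example topic
  acks: -1, // -1 means "all" (acks=all); also the KafkaJS default
  timeout: 30000,
  messages: [{ value: "order-456" }],
});
// Topic configuration (set on the topic or broker, not in the producer)
// min.insync.replicas=2   At least 2 in-sync replicas required for a write
// replication.factor=3    Total of 3 replicas (chosen at topic creation)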
Use Cases:
✓ Financial transactions
✓ E-commerce orders
✓ User-generated content (posts, messages)
✓ Critical business events
✓ Regulatory/compliance data

Anywhere data loss is unacceptable
Real Systems Using Producer Acks
| System | Default acks | Typical Config | Rationale |
|---|---|---|---|
| Kafka Streams | acks=all | acks=all, min.insync.replicas=2 | State stores require durability |
| Netflix (Keystone) | acks=1 | acks=1, replication=3 | High throughput, tolerate rare loss |
| | acks=all | acks=all, min.insync.replicas=2 | Business-critical events |
| Uber | acks=1 | acks=1 (logs), acks=all (trips) | Mixed based on data criticality |
| Confluent Cloud | acks=all | acks=all, min.insync.replicas=2 | Default for safety |
Case Study: Kafka at LinkedIn
LinkedIn's Kafka usage (origin of Kafka):
• 100+ billion messages/day
• 1000s of topics
• Multi-datacenter deployment

Acknowledgment Strategy:
┌────────────────────────────────────────────────┐
│ Critical Data (jobs, connections):             │
│   • acks=all                                   │
│   • min.insync.replicas=2                      │
│   • replication.factor=3                       │
│   → Latency: 20-30ms                           │
│   → Zero data loss                             │
│                                                │
│ Metrics/Logs (high volume):                    │
│   • acks=1                                     │
│   • replication.factor=2                       │
│   → Latency: 5-10ms                            │
│   → Acceptable loss rate: <0.1%                │
│                                                │
│ Analytics Events (ultra-high volume):          │
│   • acks=0                                     │
│   • compression=gzip                           │
│   → Latency: 1-2ms                             │
│   → Loss rate: 1-2% (acceptable)               │
└────────────────────────────────────────────────┘

Lesson: Different acks for different data criticality
When to Use Each Ack Level
acks=0: Fire and Forget
Use When:
✓ High throughput required (100k+ msg/sec)
✓ Data loss is acceptable (logs, metrics)
✓ Data has natural redundancy (sensor arrays)
✓ Ultra-low latency required (<1ms)

Example: IoT sensor network
• 1000 sensors sending data every second
• If 1% of readings lost, still have 99%
• Aggregate statistics still accurate
acks=1: Leader Only
Use When:
✓ Good balance of performance and safety
✓ Occasional loss acceptable during failures
✓ High throughput with moderate durability
✓ Default choice for most workloads

Example: User activity tracking
• Click events, page views, etc.
• Occasional loss during broker failure OK
• Still maintain 99%+ delivery
acks=all: Full Replication
Use When:
✓ Zero data loss required
✓ Regulatory/compliance requirements
✓ Financial or critical business data
✓ Can tolerate higher latency (10-50ms)

Example: E-commerce order placement
• User places order (creates Kafka event)
• Order must not be lost
• OK to wait 20-30ms for full replication
• Worth latency cost for safety
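The three checklists reduce to two yes/no questions about the data. A minimal sketch of that decision follows; `chooseAcks` and its parameter names are hypothetical, and real systems would usually weigh throughput and latency budgets as well.

```javascript
// Hypothetical helper: pick an acks level from two durability requirements.
function chooseAcks({ lossAcceptable, lossOnFailoverAcceptable }) {
  if (lossAcceptable) return 0; // logs, metrics, redundant sensor data
  if (lossOnFailoverAcceptable) return 1; // activity tracking, analytics
  return -1; // orders, payments, compliance data (acks=all)
}

// IoT sensor readings: losing some is fine.
console.log(chooseAcks({ lossAcceptable: true, lossOnFailoverAcceptable: true })); // 0
// Click events: rare loss during a broker failure is tolerable.
console.log(chooseAcks({ lossAcceptable: false, lossOnFailoverAcceptable: true })); // 1
// Order placement: no loss is acceptable.
console.log(chooseAcks({ lossAcceptable: false, lossOnFailoverAcceptable: false })); // -1
```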
Hybrid Approach
Different Topics, Different Acks:
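// KafkaJS sets acks per send() call, so one producer can serve all three tiers.
// orderMessages, analyticsMessages, and metricMessages are placeholder arrays
// of { value } objects prepared elsewhere.
const producer = kafka.producer();
await producer.connect();

// Critical orders: acks=all
await producer.send({ topic: "orders", acks: -1, timeout: 30000, messages: orderMessages });

// Analytics events: acks=1
await producer.send({ topic: "analytics", acks: 1, timeout: 10000, messages: analyticsMessages });

// Metrics: acks=0
await producer.send({
  topic: "metrics",
  acks: 0,
  compression: CompressionTypes.GZIP, // from require("kafkajs")
  messages: metricMessages,
});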
Interview Application
Common Interview Question
Q: “How would you ensure zero data loss in a Kafka-based order processing system?”
Strong Answer:
“To ensure zero data loss for orders, I’d configure producers with acks=all and proper ISR settings:
Producer Configuration:
acks=all (or acks=-1)
retries=MAX_INT (effectively infinite retries)
max.in.flight.requests.per.connection=1 (for ordering)

Topic Configuration:
min.insync.replicas=2
replication.factor=3

How This Prevents Loss:
- acks=all: Producer waits until every in-sync replica has the message before considering the write successful
- min.insync.replicas=2: Rejects writes unless at least 2 replicas (leader + 1 follower) are in sync
- replication.factor=3: Total of 3 copies across brokers
- Result: Message on ≥2 replicas before ACK
Failure Scenarios:
- Network failure: Producer retries until successful
- Leader failure: Message already on follower (promoted to new leader)
- Follower failure: Still have leader + other follower (meets min ISR)
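- Leader + Follower fail: The third replica has the message only if it was in the ISR at write time; otherwise, with unclean leader election disabled, the partition stays offline until an in-sync replica returns
Only lose data if: Every replica that was in sync at write time is lost (extremely rare)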
Trade-offs:
- Latency: 20-30ms vs 5-10ms for acks=1
- Throughput: Lower (wait for replication)
- Availability: May reject writes if ISR < 2
Worth It: For orders where data loss = lost revenue + angry customers
Monitoring: Alert if ISR falls below min.insync.replicas”
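The monitoring point can be sketched as a pure function over topic metadata. The input shape below is an assumption modeled on one entry of the `topics` array that KafkaJS's `admin.fetchTopicMetadata()` resolves with; `findUnderMinIsr` is a hypothetical name, and production setups more commonly alert on broker metrics instead.

```javascript
// Return the partitions whose ISR is smaller than min.insync.replicas.
// Producers using acks=all are being rejected on these partitions.
function findUnderMinIsr(topicMetadata, minInsyncReplicas) {
  return topicMetadata.partitions
    .filter((p) => p.isr.length < minInsyncReplicas)
    .map((p) => ({
      topic: topicMetadata.name,
      partition: p.partitionId,
      isrSize: p.isr.length,
    }));
}

// Sample metadata (made-up values) for a topic with replication.factor=3.
const sample = {
  name: "orders",
  partitions: [
    { partitionId: 0, leader: 1, replicas: [1, 2, 3], isr: [1, 2, 3] },
    { partitionId: 1, leader: 2, replicas: [2, 3, 1], isr: [2] },
  ],
};

console.log(findUnderMinIsr(sample, 2));
// [ { topic: 'orders', partition: 1, isrSize: 1 } ]
```

A stricter variant could also warn when the ISR size equals min.insync.replicas, since one more failure would then start rejecting writes.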
Code Example
Producer with Different Ack Levels
const { Kafka, CompressionTypes } = require("kafkajs");

const kafka = new Kafka({
  clientId: "my-producer",
  brokers: ["kafka1:9092", "kafka2:9092", "kafka3:9092"],
});

// Note: KafkaJS takes acks, timeout, and compression as options of send(),
// not of kafka.producer().

// Configuration 1: acks=0 (Fire and Forget)
async function sendMetrics() {
  const producer = kafka.producer();
  await producer.connect();
  const start = Date.now();
  await producer.send({
    topic: "metrics",
    acks: 0, // No acknowledgment
    compression: CompressionTypes.GZIP,
    messages: [{ value: JSON.stringify({ cpu: 80, mem: 60 }) }],
  });
  const latency = Date.now() - start;
  console.log(`Metrics sent (acks=0): ${latency}ms`);
  // Typical output: 1-2ms
  // Risk: Message may be lost
  await producer.disconnect();
}

// Configuration 2: acks=1 (Leader Acknowledgment)
async function sendUserActivity() {
  const producer = kafka.producer({
    retry: {
      retries: 3,
      initialRetryTime: 100,
    },
  });
  await producer.connect();
  const start = Date.now();
  await producer.send({
    topic: "user-activity",
    acks: 1, // Leader acknowledgment
    timeout: 30000,
    messages: [
      {
        key: "user-123",
        value: JSON.stringify({ action: "click", page: "/products" }),
      },
    ],
  });
  const latency = Date.now() - start;
  console.log(`Activity sent (acks=1): ${latency}ms`);
  // Typical output: 5-10ms
  // Risk: Lost if leader fails before replication
  await producer.disconnect();
}

// Configuration 3: acks=all (Full ISR Acknowledgment)
async function sendOrder() {
  const producer = kafka.producer({
    retry: {
      retries: Number.MAX_SAFE_INTEGER, // Retry (effectively) forever
      initialRetryTime: 100,
      maxRetryTime: 30000,
    },
    idempotent: true, // Broker de-duplicates retried sends (experimental in KafkaJS; needs acks=-1)
    maxInFlightRequests: 1, // Preserve ordering
  });
  await producer.connect();
  const start = Date.now();
  try {
    await producer.send({
      topic: "orders", // Topic config: min.insync.replicas=2, replication.factor=3
      acks: -1, // acks=all (wait for full ISR)
      timeout: 30000,
      messages: [
        {
          key: "order-456",
          value: JSON.stringify({
            orderId: "456",
            userId: "123",
            total: 99.99,
            items: [{ id: "product-1", qty: 2 }],
          }),
        },
      ],
    });
    const latency = Date.now() - start;
    console.log(`Order sent (acks=all): ${latency}ms`);
    // Typical output: 15-30ms
    // Guarantee: Message on ≥2 replicas before the ACK
  } catch (error) {
    if (error.type === "NOT_ENOUGH_REPLICAS") {
      // ISR < min.insync.replicas (degraded cluster)
      console.error("Cluster degraded: Not enough in-sync replicas");
      // Alert operations team
      // Queue order for retry
    }
    throw error;
  } finally {
    await producer.disconnect();
  }
}

// Demonstrating latency differences
async function benchmark() {
  console.log("Benchmarking producer acknowledgments...\n");
  await sendMetrics(); // ~1-2ms
  await sendUserActivity(); // ~5-10ms
  await sendOrder(); // ~15-30ms
  // Trade-off: Latency vs Durability
  // acks=0: Fastest, least safe
  // acks=1: Balanced
  // acks=all: Slowest, safest (KafkaJS default)
}

benchmark().catch(console.error);
Error Handling with acks=all
async function sendCriticalData(data) {
  const producer = kafka.producer({
    retry: {
      retries: 5,
      initialRetryTime: 300,
    },
  });
  await producer.connect();
  try {
    await producer.send({
      topic: "critical-data",
      acks: -1, // acks=all (a send() option in KafkaJS)
      messages: [{ value: JSON.stringify(data) }],
    });
    console.log("Data persisted successfully (acks=all)");
  } catch (error) {
    // Assumption: once retries are exhausted, KafkaJS may wrap the broker
    // error; the wrapper's property name (cause or originalError) depends on
    // the KafkaJS version, so verify it against the version in use.
    const type = (error.cause || error.originalError || error).type;
    // Error types to handle:
    if (type === "NOT_ENOUGH_REPLICAS") {
      // ISR < min.insync.replicas
      console.error("Not enough in-sync replicas");
      // Action: Alert operations, queue for retry
    } else if (type === "NOT_ENOUGH_REPLICAS_AFTER_APPEND") {
      // Message written to leader, but ISR shrank before replication
      console.error("Replication failed after append");
      // Action: Retry (may be duplicate, use idempotent producer)
    } else if (type === "REQUEST_TIMED_OUT") {
      // Replication took longer than timeout
      console.error("Acknowledgment timeout");
      // Action: Retry (may be duplicate)
    }
    // Store in dead letter queue for manual review
    // (storeInDLQ is an application-defined helper, not part of KafkaJS)
    await storeInDLQ(data, error);
    throw error;
  } finally {
    await producer.disconnect();
  }
}
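The three branches above differ in one important way: whether a retry can produce a duplicate. The sketch below captures that as a lookup. `classifySendError` is a hypothetical helper, and it assumes the error object carries the broker error name in a `type` property, as the handler above does.

```javascript
// Hypothetical helper: decide how to react to a failed acks=all send.
function classifySendError(error) {
  const type = error && error.type;
  if (type === "NOT_ENOUGH_REPLICAS") {
    // Rejected before the leader appended anything: a retry cannot duplicate.
    return { retry: true, mayDuplicate: false };
  }
  if (type === "NOT_ENOUGH_REPLICAS_AFTER_APPEND" || type === "REQUEST_TIMED_OUT") {
    // The leader may already hold the message: a retry can duplicate it
    // unless the producer is idempotent.
    return { retry: true, mayDuplicate: true };
  }
  // Unknown errors go to manual review rather than blind retries.
  return { retry: false, mayDuplicate: false };
}

console.log(classifySendError({ type: "NOT_ENOUGH_REPLICAS" }));
// { retry: true, mayDuplicate: false }
console.log(classifySendError({ type: "REQUEST_TIMED_OUT" }));
// { retry: true, mayDuplicate: true }
console.log(classifySendError(new Error("boom")));
// { retry: false, mayDuplicate: false }
```

Keeping this decision in one place makes it easier to route only the non-retriable cases to the dead letter queue.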
Related Content
Prerequisites:
- Leader-Follower Replication - Understanding ISR
- Topic Partitioning - Kafka architecture
Related Concepts:
- Quorum - ISR is a form of quorum
- Idempotence - Idempotent producer with acks=all
- Exactly-Once Semantics - Combines idempotence, transactions, and acks=all
Used In Systems:
- Kafka (producer acknowledgments)
- Pulsar (similar ack levels)
- RabbitMQ (publisher confirms)
Explained In Detail:
- Kafka Deep Dive - Producer mechanics and acknowledgments
See It In Action
- Producer Acknowledgments Explainer - ~75 second animated visual showing acks=0, acks=1, and acks=all
Quick Self-Check
- Can explain acks=0/1/all in 60 seconds?
- Understand latency vs durability trade-offs?
- Know when messages can be lost for each ack level?
- Can explain min.insync.replicas and ISR?
- Understand acks=all + min.insync.replicas=2 pattern?
- Know which ack level to use for different use cases?
Production signal