Producer Acknowledgments

TL;DR

Producer acknowledgments (acks) control when Kafka considers a message successfully written. The options are acks=0 (no confirmation), acks=1 (leader confirms), and acks=all (all in-sync replicas confirm), trading latency for durability guarantees. Choosing the right level is critical for balancing performance against data safety in a message broker.

Visual Overview

Producer Acknowledgments Overview

ACKS = 0 (Fire and Forget)
┌────────────────────────────────────────────────┐
│  Producer → Message → Kafka Leader             │
│      ↓                      (don't wait)       │
│  Immediate return ✓                            │
│  Latency: <1ms                                 │
│                                                │
│  Risk: Message may be lost if:                 │
│  - Network failure before reaching leader      │
│  - Leader crashes before writing to disk       │
│  - Leader crashes before replication           │
│                                                │
│  Use case: Metrics, logs (lossy OK)            │
└────────────────────────────────────────────────┘

ACKS = 1 (Leader Acknowledgment)
┌────────────────────────────────────────────────┐
│  Producer → Message → Kafka Leader             │
│            ↓                                   │
│  Write to log ✓                                │
│            ↓                                   │
│  Send ACK → Producer                           │
│  Latency: 5-10ms                               │
│                                                │
│  Meanwhile (async):                            │
│  Leader → Replicate → Follower 1               │
│  Leader → Replicate → Follower 2               │
│                                                │
│  Risk: Message lost if leader crashes          │
│  before replication completes                  │
│                                                │
│  Use case: General workloads (pre-3.0 default) │
└────────────────────────────────────────────────┘

ACKS = ALL (Full Quorum)
┌────────────────────────────────────────────────┐
│  Producer → Message → Kafka Leader             │
│            ↓                                   │
│  Write to log                                  │
│            ↓                                   │
│  Replicate to all ISR replicas                 │
│            ↓                                   │
│  Follower 1: Written ✓                         │
│  Follower 2: Written ✓                         │
│            ↓                                   │
│  Send ACK → Producer                           │
│  Latency: 10-50ms (network + replication)      │
│                                                │
│  Risk: Very low once acknowledged              │
│  (unless all ISR replicas fail simultaneously) │
│                                                │
│  Use case: Financial transactions, orders      │
└────────────────────────────────────────────────┘

TIMELINE COMPARISON:
┌────────────────────────────────────────────────┐
│  acks=0:                                       │
│    T0: Send                                    │
│    T1: Return (1ms) ✓                          │
│                                                │
│  acks=1:                                       │
│    T0: Send                                    │
│    T5: Leader writes                           │
│    T10: Return (10ms) ✓                        │
│                                                │
│  acks=all:                                     │
│    T0: Send                                    │
│    T5: Leader writes                           │
│    T15: Follower 1 writes                      │
│    T20: Follower 2 writes                      │
│    T25: Return (25ms) ✓                        │
└────────────────────────────────────────────────┘
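
The timeline above can be read as a small latency model. The sketch below is illustrative only: `estimateAckLatencyMs` and its parameter names are hypothetical, and the numbers are the ones from the timeline, not measurements.

```javascript
// Toy latency model for the three ack levels (all names are illustrative).
// sendMs:          time for the request to leave the producer
// leaderWriteMs:   time (from T0) at which the leader has appended the message
// followerWriteMs: times (from T0) at which each in-sync follower has the message
// responseMs:      time for the acknowledgment to travel back to the producer
function estimateAckLatencyMs(acks, { sendMs, leaderWriteMs, followerWriteMs, responseMs }) {
  if (acks === 0) {
    return sendMs; // returns once the request is handed to the network
  }
  if (acks === 1) {
    return leaderWriteMs + responseMs; // waits for the leader only
  }
  // acks=all (-1): the slowest in-sync replica decides when the leader can respond
  const slowest = Math.max(leaderWriteMs, ...followerWriteMs);
  return slowest + responseMs;
}

const timeline = { sendMs: 1, leaderWriteMs: 5, followerWriteMs: [15, 20], responseMs: 5 };
console.log(estimateAckLatencyMs(0, timeline));  // 1
console.log(estimateAckLatencyMs(1, timeline));  // 10
console.log(estimateAckLatencyMs(-1, timeline)); // 25
```

The point of the model is that acks=all latency is bounded by the slowest in-sync follower, which is why a single lagging replica can raise producer latency until it drops out of the ISR.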

Core Explanation

What are Producer Acknowledgments?

Producer acknowledgments (acks) control when a Kafka producer considers a write operation successful. The setting determines:

• When the producer receives confirmation that a message is safe
• How many replicas must have the message before that confirmation is sent
• The trade-off between latency and durability

Three Levels:

Three Acknowledgment Levels

acks=0: No acknowledgment (fire-and-forget)
acks=1: Leader acknowledgment (Java client default before Kafka 3.0)
acks=all: Full ISR acknowledgment (safest; Java client default since Kafka 3.0, and the KafkaJS default)
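
As a rough illustration of the three levels, the sketch below maps an acks value to the number of replicas that must hold the message before the producer gets a response. The function name is hypothetical and the model ignores timeouts and errors.

```javascript
// How many replicas must hold the message before the producer gets a response.
// "all" and -1 are two spellings of the same setting.
function replicasRequiredBeforeResponse(acks, isrSize) {
  if (acks === 0) return 0; // no response is awaited at all
  if (acks === 1) return 1; // the leader only
  if (acks === -1 || acks === "all") return isrSize; // every current in-sync replica
  throw new Error(`Unsupported acks value: ${acks}`);
}

console.log(replicasRequiredBeforeResponse(0, 3));     // 0
console.log(replicasRequiredBeforeResponse(1, 3));     // 1
console.log(replicasRequiredBeforeResponse("all", 3)); // 3
```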

acks=0: No Acknowledgment

Behavior:

acks=0 Behavior

Producer sends message, immediately considers it sent
Leader receives message (maybe)
No confirmation sent back

Result:

• Highest throughput (no waiting)
• Lowest latency (<1ms)
• Zero durability guarantee

When Message Can Be Lost:

acks=0 Message Loss Scenarios

1. Network failure before reaching broker
 Producer → [Network drops packet] → Leader (never arrives)

2. Leader crash before writing to disk
 Producer → Leader (in memory) → [Crash] ✗

3. Leader crash before replication
 Producer → Leader (written) → [Crash before replicating] ✗

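Probability of loss: Relatively high compared with the other levels (roughly 1-5% as an illustration; actual rates depend on network and broker health), and the producer is never told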
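
The key property of these scenarios is that the loss is silent. The following sketch uses a toy in-memory broker that drops every 50th message; nothing in it talks to Kafka, and `LossyBroker` and `sendFireAndForget` are made-up names for illustration.

```javascript
// Toy in-memory "broker" that drops every Nth message to imitate packet loss.
class LossyBroker {
  constructor(dropEvery) {
    this.dropEvery = dropEvery;
    this.received = 0;
    this.log = [];
  }

  // Returns true if the message was stored, false if it was "lost in transit".
  deliver(message) {
    this.received += 1;
    if (this.received % this.dropEvery === 0) return false;
    this.log.push(message);
    return true;
  }
}

// With acks=0 the producer never looks at the delivery result.
function sendFireAndForget(broker, messages) {
  let reportedSent = 0;
  for (const message of messages) {
    broker.deliver(message); // result intentionally ignored
    reportedSent += 1;       // counted as sent regardless of what happened
  }
  return reportedSent;
}

const broker = new LossyBroker(50); // drop 1 message in 50
const messages = Array.from({ length: 1000 }, (_, i) => `reading-${i}`);
const reportedSent = sendFireAndForget(broker, messages);
console.log(reportedSent, broker.log.length); // 1000 980
```

The producer reports 1000 sends while the broker holds 980 messages, and no error surfaces anywhere on the producer side.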

Configuration:

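// KafkaJS sets acks and compression per send() call (inside an async function)
const producer = kafka.producer();
await producer.connect();
await producer.send({
  topic: "metrics", messages: [{ value: "cpu=80" }],
  acks: 0, // No acknowledgment
  compression: CompressionTypes.GZIP, // from require("kafkajs"); often used with acks=0 for max throughput
});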

Use Cases:

acks=0 Use Cases

✓ Log aggregation (OK to lose some logs)
✓ Metrics collection (OK to lose some data points)
✓ IoT sensor data (high volume, redundancy)
✓ Clickstream tracking (lossy acceptable)

✗ Financial transactions
✗ User-facing data (messages, posts)
✗ Critical business events

acks=1: Leader Acknowledgment

Behavior:

acks=1 Behavior

Producer sends message
Leader appends to its local log (in the OS page cache by default, not necessarily flushed to disk yet)
Leader sends ACK to producer
Producer considers message sent ✓

Meanwhile (asynchronous):
Leader replicates to followers (background)

Result:

• Good throughput
• Moderate latency (5-10ms)
• Durability: Survives producer/network failure
• Risk: Lost if leader fails before replication

When Message Can Be Lost:

acks=1 Message Loss Scenario

Scenario: Leader fails before replication

T0: Producer → Leader (message written to leader)
T1: Leader → ACK → Producer ✓
T2: Producer moves on
T3: Leader crashes ⚡ (before replicating)
T4: Follower promoted to new leader
T5: Message is GONE ✗ (was only on failed leader)

Probability: Low, and only during leader failures (1-2% is an illustrative figure)
Window of vulnerability: The replication lag (~500ms in this example)
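
The T0-T5 timeline can be replayed with plain arrays standing in for replica logs. This is a toy model with hypothetical names, not Kafka's replication protocol; it only shows why an acknowledged message can vanish when the crash lands inside the replication window.

```javascript
// Toy model of one partition with a leader and a single follower.
function simulateAcks1LeaderCrash({ replicatedBeforeCrash }) {
  const leaderLog = [];
  const followerLog = [];
  const acknowledged = [];

  // T0-T1: leader appends the message and acknowledges right away (acks=1)
  leaderLog.push("order-42");
  acknowledged.push("order-42");

  // Background replication may or may not finish before the crash at T3
  if (replicatedBeforeCrash) {
    followerLog.push(...leaderLog);
  }

  // T3-T4: leader crashes, the follower is promoted and its log becomes authoritative
  const newLeaderLog = followerLog;

  // T5: acknowledged messages that the new leader never received are gone
  return acknowledged.filter((message) => !newLeaderLog.includes(message));
}

console.log(simulateAcks1LeaderCrash({ replicatedBeforeCrash: false })); // [ 'order-42' ]
console.log(simulateAcks1LeaderCrash({ replicatedBeforeCrash: true }));  // []
```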

Configuration:

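const producer = kafka.producer({
  retry: {
    retries: 3, // Retry on failure
  },
});
await producer.connect();
await producer.send({
  topic: "user-activity", // example topic name
  messages: [{ value: "page_view" }],
  acks: 1, // Leader acknowledgment (a send() option in KafkaJS; its default is -1)
  timeout: 30000, // 30s timeout
});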

Use Cases:

acks=1 Use Cases

✓ Most production workloads (default choice)
✓ High-throughput messaging
✓ Real-time analytics
✓ Event streaming

Balance between performance and safety

acks=all: Full ISR Acknowledgment

Behavior:

acks=all Behavior

Producer sends message
Leader writes to local log
Leader waits for ALL in-sync replicas (ISR) to acknowledge
All ISR replicas write to their logs
Leader sends ACK to producer
Producer considers message sent ✓

Result:

• Lower throughput
• Higher latency (10-50ms)
• Maximum durability
• Message replicated before acknowledgment

In-Sync Replicas (ISR):

In-Sync Replicas (ISR)

ISR = Set of replicas that are "caught up" with leader

Example:

• Leader: Broker 1
• Followers: Broker 2 (in sync), Broker 3 (lagging)
• ISR = {Broker 1, Broker 2}

acks=all waits for: Broker 1 + Broker 2

If Broker 2 falls behind (network issue):
ISR = {Broker 1} (just leader)
acks=all waits for: Broker 1 only (no followers!)

This is why min.insync.replicas is critical!
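
A simplified way to picture ISR membership is a time-based lag check, modeled loosely on the broker setting `replica.lag.time.max.ms`. The function and field names below are hypothetical, and real brokers track this per partition with more state.

```javascript
// Simplified ISR calculation: a follower stays in the ISR if it was fully caught up
// with the leader within the last maxLagMs.
function computeIsr(leaderId, followers, nowMs, maxLagMs) {
  const inSyncFollowers = followers
    .filter((follower) => nowMs - follower.lastCaughtUpMs <= maxLagMs)
    .map((follower) => follower.id);
  return [leaderId, ...inSyncFollowers]; // the leader is always a member
}

const followers = [
  { id: 2, lastCaughtUpMs: 99500 }, // caught up 0.5s ago
  { id: 3, lastCaughtUpMs: 40000 }, // last caught up 60s ago
];
console.log(computeIsr(1, followers, 100000, 30000)); // [ 1, 2 ]
```

This reproduces the example above: Broker 3 is lagging, so acks=all waits only for Brokers 1 and 2.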

min.insync.replicas:

min.insync.replicas Configuration

Configuration: Minimum ISR size required for writes

min.insync.replicas=2 (recommended for acks=all)

• Requires at least 2 replicas in ISR
• If ISR shrinks to 1, producer gets error
• Prevents data loss when only leader is alive

Example with 3 replicas:
┌────────────────────────────────────────────────┐
│  Normal: ISR = {Leader, Follower1, Follower2}  │
│  acks=all waits for all three ISR members      │
│  (not just the first two to respond)           │
│                                                │
│  Follower1 fails: ISR = {Leader, Follower2}    │
│  acks=all waits for Leader + Follower2 ✓       │
│                                                │
│  Follower2 also fails: ISR = {Leader}          │
│  acks=all REJECTS writes ✗                     │
│  (ISR size 1 < min.insync.replicas 2)          │
└────────────────────────────────────────────────┘

Protection: Cannot lose data if leader fails,
because message is on at least 2 replicas
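
The admission rule can be sketched as a small function. This is a simplification with a hypothetical name (`checkWrite`); the one behavior it relies on is that min.insync.replicas is only enforced for acks=all writes.

```javascript
// Simplified broker-side admission check.
// acks=0 and acks=1 writes are accepted regardless of ISR size.
function checkWrite(acks, isrSize, minInsyncReplicas) {
  const isAll = acks === -1 || acks === "all";
  if (isAll && isrSize < minInsyncReplicas) {
    return { accepted: false, error: "NOT_ENOUGH_REPLICAS" };
  }
  // For acks=all the leader waits for the whole current ISR, not just the minimum.
  return { accepted: true, waitsFor: isAll ? isrSize : acks };
}

console.log(checkWrite("all", 3, 2)); // { accepted: true, waitsFor: 3 }
console.log(checkWrite("all", 2, 2)); // { accepted: true, waitsFor: 2 }
console.log(checkWrite("all", 1, 2)); // { accepted: false, error: 'NOT_ENOUGH_REPLICAS' }
console.log(checkWrite(1, 1, 2));     // { accepted: true, waitsFor: 1 }
```

Note the last case: with acks=1 the write still succeeds when only the leader is alive, so min.insync.replicas offers no protection unless producers also use acks=all.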

Configuration:

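const producer = kafka.producer({
  retry: {
    retries: 5,
  },
});
await producer.connect();
await producer.send({
  topic: "orders", // example topic name
  messages: [{ value: "order-created" }],
  acks: -1, // -1 means "all" (acks=all); set per send() call in KafkaJS
  timeout: 30000,
});

// Topic settings (applied when the topic is created, not in producer code)
// min.insync.replicas=2  -> at least 2 replicas must ack
// replication.factor=3   -> total of 3 replicas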
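
One way to apply the two topic settings above from JavaScript is through the KafkaJS admin client's `createTopics` call. The sketch below is a minimal example under assumptions: the helper names, the topic name, and the partition count are placeholders, and the `admin` argument is expected to be a connected client obtained from `kafka.admin()`.

```javascript
// Builds a topic definition in the shape KafkaJS admin.createTopics() expects.
function durableTopicSpec(topic, numPartitions) {
  return {
    topic,
    numPartitions,
    replicationFactor: 3, // total of 3 replicas
    configEntries: [{ name: "min.insync.replicas", value: "2" }], // at least 2 must ack
  };
}

// `admin` is assumed to be a connected KafkaJS admin client (kafka.admin()).
async function createDurableTopic(admin, topic, numPartitions = 3) {
  return admin.createTopics({ topics: [durableTopicSpec(topic, numPartitions)] });
}

console.log(durableTopicSpec("orders", 3).configEntries);
```

Keeping the spec in a pure function makes the durability settings easy to review and unit-test without a running cluster.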

Use Cases:

acks=all Use Cases

✓ Financial transactions
✓ E-commerce orders
✓ User-generated content (posts, messages)
✓ Critical business events
✓ Regulatory/compliance data

Anywhere data loss is unacceptable

Real Systems Using Producer Acks

System              Default acks  Typical Config                   Rationale
Kafka Streams       acks=all      acks=all, min.insync.replicas=2  State stores require durability
Netflix (Keystone)  acks=1        acks=1, replication=3            High throughput, tolerate rare loss
LinkedIn            acks=all      acks=all, min.insync.replicas=2  Business-critical events
Uber                acks=1        acks=1 (logs), acks=all (trips)  Mixed based on data criticality
Confluent Cloud     acks=all      acks=all, min.insync.replicas=2  Default for safety

Case Study: Kafka at LinkedIn

LinkedIn Kafka Acknowledgment Strategy

LinkedIn's Kafka usage (origin of Kafka):
• 100+ billion messages/day
• 1000s of topics
• Multi-datacenter deployment

Acknowledgment Strategy:
┌───────────────────────────────────────────┐
│  Critical Data (jobs, connections):       │
│  • acks=all                               │
│  • min.insync.replicas=2                  │
│  • replication.factor=3                   │
│  → Latency: 20-30ms                       │
│  → Zero data loss                         │
│                                           │
│  Metrics/Logs (high volume):              │
│  • acks=1                                 │
│  • replication.factor=2                   │
│  → Latency: 5-10ms                        │
│  → Acceptable loss rate: <0.1%            │
│                                           │
│  Analytics Events (ultra-high volume):    │
│  • acks=0                                 │
│  • compression=gzip                       │
│  → Latency: 1-2ms                         │
│  → Loss rate: 1-2% (acceptable)           │
└───────────────────────────────────────────┘

Lesson: Different acks for different data criticality

When to Use Each Ack Level

acks=0: Fire and Forget

Use When:

acks=0 When to Use

✓ High throughput required (100k+ msg/sec)
✓ Data loss is acceptable (logs, metrics)
✓ Data has natural redundancy (sensor arrays)
✓ Ultra-low latency required (<1ms)

Example: IoT sensor network

• 1000 sensors sending data every second
• If 1% of readings lost, still have 99%
• Aggregate statistics still accurate

acks=1: Leader Only

Use When:

acks=1 When to Use

✓ Good balance of performance and safety
✓ Occasional loss acceptable during failures
✓ High throughput with moderate durability
✓ Default choice for most workloads

Example: User activity tracking

• Click events, page views, etc.
• Occasional loss during broker failure OK
• Still maintain 99%+ delivery

acks=all: Full Replication

Use When:

acks=all When to Use

✓ Zero data loss required
✓ Regulatory/compliance requirements
✓ Financial or critical business data
✓ Can tolerate higher latency (10-50ms)

Example: E-commerce order placement

• User places order (creates Kafka event)
• Order must not be lost
• OK to wait 20-30ms for full replication
• Worth latency cost for safety

Hybrid Approach

Different Topics, Different Acks:

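// One KafkaJS producer can serve all three; acks is chosen per send() call.
// orderMessages, analyticsMessages, metricsMessages: arrays of { value } built elsewhere.
// Topic names are examples. CompressionTypes comes from require("kafkajs").
const producer = kafka.producer();
await producer.connect();

// Critical orders: acks=all
await producer.send({
  topic: "orders",
  messages: orderMessages,
  acks: -1,
  timeout: 30000,
});

// Analytics events: acks=1
await producer.send({
  topic: "analytics",
  messages: analyticsMessages,
  acks: 1,
  timeout: 10000,
});

// Metrics: acks=0
await producer.send({
  topic: "metrics",
  messages: metricsMessages,
  acks: 0,
  compression: CompressionTypes.GZIP,
});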
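
To keep the per-topic choices above in one reviewable place, the settings could live in a lookup table keyed by data class. This is a design sketch, not a KafkaJS feature; the class names and `sendOptionsFor` are hypothetical.

```javascript
// Ack settings per data class, mirroring the hybrid configuration above.
const SEND_OPTIONS_BY_CLASS = {
  critical: { acks: -1, timeout: 30000 }, // orders
  analytics: { acks: 1, timeout: 10000 }, // analytics events
  metrics: { acks: 0 },                   // metrics
};

function sendOptionsFor(dataClass) {
  const options = SEND_OPTIONS_BY_CLASS[dataClass];
  if (!options) {
    // Failing loudly avoids silently sending important data with weak guarantees.
    throw new Error(`Unknown data class: ${dataClass}`);
  }
  return { ...options }; // copy so callers cannot mutate the shared table
}

// Usage with a connected KafkaJS producer (not executed here):
//   await producer.send({ topic, messages, ...sendOptionsFor("critical") });
console.log(sendOptionsFor("critical")); // { acks: -1, timeout: 30000 }
```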

Interview Application

Common Interview Question

Q: “How would you ensure zero data loss in a Kafka-based order processing system?”

Strong Answer:

“To ensure zero data loss for orders, I’d configure producers with acks=all and proper ISR settings:

Producer Configuration:
acks=all (or acks=-1)
retries=MAX_INT (retry until delivery.timeout.ms expires)
max.in.flight.requests.per.connection=1 (for ordering)

Topic Configuration:
min.insync.replicas=2
replication.factor=3

How This Prevents Loss:

acks=all: Producer waits for full replication before considering write successful

min.insync.replicas=2: Requires at least 2 replicas (leader + 1 follower) to acknowledge

replication.factor=3: Total of 3 copies across brokers

Result: Message on ≥2 replicas before ACK

Failure Scenarios:

Network failure: Producer retries until successful

Leader failure: Message already on follower (promoted to new leader)

Follower failure: Still have leader + other follower (meets min ISR)

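Leader + Follower fail: The remaining replica still holds every acknowledged message if it was in the ISR; new writes are rejected until the ISR is back to 2

Only lose data if: Every replica holding the message fails before it is copied elsewhere (extremely rare)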

Trade-offs:

Latency: 20-30ms vs 5-10ms for acks=1

Throughput: Lower (wait for replication)

Availability: May reject writes if ISR < 2

Worth It: For orders where data loss = lost revenue + angry customers

Monitoring: Alert if ISR falls below min.insync.replicas”
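
The failure scenarios in this answer follow from simple arithmetic on the two topic settings. The sketch below assumes acks=all and that only in-sync replicas can become leader (unclean leader election disabled); `failureBudget` is a made-up helper name.

```javascript
// Failure budget implied by replication.factor and min.insync.replicas with acks=all.
function failureBudget(replicationFactor, minInsyncReplicas) {
  if (minInsyncReplicas > replicationFactor) {
    throw new Error("min.insync.replicas cannot exceed replication.factor");
  }
  return {
    // Replicas that can be down while acks=all writes are still accepted
    writeAvailability: replicationFactor - minInsyncReplicas,
    // Replica failures that an acknowledged message is guaranteed to survive
    ackedDurability: minInsyncReplicas - 1,
  };
}

console.log(failureBudget(3, 2)); // { writeAvailability: 1, ackedDurability: 1 }
console.log(failureBudget(3, 1)); // { writeAvailability: 2, ackedDurability: 0 }
```

With replication.factor=3 and min.insync.replicas=2, one broker can fail without blocking writes and without losing acknowledged data; lowering min.insync.replicas to 1 buys availability at the cost of the durability guarantee.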

Code Example

Producer with Different Ack Levels

const { Kafka, CompressionTypes } = require("kafkajs");

const kafka = new Kafka({
  clientId: "my-producer",
  brokers: ["kafka1:9092", "kafka2:9092", "kafka3:9092"],
});

// Configuration 1: acks=0 (Fire and Forget)
async function sendMetrics() {
  const producer = kafka.producer();
  await producer.connect();

  const start = Date.now();
  await producer.send({
    topic: "metrics",
    messages: [{ value: JSON.stringify({ cpu: 80, mem: 60 }) }],
    acks: 0, // No acknowledgment (KafkaJS takes acks per send() call)
    compression: CompressionTypes.GZIP,
  });
  const latency = Date.now() - start;
  console.log(`acks=0 latency: ${latency}ms`);

  await producer.disconnect();
}

// ... omitted: keep concept snippets short
// (sendUserActivity and sendOrder follow the same pattern with acks: 1 and acks: -1)

async function benchmark() {
  await sendMetrics(); // ~1ms
  await sendUserActivity(); // ~5-10ms
  await sendOrder(); // ~15-30ms

  // Trade-off: Latency vs Durability
  // acks=0:   Fastest, least safe
  // acks=1:   Balanced
  // acks=all: Slowest, safest
}

benchmark();

Error Handling with acks=all

// storeInDLQ is an application-specific helper, not part of KafkaJS
async function sendCriticalData(data) {
  const producer = kafka.producer({
    retry: {
      retries: 5,
      initialRetryTime: 300,
    },
  });

  await producer.connect();

  try {
    await producer.send({
      topic: "critical-data",
      messages: [{ value: JSON.stringify(data) }],
      acks: -1, // acks=all is a send() option in KafkaJS
    });

    console.log("Data persisted successfully (acks=all)");
  } catch (error) {
    // Error types to handle:

    if (error.type === "NOT_ENOUGH_REPLICAS") {
      // ISR has fewer members than min.insync.replicas
      // ... omitted: keep concept snippets short
    } else if (error.type === "REQUEST_TIMED_OUT") {
      // REQUEST_TIMED_OUT is assumed here as the timeout error type
      // Replication took longer than timeout
      console.error("Acknowledgment timeout");
      // Action: Retry (may be duplicate)
    }

    // Store in dead letter queue for manual review
    await storeInDLQ(data, error);
    throw error;
  } finally {
    await producer.disconnect();
  }
}
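
Because a retry after an acknowledgment timeout may write the same message twice, consumers of critical topics often deduplicate. The sketch below shows one simple approach keyed on a producer-chosen ID; `eventId` is a hypothetical field, and the in-memory Set is for illustration only (a real service would need bounded or persistent storage).

```javascript
// Consumer-side deduplication keyed on a message ID chosen by the producer.
function createDeduplicator() {
  const seen = new Set();
  return function isFirstDelivery(message) {
    if (seen.has(message.eventId)) return false;
    seen.add(message.eventId);
    return true;
  };
}

const isFirstDelivery = createDeduplicator();
const delivered = [
  { eventId: "order-1", amount: 40 },
  { eventId: "order-2", amount: 15 },
  { eventId: "order-1", amount: 40 }, // duplicate caused by a producer retry
];
const processed = delivered.filter((message) => isFirstDelivery(message));
console.log(processed.map((message) => message.eventId)); // [ 'order-1', 'order-2' ]
```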

Related Content

Prerequisites:

Leader-Follower Replication - Understanding ISR
Topic Partitioning - Kafka architecture

Related Concepts:

Quorum - ISR is a form of quorum
Idempotence - Idempotent producer with acks=all
Exactly-Once Semantics - Combines idempotence + acks=all

Used In Systems:

Kafka (producer acknowledgments)
Pulsar (similar ack levels)
RabbitMQ (publisher confirms)

Explained In Detail:

Kafka Deep Dive - Producer mechanics and acknowledgments

See It In Action

Producer Acknowledgments Explainer - ~75 second animated visual showing acks=0, acks=1, and acks=all

Quick Self-Check

Can explain acks=0/1/all in 60 seconds?
Understand latency vs durability trade-offs?
Know when messages can be lost for each ack level?
Can explain min.insync.replicas and ISR?
Understand acks=all + min.insync.replicas=2 pattern?
Know which ack level to use for different use cases?
