Producer Acknowledgments

Summary

Mechanisms by which message producers receive confirmation that their messages were successfully persisted, enabling reliability tradeoffs between latency and durability

TL;DR

Producer acknowledgments (acks) control when Kafka considers a message successfully written. Options include acks=0 (no confirmation), acks=1 (leader confirms), and acks=all (all in-sync replicas confirm), trading latency for durability guarantees. Choosing the right level is critical for balancing performance against data safety.

Visual Overview

Producer Acknowledgments Overview
ACKS = 0 (Fire and Forget)

  Producer → Message → Kafka Leader
                       (producer doesn't wait)
  Immediate return
  Latency: <1ms

  Risk: Message may be lost if:
  - Network failure before reaching leader
  - Leader crashes before appending to its log
  - Leader crashes before replication

  Use case: Metrics, logs (lossy OK)


ACKS = 1 (Leader Acknowledgment)

  Producer → Message → Kafka Leader
                         ↓
                   Write to log
                         ↓
                   Send ACK → Producer
  Latency: 5-10ms

  Meanwhile (async):
  Leader → Replicate → Follower 1
  Leader → Replicate → Follower 2

  Risk: Message lost if leader crashes
  before replication completes

  Use case: Many production workloads
  (producer default before Kafka 3.0)


ACKS = ALL (Full Quorum)

  Producer → Message → Kafka Leader
                         ↓
                   Write to log
                         ↓
                   Replicate to all ISR replicas
                         ↓
                   Follower 1: Written
                   Follower 2: Written
                         ↓
                   Send ACK → Producer
  Latency: 10-50ms (network + replication)

  Risk: Acknowledged message is on every in-sync replica,
  so it survives a leader crash (unless the ISR had shrunk
  to the leader alone; min.insync.replicas=2 prevents this)

  Use case: Financial transactions, orders
  (producer default since Kafka 3.0)


TIMELINE COMPARISON:

  acks=0:                                       
    T0: Send                                    
    T1: Return (1ms)                           
                                                
  acks=1:                                       
    T0: Send                                    
    T5: Leader writes                           
    T10: Return (10ms)                         
                                                
  acks=all:                                     
    T0: Send                                    
    T5: Leader writes                           
    T15: Follower 1 writes                      
    T20: Follower 2 writes                      
    T25: Return (25ms)                         
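The timeline can be reproduced with a small latency model. This is a sketch under stated assumptions: the per-step times (`sendMs`, `leaderWriteMs`, `followerDoneMs`, `responseMs`) are hypothetical inputs chosen to match the timeline above, not measurements. One detail the model makes explicit: followers fetch from the leader in parallel, so acks=all waits for the slowest in-sync follower rather than the sum of all follower times.

```javascript
// Hypothetical latency model: step times are illustrative inputs, not measurements.
// Returns when send() would resolve for a given acks setting.
function ackLatencyMs(acks, { sendMs, leaderWriteMs, followerDoneMs, responseMs }) {
  if (acks === 0) {
    // Fire and forget: done as soon as the request is sent
    return sendMs;
  }
  // Time at which the leader has appended the message to its log
  const leaderDone = sendMs + leaderWriteMs;
  if (acks === 1) {
    return leaderDone + responseMs;
  }
  if (acks === -1 || acks === "all") {
    // Followers fetch in parallel, so only the slowest in-sync follower matters
    return leaderDone + Math.max(0, ...followerDoneMs) + responseMs;
  }
  throw new Error(`Unsupported acks value: ${acks}`);
}

// followerDoneMs: time after the leader write at which each ISR follower has the message
const steps = { sendMs: 1, leaderWriteMs: 4, followerDoneMs: [10, 15], responseMs: 5 };
for (const acks of [0, 1, "all"]) {
  console.log(`acks=${acks}: ${ackLatencyMs(acks, steps)}ms`);
}
// → acks=0: 1ms
// → acks=1: 10ms
// → acks=all: 25ms
```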

Core Explanation

What are Producer Acknowledgments?

Producer acknowledgments (acks) control when a Kafka producer considers a write operation successful. This determines:

  1. When the producer receives confirmation that the message is safe
  2. How many replicas must persist the message
  3. The trade-off between latency and durability

Three Levels:

Three Acknowledgment Levels
acks=0: No acknowledgment (fire-and-forget)
acks=1: Leader acknowledgment (producer default before Kafka 3.0)
acks=all: Full ISR acknowledgment (safest; producer default since Kafka 3.0)

acks=0: No Acknowledgment

Behavior:

acks=0 Behavior
Producer sends message, immediately considers it sent
Leader receives message (maybe)
No confirmation sent back

Result:

• Highest throughput (no waiting)
• Lowest latency (<1ms)
• Zero durability guarantee

When Message Can Be Lost:

acks=0 Message Loss Scenarios
1. Network failure before reaching broker
 Producer → [Network drops packet] ✗ Leader (never arrives)

2. Leader crash before appending to its log
 Producer → Leader (in memory) → [Crash]

3. Leader crash before replication
 Producer → Leader (written) → [Crash before replicating]

Probability of loss: Relatively high, and invisible to the producer (no ACK means no error to detect or retry)
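Why acks=0 losses go unnoticed can be shown with a toy simulation. It assumes a hypothetical network that silently drops every 50th request; `simulateFireAndForget` is an illustrative name, and nothing here is Kafka or kafkajs code. The producer's own count says everything was sent, while the broker's log is short by the dropped messages.

```javascript
// Hypothetical simulation: a network that silently drops every dropEvery-th request.
// With acks=0 the producer never hears back, so it cannot tell which sends failed.
function simulateFireAndForget(messageCount, dropEvery) {
  const brokerLog = [];
  let producerReportedSent = 0;

  for (let i = 1; i <= messageCount; i++) {
    const delivered = i % dropEvery !== 0; // network drops this packet?
    if (delivered) {
      brokerLog.push(i); // leader appends the message
    }
    // acks=0: the send counts as "done" regardless of delivery
    producerReportedSent++;
  }
  return { producerReportedSent, brokerReceived: brokerLog.length };
}

console.log(simulateFireAndForget(1000, 50));
// → { producerReportedSent: 1000, brokerReceived: 980 }
```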

Configuration:

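// kafkajs: acks/compression are send() options; CompressionTypes comes from require("kafkajs")
await producer.send({
  topic: "metrics",
  acks: 0, // No acknowledgment
  compression: CompressionTypes.GZIP, // Often paired with acks=0 for max throughput
  messages: [{ value: JSON.stringify({ cpu: 80 }) }],
});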

Use Cases:

acks=0 Use Cases
 Log aggregation (OK to lose some logs)
 Metrics collection (OK to lose some data points)
 IoT sensor data (high volume, redundancy)
 Clickstream tracking (lossy acceptable)

Avoid for:
 Financial transactions
 User-facing data (messages, posts)
 Critical business events

acks=1: Leader Acknowledgment

Behavior:

acks=1 Behavior
Producer sends message
Leader appends to its local log (OS page cache; Kafka does not fsync every write by default)
Leader sends ACK to producer
Producer considers message sent 

Meanwhile (asynchronous):
Leader replicates to followers (background)

Result:

• Good throughput
• Moderate latency (5-10ms)
• Durability: Survives producer/network failure
• Risk: Lost if leader fails before replication

When Message Can Be Lost:

acks=1 Message Loss Scenario
Scenario: Leader fails before replication

T0: Producer → Leader (message written to leader)
T1: Leader → ACK → Producer
T2: Producer moves on
T3: Leader crashes ⚡ (before replicating)
T4: Follower promoted to new leader
T5: Message is GONE (was only on failed leader)

Probability: Low; the leader must fail inside the replication window
Window of vulnerability: the follower replication lag (e.g. ~500ms)
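The acks=1 failure timeline can be modeled with a toy in-memory partition. This is a hypothetical sketch, not Kafka code: `makePartition`, `produceAcks1`, `replicate` and `failLeader` are illustrative names, replication is a single explicit step, and leader election simply picks the first surviving replica. (In Kafka the controller picks from the ISR, and a follower that has not yet fetched the latest message can still be in the ISR, which is exactly why the message is lost.)

```javascript
// Hypothetical in-memory model of one partition (not Kafka code).
function makePartition(replicaIds) {
  const logs = new Map(replicaIds.map((id) => [id, []]));
  return { leader: replicaIds[0], logs, alive: new Set(replicaIds) };
}

// acks=1: the leader appends and acknowledges immediately
function produceAcks1(partition, msg) {
  partition.logs.get(partition.leader).push(msg);
  return "ACK";
}

// Followers copy the leader's log (stands in for the asynchronous fetch)
function replicate(partition) {
  const leaderLog = partition.logs.get(partition.leader);
  for (const id of partition.alive) {
    if (id !== partition.leader) partition.logs.set(id, [...leaderLog]);
  }
}

// Leader crashes; the first surviving replica becomes the new leader
function failLeader(partition) {
  partition.alive.delete(partition.leader);
  partition.leader = [...partition.alive][0];
}

const p = makePartition(["broker1", "broker2", "broker3"]);
produceAcks1(p, "order-1");
replicate(p); // order-1 reaches the followers
const ack = produceAcks1(p, "order-2"); // producer gets ACK and moves on
failLeader(p); // crash before the followers fetch order-2
console.log(ack, p.leader, p.logs.get(p.leader));
// → ACK broker2 [ 'order-1' ]
```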

Configuration:

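const producer = kafka.producer({
  retry: { retries: 3 }, // Retry on failure
});

// kafkajs: acks and timeout are send() options, not kafka.producer() options
await producer.send({
  topic: "user-activity",
  acks: 1, // Leader acknowledgment
  timeout: 30000, // 30s to wait for the acknowledgment
  messages: [{ value: "click" }],
});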

Use Cases:

acks=1 Use Cases
 Many production workloads (common choice; producer default before Kafka 3.0)
 High-throughput messaging
 Real-time analytics
 Event streaming

Balance between performance and safety

acks=all: Full ISR Acknowledgment

Behavior:

acks=all Behavior
Producer sends message
Leader writes to local log
Leader waits for ALL in-sync replicas (ISR) to acknowledge
All ISR replicas write to their logs
Leader sends ACK to producer
Producer considers message sent 

Result:

• Lower throughput
• Higher latency (10-50ms)
• Maximum durability
• Message replicated before acknowledgment
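Under the hood, the leader decides when an acks=all write is complete by tracking how far each follower has fetched. A simplified sketch, assuming the leader knows every replica's log end offset (the next offset it will write): the high watermark is the smallest log end offset among ISR members, and the write is acknowledged once the high watermark moves past the message's offset. `highWatermark`, `isAcksAllComplete` and the offsets are hypothetical; real broker logic also handles ISR changes and timeouts.

```javascript
// Hypothetical sketch of the leader's completion check for acks=all.
// logEndOffsets: next offset each replica will write (leader included).
function highWatermark(logEndOffsets, isr) {
  // Every ISR member has all offsets below the smallest log end offset
  return Math.min(...isr.map((id) => logEndOffsets[id]));
}

function isAcksAllComplete(messageOffset, logEndOffsets, isr) {
  // The message is on every in-sync replica once the high watermark passes it
  return highWatermark(logEndOffsets, isr) > messageOffset;
}

const isr = ["broker1", "broker2", "broker3"];
// Leader (broker1) appended offset 41; broker3 has not fetched it yet
const leo = { broker1: 42, broker2: 42, broker3: 41 };
console.log(isAcksAllComplete(41, leo, isr)); // → false
leo.broker3 = 42; // broker3 fetches offset 41
console.log(isAcksAllComplete(41, leo, isr)); // → true
```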

In-Sync Replicas (ISR):

In-Sync Replicas (ISR)
ISR = Set of replicas that are "caught up" with leader

Example:

• Leader: Broker 1
• Followers: Broker 2 (in sync), Broker 3 (lagging)
• ISR = {Broker 1, Broker 2}

acks=all waits for: Broker 1 + Broker 2

If Broker 2 falls behind (network issue):
ISR = {Broker 1} (just leader)
acks=all waits for: Broker 1 only (no followers!)

This is why min.insync.replicas is critical!
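ISR membership is decided by lag time rather than message count. A minimal sketch, assuming each follower records when it was last fully caught up (`lastCaughtUpMs`, a hypothetical field); the threshold plays the role of the broker setting replica.lag.time.max.ms, with 30 seconds used here only as an example value. The inputs reproduce the Broker 1/2/3 example above.

```javascript
// Hypothetical sketch: ISR = leader + followers caught up within lagTimeMaxMs.
function computeIsr(leaderId, followers, nowMs, lagTimeMaxMs) {
  const inSync = followers
    .filter((f) => nowMs - f.lastCaughtUpMs <= lagTimeMaxMs)
    .map((f) => f.id);
  return [leaderId, ...inSync]; // the leader is always in its own ISR
}

const followers = [
  { id: "broker2", lastCaughtUpMs: 99000 }, // caught up 1s ago
  { id: "broker3", lastCaughtUpMs: 40000 }, // lagging for 60s
];
console.log(computeIsr("broker1", followers, 100000, 30000));
// → [ 'broker1', 'broker2' ]
```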

min.insync.replicas:

min.insync.replicas Configuration
Configuration: Minimum ISR size required for writes

min.insync.replicas=2 (recommended for acks=all)

• Requires at least 2 replicas in ISR
• If ISR shrinks to 1, producer gets error
• Prevents acknowledging writes that exist only on the leader

Example with 3 replicas:

  Normal: ISR = {Leader, Follower1, Follower2}
  acks=all waits for Leader + Follower1 + Follower2
  (every current ISR member, not just 2 of them)

  Follower1 fails: ISR = {Leader, Follower2}
  acks=all waits for Leader + Follower2

  Follower2 also fails: ISR = {Leader}
  acks=all REJECTS writes
  (ISR size 1 < min.insync.replicas 2)


Protection: Cannot lose data if leader fails,
because message is on at least 2 replicas
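The acks=all rules in the example can be written as one check. A sketch, assuming a hypothetical `acksAllWaitSet` function that stands in for broker logic: writes are rejected when the ISR is smaller than min.insync.replicas, and otherwise the leader waits for every current ISR member. Note that min.insync.replicas is enforced only for acks=all; acks=0 and acks=1 writes ignore it.

```javascript
// Hypothetical sketch of the broker-side rule for an acks=all produce request.
// Returns the replicas the leader must wait for, or throws if the ISR is too small.
function acksAllWaitSet(isr, minInsyncReplicas) {
  if (isr.length < minInsyncReplicas) {
    // Kafka answers such writes with a NOT_ENOUGH_REPLICAS error
    throw new Error(`NOT_ENOUGH_REPLICAS: ISR ${isr.length} < ${minInsyncReplicas}`);
  }
  return isr; // every current ISR member, not just minInsyncReplicas of them
}

const scenarios = [
  ["Normal", ["Leader", "Follower1", "Follower2"]],
  ["Follower1 fails", ["Leader", "Follower2"]],
  ["Follower2 also fails", ["Leader"]],
];
for (const [name, isr] of scenarios) {
  try {
    console.log(`${name}: waits for ${acksAllWaitSet(isr, 2).join(" + ")}`);
  } catch (err) {
    console.log(`${name}: rejected (${err.message})`);
  }
}
// → Normal: waits for Leader + Follower1 + Follower2
// → Follower1 fails: waits for Leader + Follower2
// → Follower2 also fails: rejected (NOT_ENOUGH_REPLICAS: ISR 1 < 2)
```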

Configuration:

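const producer = kafka.producer({
  retry: { retries: 5 },
});

await producer.send({
  topic: "orders",
  acks: -1, // -1 means "all" (acks=all); also the kafkajs default
  timeout: 30000,
  messages: [{ value: JSON.stringify({ orderId: "456" }) }],
});

// Topic configuration (set on the topic or broker, not in the producer):
//   min.insync.replicas=2   at least 2 in-sync replicas must hold each write
//   replication.factor=3    3 copies of each partition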

Use Cases:

acks=all Use Cases
 Financial transactions
 E-commerce orders
 User-generated content (posts, messages)
 Critical business events
 Regulatory/compliance data

Anywhere data loss is unacceptable

Real Systems Using Producer Acks

System | Default acks | Typical config | Rationale
Kafka Streams | acks=all | acks=all, min.insync.replicas=2 | State stores require durability
Netflix (Keystone) | acks=1 | acks=1, replication.factor=3 | High throughput, tolerate rare loss
LinkedIn | acks=all | acks=all, min.insync.replicas=2 | Business-critical events
Uber | acks=1 | acks=1 (logs), acks=all (trips) | Mixed based on data criticality
Confluent Cloud | acks=all | acks=all, min.insync.replicas=2 | Default for safety

Case Study: Kafka at LinkedIn

LinkedIn Kafka Acknowledgment Strategy
LinkedIn's Kafka usage (origin of Kafka):
• 100+ billion messages/day
• 1000s of topics
• Multi-datacenter deployment

Acknowledgment Strategy:

  Critical Data (jobs, connections):       
  • acks=all                               
  • min.insync.replicas=2                  
  • replication.factor=3                   
   Latency: 20-30ms                       
   Zero data loss                         
                                           
  Metrics/Logs (high volume):              
  • acks=1                                 
  • replication.factor=2                   
   Latency: 5-10ms                        
   Acceptable loss rate: <0.1%            
                                           
  Analytics Events (ultra-high volume):    
  • acks=0                                 
  • compression=gzip                       
   Latency: 1-2ms                         
   Loss rate: 1-2% (acceptable)           


Lesson: Different acks for different data criticality

When to Use Each Ack Level

acks=0: Fire and Forget

Use When:

acks=0 When to Use
 High throughput required (100k+ msg/sec)
 Data loss is acceptable (logs, metrics)
 Data has natural redundancy (sensor arrays)
 Ultra-low latency required (<1ms)

Example: IoT sensor network

• 1000 sensors sending data every second
• If 1% of readings lost, still have 99%
• Aggregate statistics still accurate
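The arithmetic behind the IoT example can be checked directly. The readings are made-up illustrative values, and the 1% loss is modeled as dropping every 100th reading rather than random loss.

```javascript
// Hypothetical IoT feed: 1000 illustrative readings (not real data)
const readings = Array.from({ length: 1000 }, (_, i) => 20 + ((i + 1) % 10) * 0.5);

// acks=0 loses 1% of readings: here every 100th one is dropped
const received = readings.filter((_, i) => (i + 1) % 100 !== 0);

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
console.log(readings.length, received.length); // → 1000 990
console.log(mean(readings).toFixed(2), mean(received).toFixed(2)); // → 22.25 22.27
```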

acks=1: Leader Only

Use When:

acks=1 When to Use
 Good balance of performance and safety
 Occasional loss acceptable during failures
 High throughput with moderate durability
 Common choice for most workloads (producer default before Kafka 3.0)

Example: User activity tracking

• Click events, page views, etc.
• Occasional loss during broker failure OK
• Still maintain 99%+ delivery

acks=all: Full Replication

Use When:

acks=all When to Use
 Zero data loss required
 Regulatory/compliance requirements
 Financial or critical business data
 Can tolerate higher latency (10-50ms)

Example: E-commerce order placement

• User places order (creates Kafka event)
• Order must not be lost
• OK to wait 20-30ms for full replication
• Worth latency cost for safety

Hybrid Approach

Different Topics, Different Acks:

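// kafkajs: acks, timeout and compression are set per send(), so one producer
// can serve all three topics. Helper and topic names are illustrative.
// (An idempotent producer rejects acks other than -1; give it its own producer.)
const { CompressionTypes } = require("kafkajs");

async function sendByCriticality(producer, order, event, metric) {
  // Critical orders: acks=all
  await producer.send({
    topic: "orders",
    acks: -1,
    timeout: 30000,
    messages: [{ value: JSON.stringify(order) }],
  });

  // Analytics events: acks=1
  await producer.send({
    topic: "analytics-events",
    acks: 1,
    timeout: 10000,
    messages: [{ value: JSON.stringify(event) }],
  });

  // Metrics: acks=0
  await producer.send({
    topic: "metrics",
    acks: 0,
    compression: CompressionTypes.GZIP,
    messages: [{ value: JSON.stringify(metric) }],
  });
}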

Interview Application

Common Interview Question

Q: “How would you ensure zero data loss in a Kafka-based order processing system?”

Strong Answer:

“To ensure zero data loss for orders, I’d configure producers with acks=all and proper ISR settings:

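Configuration:

Producer: acks=all (or acks=-1)
Producer: enable.idempotence=true (producer retries don't create duplicates)
Producer: retries=MAX_INT (effectively infinite retries)
Producer: max.in.flight.requests.per.connection=1 (strict ordering; with idempotence, up to 5 also preserves order)
Topic: min.insync.replicas=2
Topic: replication.factor=3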

How This Prevents Loss:

  1. acks=all: Producer waits for full replication before considering write successful
  2. min.insync.replicas=2: Rejects writes unless the ISR has at least 2 replicas (leader + 1 follower), so every acknowledged message is on at least 2 replicas
  3. replication.factor=3: Total of 3 copies across brokers
  4. Result: Message on ≥2 replicas before ACK

Failure Scenarios:

  • Network failure: Producer retries until successful
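  • Leader failure: Message already on an in-sync follower, which is promoted to leader (unclean.leader.election.enable=false, the default, keeps out-of-sync replicas from taking over)
  • Follower failure: Still have leader + other follower (meets min ISR)
  • Leader + Follower fail: If the third replica was in the ISR, it has the message and takes over; otherwise the partition stays unavailable until a replica holding the data returns

Only lose data if: Every replica holding the message fails permanently at once (2 of 3 brokers if the ISR had shrunk to 2, otherwise all 3)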

Trade-offs:

  • Latency: 20-30ms vs 5-10ms for acks=1
  • Throughput: Lower (wait for replication)
  • Availability: May reject writes if ISR < 2

Worth It: For orders where data loss = lost revenue + angry customers

Monitoring: Alert if ISR falls below min.insync.replicas”
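The failure reasoning can be checked by brute force. A sketch, assuming a hypothetical helper `minFailuresToLoseAckedMessage`: it enumerates every ISR that acks=all would accept (contains the leader, size ≥ min.insync.replicas) and every set of failed brokers, then reports the fewest permanent broker failures that remove all copies of an acknowledged message. With replication.factor=3 and min.insync.replicas=2, the worst case is 2, because the message may have been acknowledged while the ISR had shrunk to two members.

```javascript
// All subsets of an array (2^n of them), used for ISRs and failure sets
function subsets(items) {
  return items.reduce(
    (acc, item) => acc.concat(acc.map((s) => [...s, item])),
    [[]]
  );
}

// Fewest permanent broker failures that can destroy a message acknowledged with acks=all
function minFailuresToLoseAckedMessage(replicas, minIsr) {
  const leader = replicas[0];
  let fewest = Infinity;
  for (const isr of subsets(replicas)) {
    // acks=all only succeeds if the ISR contains the leader and meets min.insync.replicas
    if (!isr.includes(leader) || isr.length < minIsr) continue;
    for (const failed of subsets(replicas)) {
      // The message is gone only if every replica that held it has failed
      if (isr.every((r) => failed.includes(r))) {
        fewest = Math.min(fewest, failed.length);
      }
    }
  }
  return fewest;
}

const replicas = ["leader", "follower1", "follower2"];
console.log(minFailuresToLoseAckedMessage(replicas, 2)); // → 2
console.log(minFailuresToLoseAckedMessage(replicas, 1)); // → 1
```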

Code Example

Producer with Different Ack Levels

const { Kafka, CompressionTypes } = require("kafkajs");

const kafka = new Kafka({
  clientId: "my-producer",
  brokers: ["kafka1:9092", "kafka2:9092", "kafka3:9092"],
});

// kafkajs: acks, timeout and compression are options of producer.send();
// retry, idempotent and maxInFlightRequests are options of kafka.producer().

// Configuration 1: acks=0 (Fire and Forget)
async function sendMetrics() {
  const producer = kafka.producer();
  await producer.connect();

  try {
    const start = Date.now();
    await producer.send({
      topic: "metrics",
      acks: 0, // No acknowledgment
      compression: CompressionTypes.GZIP,
      messages: [{ value: JSON.stringify({ cpu: 80, mem: 60 }) }],
    });
    const latency = Date.now() - start;

    console.log(`Metrics sent (acks=0): ${latency}ms`);
    // Typical output: 1-2ms
    // Risk: Message may be lost
  } finally {
    await producer.disconnect();
  }
}

// Configuration 2: acks=1 (Leader Acknowledgment)
async function sendUserActivity() {
  const producer = kafka.producer({
    retry: {
      retries: 3,
      initialRetryTime: 100,
    },
  });
  await producer.connect();

  try {
    const start = Date.now();
    await producer.send({
      topic: "user-activity",
      acks: 1, // Leader acknowledgment
      timeout: 30000,
      messages: [
        {
          key: "user-123",
          value: JSON.stringify({ action: "click", page: "/products" }),
        },
      ],
    });
    const latency = Date.now() - start;

    console.log(`Activity sent (acks=1): ${latency}ms`);
    // Typical output: 5-10ms
    // Risk: Lost if leader fails before replication
  } finally {
    await producer.disconnect();
  }
}

// Configuration 3: acks=all (Full ISR Acknowledgment)
async function sendOrder() {
  const producer = kafka.producer({
    idempotent: true, // Broker discards duplicate retries per partition; requires acks=-1
    maxInFlightRequests: 1, // Preserve ordering
    retry: {
      retries: Number.MAX_SAFE_INTEGER, // Effectively retry forever
      initialRetryTime: 100,
      maxRetryTime: 30000,
    },
  });
  await producer.connect();

  const start = Date.now();
  try {
    await producer.send({
      topic: "orders", // Topic config: min.insync.replicas=2, replication.factor=3
      acks: -1, // acks=all (wait for full ISR)
      timeout: 30000,
      messages: [
        {
          key: "order-456",
          value: JSON.stringify({
            orderId: "456",
            userId: "123",
            total: 99.99,
            items: [{ id: "product-1", qty: 2 }],
          }),
        },
      ],
    });
    const latency = Date.now() - start;

    console.log(`Order sent (acks=all): ${latency}ms`);
    // Typical output: 15-30ms
    // Guarantee: Message on every in-sync replica (≥2 with min.insync.replicas=2)
  } catch (error) {
    // Retriable errors such as NOT_ENOUGH_REPLICAS arrive here only after retries stop;
    // kafkajs may then wrap the broker error, so check the wrapped forms too
    const type = error.type ?? error.cause?.type ?? error.originalError?.type;
    if (type === "NOT_ENOUGH_REPLICAS") {
      // ISR < min.insync.replicas (degraded cluster)
      console.error("Cluster degraded: Not enough in-sync replicas");
      // Alert operations team
      // Queue order for retry
    }
    throw error;
  } finally {
    await producer.disconnect();
  }
}

// Demonstrating latency differences
async function benchmark() {
  console.log("Benchmarking producer acknowledgments...\n");

  await sendMetrics(); // ~1-2ms
  await sendUserActivity(); // ~5-10ms
  await sendOrder(); // ~15-30ms

  // Trade-off: Latency vs Durability
  // acks=0:   Fastest, least safe
  // acks=1:   Balanced
  // acks=all: Slowest, safest (producer default since Kafka 3.0)
}

benchmark().catch(console.error);

Error Handling with acks=all

// Hypothetical dead-letter store; replace with real storage (DB table, DLQ topic, ...)
async function storeInDLQ(data, error) {
  console.error("DLQ:", JSON.stringify(data), error.message);
}

async function sendCriticalData(data) {
  const producer = kafka.producer({
    retry: {
      retries: 5,
      initialRetryTime: 300,
    },
  });

  await producer.connect();

  try {
    await producer.send({
      topic: "critical-data",
      acks: -1, // acks=all (a send() option in kafkajs)
      messages: [{ value: JSON.stringify(data) }],
    });

    console.log("Data persisted successfully (acks=all)");
  } catch (error) {
    // kafkajs may wrap the broker error after retries; field names vary by version
    const type = error.type ?? error.cause?.type ?? error.originalError?.type;

    switch (type) {
      case "NOT_ENOUGH_REPLICAS":
        // ISR < min.insync.replicas: write rejected before append
        console.error("Not enough in-sync replicas");
        // Action: Alert operations, queue for retry
        break;
      case "NOT_ENOUGH_REPLICAS_AFTER_APPEND":
        // Appended on the leader, but the ISR shrank below min.insync.replicas before the ACK
        console.error("Replication failed after append");
        // Action: Retry (may be duplicate, use idempotent producer)
        break;
      case "REQUEST_TIMED_OUT":
        // Required acknowledgments did not arrive within the request timeout
        console.error("Acknowledgment timeout");
        // Action: Retry (may be duplicate)
        break;
    }

    // Store in dead letter queue for manual review
    await storeInDLQ(data, error);
    throw error;
  } finally {
    await producer.disconnect();
  }
}
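For the "queue for retry" actions in the handler, a common pattern is exponential backoff with jitter. This is a generic sketch, not kafkajs's internal retry formula; `backoffScheduleMs` and its parameters are hypothetical, and `random` is injectable so the schedule can be tested deterministically. Because a retry after NOT_ENOUGH_REPLICAS_AFTER_APPEND or a timeout can duplicate a message, pair it with an idempotent producer or idempotent consumers.

```javascript
// Hypothetical retry schedule for messages parked after a failed acks=all send.
// Exponential backoff capped at maxMs, with "full jitter" (uniform in [0, base)).
function backoffScheduleMs(attempts, initialMs, maxMs, random = Math.random) {
  const delays = [];
  for (let attempt = 0; attempt < attempts; attempt++) {
    const base = Math.min(maxMs, initialMs * 2 ** attempt);
    // Jitter spreads retries out so parked messages don't hit the cluster at once
    delays.push(Math.floor(random() * base));
  }
  return delays;
}

// With a fixed random() of 0.5, each delay is half of the capped exponential base
console.log(backoffScheduleMs(6, 300, 5000, () => 0.5));
// → [ 150, 300, 600, 1200, 2400, 2500 ]
```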

Used In Systems:

  • Kafka (producer acknowledgments)
  • Pulsar (similar ack levels)
  • RabbitMQ (publisher confirms)

Explained In Detail:

  • Kafka Deep Dive - Producer mechanics and acknowledgments

Quick Self-Check

  • Can explain acks=0/1/all in 60 seconds?
  • Understand latency vs durability trade-offs?
  • Know when messages can be lost for each ack level?
  • Can explain min.insync.replicas and ISR?
  • Understand acks=all + min.insync.replicas=2 pattern?
  • Know which ack level to use for different use cases?
