Producer Batching

Summary

How message producers batch records to achieve high throughput by amortizing network overhead and maximizing sequential I/O

TL;DR

Producer batching groups multiple messages together before sending them to the server, amortizing network overhead and maximizing throughput. Instead of sending each message immediately (1 message = 1 network request), batching collects messages for a short time window or until reaching a size threshold, then sends them together in a single request. This technique can improve throughput by 10-100x.

Visual Overview

Batching Overview
WITHOUT BATCHING (Naive Approach):
T=0ms:  [Message A] → Network Request 1
T=5ms:  [Message B] → Network Request 2
T=8ms:  [Message C] → Network Request 3
T=12ms: [Message D] → Network Request 4

Result: 4 network requests, ~50ms total latency
Overhead: 4x network round-trips, 4x TCP overhead

WITH BATCHING (Optimized):
T=0ms:  [Message A] ┐
T=5ms:  [Message B] │
T=8ms:  [Message C] ├ Batch Accumulation
T=12ms: [Message D] ┘
T=20ms: [Batch: A,B,C,D] → Single Network Request

Result: 1 network request, ~30ms total latency
Overhead: 1x network round-trip, 1x TCP overhead; the four messages are compressed together

BATCH TRIGGERS:
  • Size Threshold: batch.size = 32 KB in these examples (Kafka's default is 16 KB)
  • Time Threshold: linger.ms = 20 ms (configurable)
  • Memory Pressure: buffer full, send immediately
  • Explicit Flush: application calls flush()

Core Explanation

What is Producer Batching?

Producer batching is a performance optimization where a message producer accumulates multiple messages in memory before sending them to the server in a single network request.

Batching Architecture
Application Thread:
producer.send(message_1) ─┐
producer.send(message_2) ─┤
producer.send(message_3) ─┼─→ Batch Buffer (per partition)
producer.send(message_4) ─┤
producer.send(message_5) ─┘
                                │
                                ↓
Background Sender Thread:
  Wait for trigger:
  - Size >= batch.size
  - Time >= linger.ms
  - Buffer full
        │
        ↓
  [Send Batch] → Server
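
To make the flow above concrete, here is a minimal sketch of a per-partition accumulator. It is illustrative only and is not Kafka's internal RecordAccumulator: the class name MiniAccumulator, its methods, and the explicit nowMs clock argument are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative only: a tiny per-partition accumulator (hypothetical, not Kafka's class). */
final class MiniAccumulator {
    /** A batch under construction for one partition. */
    static final class Batch {
        final int partition;
        final long createdAtMs;
        final List<byte[]> records = new ArrayList<>();
        int sizeBytes;

        Batch(int partition, long createdAtMs) {
            this.partition = partition;
            this.createdAtMs = createdAtMs;
        }
    }

    private final int batchSizeBytes;
    private final long lingerMs;
    private final Map<Integer, Batch> open = new HashMap<>();

    MiniAccumulator(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    /**
     * Appends a record. If it does not fit in the partition's open batch,
     * that batch is closed and returned (ready to send) and the record
     * starts a new batch. Returns null when nothing became ready.
     */
    Batch append(int partition, byte[] record, long nowMs) {
        Batch current = open.get(partition);
        Batch closed = null;
        if (current != null && current.sizeBytes + record.length > batchSizeBytes) {
            closed = current;
            current = null;
        }
        if (current == null) {
            current = new Batch(partition, nowMs);
            open.put(partition, current);
        }
        current.records.add(record);
        current.sizeBytes += record.length;
        return closed;
    }

    /** Removes and returns every open batch that has waited at least lingerMs. */
    List<Batch> drainExpired(long nowMs) {
        List<Batch> ready = new ArrayList<>();
        Iterator<Batch> it = open.values().iterator();
        while (it.hasNext()) {
            Batch b = it.next();
            if (nowMs - b.createdAtMs >= lingerMs) {
                ready.add(b);
                it.remove();
            }
        }
        return ready;
    }
}
```

In this sketch the application thread would call append(), and a background sender would periodically call drainExpired() and transmit whatever either method hands back. A real client also needs thread safety and a memory bound, which are left out here.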

Key Batching Parameters:

// Batch size threshold (bytes), per partition
batch.size = 16384  // 16 KB (Kafka default); the examples here use 32768 (32 KB)

// Time to wait for batch to fill (milliseconds)
linger.ms = 0       // Send immediately (default before Kafka 4.0; 5 since 4.0)
linger.ms = 20      // Wait up to 20ms for more messages

// Total memory for all batches
buffer.memory = 33554432  // 32 MB (Kafka default)

Why Batching Dramatically Improves Performance

Network Overhead Analysis:

SINGLE MESSAGE SEND:

  TCP/IP Header: 40 bytes                    
  Kafka Protocol Header: 100 bytes           
  Message Overhead: 50 bytes                 
  Actual Message Payload: 200 bytes          
   
  Total: 390 bytes                           
  Efficiency: 200/390 = 51%                  


BATCHED SEND (100 messages):

  TCP/IP Header: 40 bytes (1x)               
  Kafka Protocol Header: 100 bytes (1x)      
  Message Overhead: 50 bytes × 100 = 5000    
  Actual Message Payload: 200 × 100 = 20000  
   
  Total: 25,140 bytes                        
  Efficiency: 20000/25140 = 80%              
  Network Savings: 100x fewer requests (1 instead of 100)


Result: 100x fewer network requests, and per-message overhead drops from 190 bytes to about 51 bytes.
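
The byte counts in the analysis above can be reproduced with a few lines of arithmetic. This is a sketch under the same illustrative assumptions as the text (40-byte TCP/IP header, 100-byte protocol header, 50 bytes of per-message overhead); the class and method names are hypothetical.

```java
/** Illustrative arithmetic only; header sizes are the assumed figures from the text. */
final class WireEfficiency {
    static final int REQUEST_OVERHEAD_BYTES = 40 + 100; // TCP/IP + protocol header, paid once per request
    static final int PER_MESSAGE_OVERHEAD_BYTES = 50;   // paid once per message

    /** Total bytes on the wire for one request carrying `messages` messages. */
    static long totalBytes(int messages, int payloadBytes) {
        return REQUEST_OVERHEAD_BYTES
                + (long) messages * (PER_MESSAGE_OVERHEAD_BYTES + payloadBytes);
    }

    /** Fraction of the bytes on the wire that are actual payload. */
    static double efficiency(int messages, int payloadBytes) {
        return (double) messages * payloadBytes / totalBytes(messages, payloadBytes);
    }

    /** Non-payload bytes attributed to each message. */
    static double overheadPerMessage(int messages, int payloadBytes) {
        long overhead = totalBytes(messages, payloadBytes) - (long) messages * payloadBytes;
        return (double) overhead / messages;
    }
}
```

Calling efficiency(1, 200) and efficiency(100, 200) gives roughly 0.51 and 0.80, matching the two boxes above.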

Throughput Impact:

Scenario: Send 100,000 messages (200 bytes each), one request in flight at a time

NO BATCHING:
 Network RTT: 1ms per request
 Total time: 100,000 × 1ms = 100 seconds
 Throughput: 1,000 messages/sec

WITH BATCHING (100 msg/batch):
 Network RTT: 1ms per batch
 Total time: 1,000 batches × 1ms = 1 second
 Throughput: 100,000 messages/sec

100x improvement!
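
The same scenario can be written as a small model. It assumes, as the scenario does, that requests go out strictly one after another and that each costs exactly one round trip; real producers pipeline requests, so treat the numbers as an upper bound on the gap. The names are hypothetical.

```java
/** Illustrative model: serial requests, one network round trip each. */
final class ThroughputModel {
    /** Messages per second when every request waits for the previous one to finish. */
    static double messagesPerSecond(long totalMessages, int messagesPerBatch, double rttMs) {
        long requests = (totalMessages + messagesPerBatch - 1) / messagesPerBatch; // ceiling division
        double totalSeconds = requests * rttMs / 1000.0;
        return totalMessages / totalSeconds;
    }
}
```

With a 1 ms round trip, messagesPerSecond(100_000, 1, 1.0) models the unbatched case and messagesPerSecond(100_000, 100, 1.0) the batched one.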

Batching Triggers and Tradeoffs

Batch Completion Triggers:

TRIGGER 1: SIZE THRESHOLD REACHED

Current batch: 31 KB
New message: 2 KB
Total: 33 KB > batch.size (32 KB in this example)
Action: Close the current batch and send it; the new message starts the next batch

TRIGGER 2: TIME THRESHOLD REACHED

Batch started: T=0ms
Current time: T=20ms >= linger.ms (20ms)
Action: Send batch (even if not full)

TRIGGER 3: MEMORY PRESSURE

Buffer memory: 64 MB
Used: 62 MB (97% full)
Action: Once send() calls block waiting for buffer space, send pending batches without waiting out linger.ms

TRIGGER 4: EXPLICIT FLUSH

Application calls: producer.flush()
Action: Send all pending batches immediately
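
The four triggers can be summarized as a single decision function. This is a sketch, not the client's actual logic: the enum, the parameter names, and the order in which the conditions are checked are assumptions made for illustration.

```java
/** Illustrative only: which trigger, if any, makes a batch ready to send. */
final class BatchTriggers {
    enum Trigger { NONE, SIZE, LINGER, MEMORY_PRESSURE, FLUSH }

    static Trigger evaluate(int batchBytes, int nextRecordBytes, int batchSizeBytes,
                            long batchAgeMs, long lingerMs,
                            boolean bufferExhausted, boolean flushRequested) {
        if (flushRequested) {
            return Trigger.FLUSH;            // application called flush()
        }
        if (batchBytes + nextRecordBytes > batchSizeBytes) {
            return Trigger.SIZE;             // next record would not fit
        }
        if (batchAgeMs >= lingerMs) {
            return Trigger.LINGER;           // waited long enough
        }
        if (bufferExhausted) {
            return Trigger.MEMORY_PRESSURE;  // free memory is needed now
        }
        return Trigger.NONE;                 // keep accumulating
    }
}
```

For the size example above, evaluate(31 * 1024, 2 * 1024, 32 * 1024, 5, 20, false, false) returns SIZE.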

The Latency-Throughput Tradeoff:

Configuration Spectrum
Low Latency (Real-time Systems):

 linger.ms = 0              → Send immediately
 batch.size = 16384 (16 KB)                      
                                                 
 Latency: ~1-2ms                                 
 Throughput: ~10K msg/sec                        
 Use case: Trading, alerts                       


Balanced (Most Applications):

 linger.ms = 10-20          → Small wait window
 batch.size = 32768 (32 KB)                      
                                                 
 Latency: ~15-25ms                               
 Throughput: ~50K msg/sec                        
 Use case: Event streaming                       


High Throughput (Analytics):

 linger.ms = 50-100         → Longer wait
 batch.size = 131072 (128 KB)                    
                                                 
 Latency: ~60-120ms                              
 Throughput: ~200K msg/sec                       
 Use case: Log aggregation                       
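
One way to reason about where a workload falls on this spectrum is to compare linger.ms with the time a batch takes to fill. The sketch below assumes a steady message rate to a single partition and ignores per-record overhead and compression; the class and method names are hypothetical.

```java
/** Illustrative model of how long the first record in a batch may wait. */
final class LingerModel {
    /** Milliseconds needed to fill one batch at a steady message rate. */
    static double fillTimeMs(int batchSizeBytes, int messageBytes, double messagesPerSecond) {
        double messagesPerBatch = (double) batchSizeBytes / messageBytes;
        return messagesPerBatch / messagesPerSecond * 1000.0;
    }

    /** Worst-case wait before sending: whichever trigger (size or time) fires first. */
    static double maxWaitMs(int batchSizeBytes, int messageBytes,
                            double messagesPerSecond, long lingerMs) {
        return Math.min((double) lingerMs,
                fillTimeMs(batchSizeBytes, messageBytes, messagesPerSecond));
    }
}
```

At low rates the time trigger dominates, so linger.ms is the added latency and batches go out partly empty. At high rates the size trigger fires first, so raising linger.ms changes little and batch.size becomes the knob that matters.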

Production Configuration Examples

Example 1: High-Throughput Log Ingestion

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();

// Optimize for throughput
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072);    // 128 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);         // Wait 50ms
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 268435456); // 256 MB

// Enable compression for better batching
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

// Allow up to 5 in-flight requests per connection (the client default)
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

// Result: 10x throughput improvement
// Tradeoff: ~60ms added latency

Example 2: Low-Latency Real-Time Events

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();

// Optimize for latency
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);     // 16 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 0);          // No wait
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432); // 32 MB

// Minimal compression overhead
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

// Limit in-flight for ordering
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);

// Result: <5ms p99 latency
// Tradeoff: Lower throughput (~20K msg/sec)
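
The two examples can be packaged as reusable profiles. The sketch below uses the plain string config keys (batch.size, linger.ms, and so on) so that it depends only on java.util.Properties. The class name, the method names, and the 10 ms cut-off in forLatencyBudget are assumptions chosen for illustration, not recommendations.

```java
import java.util.Properties;

/** Illustrative profiles mirroring the two examples above (hypothetical helper). */
final class ProducerProfiles {
    static Properties highThroughput() {
        Properties p = new Properties();
        p.setProperty("batch.size", "131072");        // 128 KB
        p.setProperty("linger.ms", "50");
        p.setProperty("buffer.memory", "268435456");  // 256 MB
        p.setProperty("compression.type", "lz4");
        p.setProperty("max.in.flight.requests.per.connection", "5");
        return p;
    }

    static Properties lowLatency() {
        Properties p = new Properties();
        p.setProperty("batch.size", "16384");         // 16 KB
        p.setProperty("linger.ms", "0");
        p.setProperty("buffer.memory", "33554432");   // 32 MB
        p.setProperty("compression.type", "lz4");
        p.setProperty("max.in.flight.requests.per.connection", "1");
        return p;
    }

    /** Assumed rule of thumb: budgets under 10 ms get the low-latency profile. */
    static Properties forLatencyBudget(long maxAddedLatencyMs) {
        return maxAddedLatencyMs < 10 ? lowLatency() : highThroughput();
    }
}
```

A real producer would still need bootstrap.servers and the key and value serializers added before these properties are passed to the client.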

Batch Compression and Efficiency

Compression with Batching:

WHY BATCHING IMPROVES COMPRESSION:

Single Message Compression:
Message 1: {"user_id": 123, "event": "click", "timestamp": 1234567890}
Compressed: 58 bytes → 52 bytes (10% savings)

Batched Messages Compression (100 messages):
Original: 5,800 bytes
Compressed (lz4): 1,200 bytes (80% savings!)

Why better compression?
 Repeated keys: "user_id", "event", "timestamp" appear 100x
 Similar values: Timestamps are sequential
 Pattern recognition: Better with larger data sets
 Compression dictionary: More effective context

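Combined Batching + Compression:
  • Network requests: 100x fewer (batching, 100 messages per request)
  • Payload size: ~5x smaller (compression)
  • These gains apply to different costs (request count vs. bytes on the wire), so they add up but do not multiply into a single "efficiency" figure.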
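
The effect is easy to observe with the JDK alone. The sketch below uses java.util.zip.Deflater as a stand-in codec, because LZ4 is not part of the standard library, so the exact sizes will differ from what an LZ4-compressed batch would show. The event format follows the example message above; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Illustrative only: compares compressing one event with compressing a batch of events. */
final class BatchCompressionDemo {
    /** Returns the DEFLATE-compressed size of the input in bytes. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Builds one JSON event shaped like the example message. */
    static byte[] event(int i) {
        String json = "{\"user_id\": " + (100 + i)
                + ", \"event\": \"click\", \"timestamp\": " + (1234567890L + i) + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Concatenates `count` events into one uncompressed batch payload. */
    static byte[] batch(int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            byte[] e = event(i);
            out.write(e, 0, e.length);
        }
        return out.toByteArray();
    }
}
```

Comparing deflatedSize(batch(100)) with 100 times deflatedSize(event(0)) shows the gap: the repeated keys only pay off when the codec sees many messages at once.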

Production Compression Strategy:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class CompressionStrategy {

    // Pick ONE codec per producer: each put() on the same key overwrites the last.
    // Ratios and speeds are rough, workload-dependent figures.
    //
    // "lz4":    fast compression, low CPU (2:1 ratio, 300 MB/sec)
    //           Best for: high-throughput systems with large batches
    // "snappy": balanced (2.3:1 ratio, 250 MB/sec)
    //           Best for: moderate throughput, balanced CPU usage
    // "gzip":   best compression (3.2:1 ratio, 50 MB/sec, high CPU)
    //           Best for: network-limited systems, low volume
    // "none":   no compression
    //           Best for: already-compressed data (images, video)
    public static Properties withCompression(Properties props, String codec) {
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, codec);
        return props;
    }
}

Memory Management and Buffer Pool

Buffer Pool Architecture:

Producer Memory Layout
Total Buffer: 64 MB (buffer.memory)

 Partition 0 Batch: 32 KB (ready)    ← Full
 Partition 1 Batch: 28 KB (building) ← Accumulating
 Partition 2 Batch: 31 KB (ready)    ← Full
 Partition 3 Batch: 15 KB (building)
 ...
 Free Memory: 10 MB


Memory Exhaustion Behavior:

1. Buffer full (free memory < new batch size)
2. send() blocks for up to max.block.ms (default 60s)
3. As in-flight batches complete, their memory is freed and blocked send() calls proceed
4. If no memory frees up within max.block.ms, the send fails with BufferExhaustedException
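
The blocking behavior described above can be sketched with a counting semaphore, where each permit stands for one byte of buffer memory. This is a simplified illustration, not the client's real buffer pool: the class name is hypothetical, and a boolean return value stands in for the exception.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Illustrative only: a bounded pool where allocation blocks up to a timeout. */
final class BoundedBufferPool {
    private final Semaphore freeBytes;

    BoundedBufferPool(int totalBytes) {
        this.freeBytes = new Semaphore(totalBytes, true); // fair: first blocked caller goes first
    }

    /**
     * Waits up to maxBlockMs for `bytes` of free space.
     * Returns false if the space did not become available in time,
     * which is the point where a real client would fail the send.
     */
    boolean allocate(int bytes, long maxBlockMs) {
        try {
            return freeBytes.tryAcquire(bytes, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Called when a batch has been sent and its memory can be reused. */
    void release(int bytes) {
        freeBytes.release(bytes);
    }

    int availableBytes() {
        return freeBytes.availablePermits();
    }
}
```

The availableBytes() value plays the role of the buffer-available-bytes metric: if it sits near zero, producers are about to block.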

Monitoring (JMX):
MBean: kafka.producer:type=producer-metrics,client-id=<client-id>
Attribute: buffer-available-bytes

Tradeoffs

Advantages:

  • ✓ Massively improved throughput (10-100x)
  • ✓ Reduced network overhead (90%+ fewer requests)
  • ✓ Better compression efficiency with larger batches
  • ✓ Lower CPU usage per message (amortized overhead)
  • ✓ Reduced server-side processing load

Disadvantages:

  • ✕ Increased latency (messages wait in batch)
  • ✕ Higher memory usage (buffering messages)
  • ✕ Complexity in tuning (batch.size vs linger.ms)
  • ✕ Risk of data loss if producer crashes before send
  • ✕ Larger failure blast radius (entire batch fails together)

Real Systems Using This

Apache Kafka

  • Implementation: Per-partition batching with configurable size and time thresholds
  • Scale: 7+ trillion messages/day at LinkedIn with aggressive batching
  • Default Config: 16 KB batch.size, 0ms linger.ms before Kafka 4.0 (conservative)
  • Production Config: 64-128 KB batch.size, 20-50ms linger.ms (optimized)

AWS Kinesis

  • Implementation: Batched writes via the PutRecords API (up to 500 records per request)
  • Limits: 1 MB/sec (or 1,000 records/sec) write throughput per shard, 5 MB per PutRecords request
  • SDK Behavior: KPL (Kinesis Producer Library) batches automatically

Google Cloud Pub/Sub

  • Implementation: Client library batches messages automatically
  • Config: Max batch size (1000 messages), max batch bytes (10 MB)
  • Optimization: Batching + request compression for efficiency

RabbitMQ

  • Implementation: Optional publisher confirms batching
  • Config: Manual batching via application-level buffering
  • Performance: 10x improvement with batching enabled

When to Use Producer Batching

✓ Perfect Use Cases

High-Volume Event Streaming

Scenario: Ingesting millions of events per second
Why batching: Maximizes network and disk efficiency
Example: Clickstream analytics, IoT sensor data
Config: Large batches (128 KB), medium linger (20-50ms)

Log Aggregation

Scenario: Centralized logging from 1000s of services
Why batching: Reduces load on logging infrastructure
Example: ELK stack ingestion, Splunk forwarding
Config: Large batches (128 KB), high linger (50-100ms)

Bulk Data Migration

Scenario: Moving large datasets between systems
Why batching: Maximum throughput, latency not critical
Example: Database CDC, ETL pipelines
Config: Maximum batches (256 KB), high linger (100ms)

✕ When NOT to Use (or Use Minimal Batching)

Real-Time Alerting

Problem: Critical alerts delayed by batching
Solution: linger.ms=0, small batches (16 KB)
Example: Security alerts, system monitoring

Trading Systems

Problem: Milliseconds matter, batching adds latency
Solution: No batching (linger.ms=0) or very small windows
Example: High-frequency trading, order execution

Request-Response Patterns

Problem: User waiting for immediate response
Solution: Minimal batching, sync sends
Example: API calls, user-facing operations

Interview Application

Common Interview Question 1

Q: “How would you optimize a producer that’s sending 100,000 small messages per second, causing high CPU and network usage?”

Strong Answer:

“The issue is likely excessive network overhead from sending each message individually. I’d implement producer batching:

Diagnosis:

  • Current: 100K messages/sec × 1 request each = 100K network requests/sec (~1 KB per message)
  • Network overhead: ~50% of bandwidth wasted on headers
  • CPU overhead: 100K serialize/send operations

Solution:

// Enable aggressive batching
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);    // 64 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);         // 20ms window
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

Result:

  • Batching: 100K messages → ~2K batches (50x reduction)
  • Compression: 64 KB → ~15 KB per batch (4x savings)
  • Network: 98% reduction in requests
  • CPU: 95% reduction in overhead
  • Added latency: ~20ms (acceptable for most use cases)

Tradeoff: 20ms added latency vs 50x fewer network requests. For log/event streaming, this is usually a good trade.”

Why this is good:

  • Quantifies the problem
  • Provides specific configuration
  • Explains each parameter choice
  • Analyzes tradeoffs explicitly
  • Gives measurable results

Common Interview Question 2

Q: “Your Kafka producer is dropping messages under high load. How would you debug and fix this?”

Strong Answer:

“Message drops under load suggest buffer memory exhaustion. Here’s my approach:

Diagnosis Steps:

  1. Check JMX metric: buffer-available-bytes → Likely near 0
  2. Check logs for BufferExhaustedException
  3. Check max.block.ms timeout (default 60s)

Root Cause Analysis:

  • Batches accumulating faster than sender thread can send
  • Possible causes:
    • Network slowness (broker response time)
    • Too small buffer.memory for traffic volume
    • Inefficient batching (small batches = more sends)

Solutions (in order):

1. Increase buffer memory:

props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 268435456); // 256 MB

2. Optimize batching for throughput:

props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072);    // 128 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);         // Wait for fuller batches
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

3. Application-level backpressure:

producer.send(record, (metadata, exception) -> {
    if (exception instanceof BufferExhaustedException) {
        // Implement retry with exponential backoff
        // Or shed load (return 503 to clients)
    }
});

Result: Larger buffer + more efficient batching = 10x capacity improvement”
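
The "retry with exponential backoff" step in the callback can be sketched as a small delay calculator. The class name, method name, and the base and cap values in the usage note are assumptions for illustration; production code would usually add random jitter as well.

```java
/** Illustrative only: capped exponential backoff delays for retrying a failed send. */
final class Backoff {
    /**
     * Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ...
     * never exceeding capMs.
     */
    static long delayMs(int attempt, long baseMs, long capMs) {
        long delay = baseMs;
        // Stop doubling once the cap is reached so the value cannot overflow.
        for (int i = 0; i < attempt && delay < capMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMs);
    }
}
```

With a 100 ms base and a 5,000 ms cap, successive retries wait 100, 200, 400, 800 ms and so on until they level off at the cap. Once the cap is reached repeatedly, shedding load is usually the better response.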

Why this is good:

  • Systematic debugging approach
  • Multiple solution layers
  • Specific metrics to check
  • Code examples
  • Explains root cause clearly

Red Flags to Avoid

  • ✕ Not understanding latency tradeoff of batching
  • ✕ Setting linger.ms without understanding batch.size
  • ✕ Not considering memory implications
  • ✕ Ignoring compression benefits with batching
  • ✕ Not knowing how to measure batching efficiency

Quick Self-Check

Before moving on, can you:

  • Explain producer batching in 60 seconds?
  • Draw the batching flow from send() to network?
  • List all 4 batch trigger conditions?
  • Explain the latency-throughput tradeoff?
  • Calculate network savings from batching?
  • Configure producer for high-throughput vs low-latency?

Prerequisites

None - this is a foundational performance concept

Used In Systems

  • Distributed Message Queues - Core performance technique
  • Event-Driven Architectures - Essential for high throughput

Next Recommended: Producer Acknowledgments - Understand reliability guarantees

Production signal

Why this concept matters

Interview: 70% of performance interviews
Production: LinkedIn (7+ trillion msgs/day)
Performance: 10-100x throughput
Scale: 90%+ fewer network requests