Producer Batching

TL;DR

Producer batching groups multiple messages together before sending them to the server, amortizing network overhead and maximizing throughput. Instead of sending each message immediately (1 message = 1 network request), batching collects messages for a short time window or until reaching a size threshold, then sends them together in a single request. This technique can improve throughput by 10-100x.

Visual Overview

Batching Overview

WITHOUT BATCHING (Naive Approach):
T=0ms:  [Message A] ──▶ Network Request 1
T=5ms:  [Message B] ──▶ Network Request 2
T=8ms:  [Message C] ──▶ Network Request 3
T=12ms: [Message D] ──▶ Network Request 4

Result: 4 network requests, ~50ms total latency (illustrative)
Overhead: 4x network round-trips, 4x TCP overhead

WITH BATCHING (Optimized):
T=0ms: [Message A] ──┐
T=5ms: [Message B] ──┤
T=8ms: [Message C] ──┼─── Batch Accumulation
T=12ms: [Message D] ──┘
T=20ms: [Batch: A,B,C,D] ──▶ Single Network Request

Result: 1 network request, ~30ms total latency (illustrative)
Overhead: 1x network round-trip, request headers shared by all 4 messages

BATCH TRIGGERS:
├── Size Threshold: batch.size (Kafka default 16 KB; 32 KB in this example)
├── Time Threshold: linger.ms = 20 ms (configurable)
├── Memory Pressure: Buffer full, send immediately
└── Explicit Flush: Application calls flush()

Core Explanation

What is Producer Batching?

Producer batching is a performance optimization where a message producer accumulates multiple messages in memory before sending them to the server in a single network request.

Batching Architecture

Application Thread:
producer.send(message_1) ──┐
producer.send(message_2) ──┤
producer.send(message_3) ──┼──▶ Batch Buffer (per partition)
producer.send(message_4) ──┤                    │
producer.send(message_5) ──┘                    │
                                              ▼
Background Sender Thread:
┌───────────────────────────────┐
│ Wait for trigger:             │
│ - Size >= batch.size          │
│ - Time >= linger.ms           │
│ - Buffer full                 │
└───────────────────────────────┘
              ▼
       [Send Batch] ──▶ Server

Key Batching Parameters:

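// Batch size threshold per partition (bytes)
batch.size = 16384  // 16 KB (Kafka default)
batch.size = 32768  // 32 KB (value used in this article's examples)

// Time to wait for batch to fill (milliseconds)
linger.ms = 0       // Send as soon as possible (default before Kafka 4.0, which raised it to 5)
linger.ms = 20      // Wait up to 20ms for more messages

// Total memory for all batches
buffer.memory = 33554432  // 32 MB (Kafka default)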

Why Batching Dramatically Improves Performance

Network Overhead Analysis:

SINGLE MESSAGE SEND:
┌─────────────────────────────────────────────┐
│  TCP/IP Header: 40 bytes                    │
│  Kafka Protocol Header: 100 bytes           │
│  Message Overhead: 50 bytes                 │
│  Actual Message Payload: 200 bytes          │
│  ────────────────────────────────────────── │
│  Total: 390 bytes                           │
│  Efficiency: 200/390 = 51%                  │
└─────────────────────────────────────────────┘

BATCHED SEND (100 messages):
┌─────────────────────────────────────────────┐
│  TCP/IP Header: 40 bytes (1x)               │
│  Kafka Protocol Header: 100 bytes (1x)      │
│  Message Overhead: 50 bytes × 100 = 5000    │
│  Actual Message Payload: 200 × 100 = 20000  │
│  ────────────────────────────────────────── │
│  Total: 25,140 bytes                        │
│  Efficiency: 20000/25140 = 80%              │
│  Network Savings: 100x fewer requests       │
└─────────────────────────────────────────────┘

Result: 100x fewer requests, and per-message overhead drops from 190 bytes to about 51 bytes (roughly 3.7x less).
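The arithmetic in the two boxes above can be reproduced with a few lines of Java. This is a minimal sketch that assumes the illustrative header sizes from the boxes (they are not measured Kafka values), and the class name `BatchOverhead` is hypothetical.

```java
/** Byte accounting for one request, using the illustrative sizes from the boxes above. */
final class BatchOverhead {
    static final int TCP_IP_HEADER = 40;        // bytes, counted once per request (simplified)
    static final int PROTOCOL_HEADER = 100;     // bytes, counted once per request
    static final int PER_MESSAGE_OVERHEAD = 50; // bytes, paid by every message
    static final int PAYLOAD = 200;             // bytes of useful data per message

    /** Total bytes on the wire for one request carrying n messages. */
    static int wireBytes(int n) {
        return TCP_IP_HEADER + PROTOCOL_HEADER + n * (PER_MESSAGE_OVERHEAD + PAYLOAD);
    }

    /** Fraction of wire bytes that is payload. */
    static double efficiency(int n) {
        return (double) (n * PAYLOAD) / wireBytes(n);
    }

    /** Non-payload bytes paid per message. */
    static double overheadPerMessage(int n) {
        return (double) (wireBytes(n) - n * PAYLOAD) / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100}) {
            System.out.printf("n=%d total=%d efficiency=%.1f%% overhead/msg=%.1f bytes%n",
                    n, wireBytes(n), efficiency(n) * 100, overheadPerMessage(n));
        }
    }
}
```

Efficiency approaches 200/250 = 80% as the batch grows, because the 50-byte per-message overhead is never amortized; only the per-request headers are.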

Throughput Impact:

Scenario: Send 100,000 messages (200 bytes each)

NO BATCHING:
├── Network RTT: 1ms per request
├── Total time: 100,000 × 1ms = 100 seconds
└── Throughput: 1,000 messages/sec

WITH BATCHING (100 msg/batch):
├── Network RTT: 1ms per batch
├── Total time: 1,000 batches × 1ms = 1 second
└── Throughput: 100,000 messages/sec

100x improvement!
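The scenario above uses a deliberately simple model: every request costs one round-trip, and only one request is in flight at a time. The sketch below encodes that model so the numbers can be varied. The class and method names are hypothetical, and real producers pipeline several requests, so treat the output as an upper bound on the benefit rather than a prediction.

```java
/** Simplified throughput model: one request in flight at a time, one RTT per request. */
final class ThroughputModel {

    /** Seconds needed to send all messages when each request carries messagesPerBatch messages. */
    static double totalSeconds(long messages, int messagesPerBatch, double rttMillis) {
        long requests = (messages + messagesPerBatch - 1) / messagesPerBatch; // ceiling division
        return requests * rttMillis / 1000.0;
    }

    /** Messages per second under the same model. */
    static double messagesPerSecond(long messages, int messagesPerBatch, double rttMillis) {
        return messages / totalSeconds(messages, messagesPerBatch, rttMillis);
    }

    public static void main(String[] args) {
        long messages = 100_000;
        double rttMillis = 1.0;
        for (int batch : new int[] {1, 10, 100}) {
            System.out.printf("batch=%d time=%.1fs throughput=%.0f msg/sec%n",
                    batch,
                    totalSeconds(messages, batch, rttMillis),
                    messagesPerSecond(messages, batch, rttMillis));
        }
    }
}
```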

Batching Triggers and Tradeoffs

Batch Completion Triggers:

TRIGGER 1: SIZE THRESHOLD REACHED
─────────────────────────────────────
Current batch: 31 KB
New message: 2 KB
Total: 33 KB > batch.size (32 KB)
Action: Close the current batch and send it; the new message starts the next batch

TRIGGER 2: TIME THRESHOLD REACHED
─────────────────────────────────────
Batch started: T=0ms
Current time: T=20ms >= linger.ms (20ms)
Action: Send batch (even if not full)

TRIGGER 3: MEMORY PRESSURE
─────────────────────────────────────
Buffer memory: 64 MB
Used: 62 MB (97% full)
Action: Send pending batches without waiting for linger.ms, to free memory

TRIGGER 4: EXPLICIT FLUSH
─────────────────────────────────────
Application calls: producer.flush()
Action: Send all pending batches immediately
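The size, time, and flush triggers can be sketched as a small single-partition accumulator. This is a hypothetical illustration, not Kafka's actual accumulator: the class `SimpleBatcher` and its methods are invented for this example, the clock is passed in explicitly to keep it deterministic, and the memory-pressure trigger is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical single-partition batcher showing the size, time, and flush triggers. */
final class SimpleBatcher {
    private final int batchSizeBytes;
    private final long lingerMs;
    private List<byte[]> current = new ArrayList<>();
    private int currentBytes = 0;
    private long batchStartMs = -1;

    SimpleBatcher(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    /** Size trigger: if the message does not fit, the pending batch is returned and a new one starts. */
    List<byte[]> append(byte[] message, long nowMs) {
        List<byte[]> ready = Collections.emptyList();
        if (!current.isEmpty() && currentBytes + message.length > batchSizeBytes) {
            ready = drain();
        }
        if (current.isEmpty()) {
            batchStartMs = nowMs; // linger window starts with the first message of a batch
        }
        current.add(message);
        currentBytes += message.length;
        return ready;
    }

    /** Time trigger: returns the pending batch once it has waited lingerMs, else an empty list. */
    List<byte[]> poll(long nowMs) {
        if (!current.isEmpty() && nowMs - batchStartMs >= lingerMs) {
            return drain();
        }
        return Collections.emptyList();
    }

    /** Explicit flush: returns whatever is pending, even if the batch is small and young. */
    List<byte[]> flush() {
        return drain();
    }

    private List<byte[]> drain() {
        List<byte[]> out = current;
        current = new ArrayList<>();
        currentBytes = 0;
        batchStartMs = -1;
        return out;
    }

    public static void main(String[] args) {
        SimpleBatcher batcher = new SimpleBatcher(32 * 1024, 20);
        batcher.append(new byte[31 * 1024], 0);
        System.out.println("size trigger sent: " + batcher.append(new byte[2 * 1024], 5).size());
        System.out.println("time trigger sent: " + batcher.poll(25).size());
    }
}
```

A real producer would call something like `poll` from a background sender thread and hand the returned batch to the network layer; here the caller drives everything so the behavior is easy to follow.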

The Latency-Throughput Tradeoff:

Configuration Spectrum

Low Latency (Real-time Systems):
┌─────────────────────────────────────────────────┐
│ linger.ms = 0              ← Send immediately   │
│ batch.size = 16384 (16 KB)                      │
│                                                 │
│ Latency: ~1-2ms                                 │
│ Throughput: ~10K msg/sec                        │
│ Use case: Trading, alerts                       │
└─────────────────────────────────────────────────┘

Balanced (Most Applications):
┌─────────────────────────────────────────────────┐
│ linger.ms = 10-20          ← Small wait window  │
│ batch.size = 32768 (32 KB)                      │
│                                                 │
│ Latency: ~15-25ms                               │
│ Throughput: ~50K msg/sec                        │
│ Use case: Event streaming                       │
└─────────────────────────────────────────────────┘

High Throughput (Analytics):
┌─────────────────────────────────────────────────┐
│ linger.ms = 50-100         ← Longer wait        │
│ batch.size = 131072 (128 KB)                    │
│                                                 │
│ Latency: ~60-120ms                              │
│ Throughput: ~200K msg/sec                       │
│ Use case: Log aggregation                       │
└─────────────────────────────────────────────────┘
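Which trigger fires first depends on the message rate, so the same settings can behave very differently under light and heavy load. The sketch below estimates how long the first message in a batch waits, assuming a steady per-partition rate and ignoring per-record overhead and compression. The class and method names are hypothetical.

```java
/** Rough estimate of batch wait time for one partition at a steady message rate. */
final class BatchFillEstimate {

    /** Milliseconds for a batch to reach batchSizeBytes at the given rate. */
    static double fillTimeMs(int batchSizeBytes, double messagesPerSec, int messageBytes) {
        double bytesPerMs = messagesPerSec * messageBytes / 1000.0;
        return batchSizeBytes / bytesPerMs;
    }

    /** Wait for the first message in a batch: whichever of linger or fill time comes first. */
    static double maxWaitMs(int batchSizeBytes, long lingerMs, double messagesPerSec, int messageBytes) {
        return Math.min(lingerMs, fillTimeMs(batchSizeBytes, messagesPerSec, messageBytes));
    }

    public static void main(String[] args) {
        int batchSize = 32768;
        long lingerMs = 20;
        int messageBytes = 200;
        for (double rate : new double[] {1_000, 10_000, 50_000}) {
            System.out.printf("rate=%.0f msg/sec fill=%.1fms wait=%.1fms%n",
                    rate,
                    fillTimeMs(batchSize, rate, messageBytes),
                    maxWaitMs(batchSize, lingerMs, rate, messageBytes));
        }
    }
}
```

At low rates the linger timer dominates and batches go out partly empty; at high rates batches fill before the timer expires, so raising linger.ms adds little latency.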

Production Configuration Examples

Example 1: High-Throughput Log Ingestion

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();

// Optimize for throughput
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072);    // 128 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);         // Wait up to 50ms
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 268435456); // 256 MB

// Enable compression (larger batches compress better)
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

// Keep up to 5 requests in flight per connection (5 is also the default)
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);

// Result (illustrative): ~10x throughput versus unbatched sends
// Tradeoff: ~60ms added latency

Example 2: Low-Latency Real-Time Events

Properties props = new Properties();

// Optimize for latency
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);     // 16 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 0);          // No wait
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432); // 32 MB

// Minimal compression overhead
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

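// Limit in-flight requests for strict ordering on retries
// (with idempotence enabled, ordering is also preserved with up to 5 in flight)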
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);

// Result: <5ms p99 latency
// Tradeoff: Lower throughput (~20K msg/sec)

Batch Compression and Efficiency

Compression with Batching:

WHY BATCHING IMPROVES COMPRESSION:

Single Message Compression:
Message 1: {"user_id": 123, "event": "click", "timestamp": 1234567890}
Compressed: 58 bytes → 52 bytes (10% savings)

Batched Messages Compression (100 messages):
Original: 5,800 bytes
Compressed (lz4): 1,200 bytes (80% savings!)

Why better compression?
├── Repeated keys: "user_id", "event", "timestamp" appear 100x
├── Similar values: Timestamps are sequential
├── Pattern recognition: Better with larger data sets
└── Compression dictionary: More effective context

Combined Batching + Compression:
├── Request count: 100x fewer requests (batching, 100 msg/batch)
├── Payload size: ~5x smaller (compression)
└── These are separate savings (round-trips vs. bytes); they do not multiply into one figure
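The effect of batching on compression ratio can be observed with the JDK alone. The sketch below uses DEFLATE from `java.util.zip` as a stand-in, since lz4 is not in the standard library, so the absolute ratios will differ from the lz4 figures above. The synthetic events and the class name `BatchCompressionDemo` are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Compares compressing one small JSON event against compressing 100 of them together. */
final class BatchCompressionDemo {

    /** Size in bytes of the DEFLATE-compressed input. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Synthetic event shaped like the example message above. */
    static byte[] event(int i) {
        String json = "{\"user_id\": " + (100 + i)
                + ", \"event\": \"click\", \"timestamp\": " + (1234567890L + i) + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /** Concatenates n events, as a batch would before compression. */
    static byte[] batch(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < n; i++) {
            byte[] e = event(i);
            out.write(e, 0, e.length);
        }
        return out.toByteArray();
    }

    /** Uncompressed size divided by compressed size. */
    static double ratio(byte[] input) {
        return (double) input.length / compressedSize(input);
    }

    public static void main(String[] args) {
        System.out.printf("single event ratio: %.2f%n", ratio(event(0)));
        System.out.printf("100-event batch ratio: %.2f%n", ratio(batch(100)));
    }
}
```

A single short message gives the compressor almost nothing to reference, while a batch repeats the same keys and near-identical values, which is the effect described in the list above.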

Production Compression Strategy:

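import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class CompressionStrategy {

    // Hypothetical workload categories, introduced here for illustration.
    // The ratios and speeds in the comments are rough, workload-dependent figures.
    public enum Workload { HIGH_THROUGHPUT, BALANCED, NETWORK_LIMITED, PRECOMPRESSED }

    // Sets compression.type on props according to the workload.
    public static void apply(Properties props, Workload workload) {
        String codec;
        switch (workload) {
            // LZ4: Fast compression, low CPU
            // Best for: High-throughput systems with large batches
            // Compression: 2:1 ratio, 300 MB/sec
            case HIGH_THROUGHPUT: codec = "lz4"; break;

            // Snappy: Balanced
            // Best for: Moderate throughput, balanced CPU usage
            // Compression: 2.3:1 ratio, 250 MB/sec
            case BALANCED: codec = "snappy"; break;

            // GZIP: Best compression
            // Best for: Network-limited systems, low volume
            // Compression: 3.2:1 ratio, 50 MB/sec (high CPU)
            case NETWORK_LIMITED: codec = "gzip"; break;

            // None: No compression
            // Best for: Already-compressed data (images, video)
            default: codec = "none"; break;
        }
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, codec);
    }
}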

Memory Management and Buffer Pool

Buffer Pool Architecture:

Producer Memory Layout

Total Buffer: 64 MB (buffer.memory)
┌─────────────────────────────────────────────────┐
│ Partition 0 Batch: 32 KB (ready)    ← Full      │
│ Partition 1 Batch: 28 KB (building) ← Accum     │
│ Partition 2 Batch: 31 KB (ready)    ← Full      │
│ Partition 3 Batch: 15 KB (building)             │
│ ...                                             │
│ Free Memory: 10 MB                              │
└─────────────────────────────────────────────────┘

Memory Exhaustion Behavior:

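1. Buffer full (free < new batch size)
2. send() blocks for up to max.block.ms (default 60s)
3. As in-flight batches complete, memory is freed and blocked send() calls proceed
4. If no memory frees up within max.block.ms, send() fails with BufferExhaustedException (a subclass of TimeoutException in recent clients)

Monitoring:
kafka.producer:type=producer-metrics,client-id=<client-id> (attribute: buffer-available-bytes)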
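The block-then-fail behavior can be sketched with a counting semaphore that treats each permit as one byte of buffer. This is a hypothetical illustration, not Kafka's buffer pool: the class `ByteBudget` and its methods are invented here, and a timeout is reported as `false` where a real producer would raise an exception.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical bounded byte budget: reservations block until memory is free or a timeout expires. */
final class ByteBudget {
    private final Semaphore available;

    ByteBudget(int totalBytes) {
        this.available = new Semaphore(totalBytes, true); // fair: waiters are served in arrival order
    }

    /** Reserves bytes, waiting up to maxBlockMs. Returns false on timeout or interruption. */
    boolean tryReserve(int bytes, long maxBlockMs) {
        try {
            return available.tryAcquire(bytes, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns bytes to the budget once a batch has been sent. */
    void release(int bytes) {
        available.release(bytes);
    }

    /** Bytes currently free, comparable in spirit to buffer-available-bytes. */
    int availableBytes() {
        return available.availablePermits();
    }

    public static void main(String[] args) {
        ByteBudget budget = new ByteBudget(100);
        System.out.println("first reserve: " + budget.tryReserve(60, 10));
        System.out.println("second reserve (over budget): " + budget.tryReserve(60, 10));
        budget.release(60);
        System.out.println("after release: " + budget.tryReserve(60, 10));
    }
}
```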

Tradeoffs

Advantages:

✓ Massively improved throughput (10-100x)
✓ Reduced network overhead (90%+ fewer requests)
✓ Better compression efficiency with larger batches
✓ Lower CPU usage per message (amortized overhead)
✓ Reduced server-side processing load

Disadvantages:

✕ Increased latency (messages wait in batch)
✕ Higher memory usage (buffering messages)
✕ Complexity in tuning (batch.size vs linger.ms)
✕ Risk of data loss if producer crashes before send
✕ Larger failure blast radius (entire batch fails together)

Real Systems Using This

Apache Kafka

Implementation: Per-partition batching with configurable size and time thresholds
Scale: 7+ trillion messages/day at LinkedIn with aggressive batching
Default Config: 16 KB batch.size, 0ms linger.ms before Kafka 4.0 (conservative)
Production Config: 64-128 KB batch.size, 20-50ms linger.ms (optimized)

AWS Kinesis

Implementation: Explicit batching via PutRecords API (up to 500 records per request)
Limits: 1 MB/sec write per shard, 5 MB per PutRecords request
SDK Behavior: KPL (Kinesis Producer Library) batches automatically

Google Cloud Pub/Sub

Implementation: Client library batches messages automatically
Config: Max batch size (1000 messages), max batch bytes (10 MB)
Optimization: Batching + request compression for efficiency

RabbitMQ

Implementation: Optional publisher confirms batching
Config: Manual batching via application-level buffering
Performance: 10x improvement with batching enabled

When to Use Producer Batching

✓ Perfect Use Cases

Use Case	Why Batching	Example	Config
High-Volume Event Streaming (millions of events/sec)	Maximizes network and disk efficiency	Clickstream analytics, IoT sensor data	Large batches (128 KB), medium linger (20-50ms)
Log Aggregation (1000s of services)	Reduces load on logging infrastructure	ELK stack ingestion, Splunk forwarding	Large batches (128 KB), high linger (50-100ms)
Bulk Data Migration (large datasets)	Maximum throughput, latency not critical	Database CDC, ETL pipelines	Maximum batches (256 KB), high linger (100ms)

✕ When NOT to Use (or Use Minimal Batching)

Scenario	Problem	Solution	Example
Real-Time Alerting	Critical alerts delayed by batching	linger.ms=0, small batches (16 KB)	Security alerts, system monitoring
Trading Systems	Milliseconds matter, batching adds latency	No batching (linger.ms=0) or very small windows	High-frequency trading, order execution
Request-Response Patterns	User waiting for immediate response	Minimal batching, sync sends	API calls, user-facing operations

Interview Application

Common Interview Question 1

Q: “How would you optimize a producer that’s sending 100,000 small messages per second, causing high CPU and network usage?”

Strong Answer:

“The issue is likely excessive network overhead from sending each message individually. I’d implement producer batching:

Diagnosis:

Current: 100K messages/sec × 1 KB each ≈ 100 MB/sec, sent as 100K network requests/sec

Network overhead: roughly 190 bytes of headers per 1 KB message (~16% of bandwidth), plus one round-trip per message

CPU overhead: 100K serialize/send operations

Solution:

// Enable aggressive batching
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);    // 64 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);         // 20ms window
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

Result:

Batching: 100K messages → ~2K batches (50x reduction)

Compression: 64 KB → ~15 KB per batch (4x savings)

Network: 98% reduction in requests

CPU: 95% reduction in overhead

Added latency: ~20ms (acceptable for most use cases)

Tradeoff: up to 20ms added latency in exchange for ~50x fewer requests. For log/event streaming, this is usually the right trade.”

Why this is good:

Quantifies the problem
Provides specific configuration
Explains each parameter choice
Analyzes tradeoffs explicitly
Gives measurable results

Common Interview Question 2

Q: “Your Kafka producer is dropping messages under high load. How would you debug and fix this?”