TL;DR
Producer batching groups multiple messages together before sending them to the server, amortizing network overhead and maximizing throughput. Instead of sending each message immediately (1 message = 1 network request), the producer collects messages for a short time window or until a size threshold is reached, then sends them together in a single request. For small messages this can improve throughput by 10-100x, at the cost of a few milliseconds of added latency.
Visual Overview
WITHOUT BATCHING (Naive Approach):
  T=0ms:  [Message A] ──▶ Network Request 1
  T=5ms:  [Message B] ──▶ Network Request 2
  T=8ms:  [Message C] ──▶ Network Request 3
  T=12ms: [Message D] ──▶ Network Request 4
  Result: 4 network requests, ~50ms total latency
  Overhead: 4x network round-trips, 4x TCP overhead

WITH BATCHING (Optimized):
  T=0ms:  [Message A] ──┐
  T=5ms:  [Message B] ──┤
  T=8ms:  [Message C] ──┼─── Batch Accumulation
  T=12ms: [Message D] ──┘
  T=20ms: [Batch: A,B,C,D] ──▶ Single Network Request
  Result: 1 network request, ~30ms total latency
  Overhead: 1x network round-trip, one compression pass over all 4 messages

BATCH TRIGGERS:
  ├── Size Threshold: batch.size = 32 KB (example value; the Kafka default is 16 KB)
  ├── Time Threshold: linger.ms = 20 ms (configurable)
  ├── Memory Pressure: Buffer full, send immediately
  └── Explicit Flush: Application calls flush()
Core Explanation
What is Producer Batching?
Producer batching is a performance optimization where a message producer accumulates multiple messages in memory before sending them to the server in a single network request.
Application Thread:
  producer.send(message_1) ──┐
  producer.send(message_2) ──┤
  producer.send(message_3) ──┼──▶ Batch Buffer (per partition)
  producer.send(message_4) ──┤            │
  producer.send(message_5) ──┘            │
                                          ▼
Background Sender Thread:
  ┌───────────────────────────────┐
  │ Wait for trigger:             │
  │  - Size >= 32 KB              │
  │  - Time >= linger.ms          │
  │  - Buffer full                │
  └───────────────┬───────────────┘
                  ▼
        [Send Batch] ──▶ Server
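The flow above can be sketched as a toy, single-partition batcher. This is an illustration only, not Kafka's actual accumulator: the class `ToyBatcher`, its methods, and the explicit `nowMs` clock argument are all hypothetical names chosen so the behaviour is deterministic and easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy single-partition batcher (hypothetical; not Kafka's implementation). */
final class ToyBatcher {
    private final int batchSizeBytes;   // plays the role of batch.size
    private final long lingerMs;        // plays the role of linger.ms
    private final List<byte[]> current = new ArrayList<>();
    private final List<List<byte[]>> sent = new ArrayList<>(); // stands in for the network
    private int currentBytes = 0;
    private long batchStartMs = 0;

    ToyBatcher(int batchSizeBytes, long lingerMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
    }

    /** Appends a message at time nowMs; closes the open batch first if the message would not fit. */
    void send(byte[] message, long nowMs) {
        if (!current.isEmpty() && currentBytes + message.length > batchSizeBytes) {
            flush();                    // size trigger
        }
        if (current.isEmpty()) {
            batchStartMs = nowMs;       // the linger clock starts with the first message
        }
        current.add(message);
        currentBytes += message.length;
    }

    /** Called periodically by a sender loop; sends the batch once it has lingered long enough. */
    void poll(long nowMs) {
        if (!current.isEmpty() && nowMs - batchStartMs >= lingerMs) {
            flush();                    // time trigger
        }
    }

    /** Explicit flush: sends whatever is pending. */
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        sent.add(new ArrayList<>(current));
        current.clear();
        currentBytes = 0;
    }

    int batchesSent() { return sent.size(); }

    int messagesInBatch(int index) { return sent.get(index).size(); }
}
```

With a 20 ms linger, four `send` calls at T=0, 5, 8 and 12 ms followed by `poll(20)` produce one batch of four messages. The memory-pressure trigger is left out to keep the sketch short.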
Key Batching Parameters:
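// Batch size threshold (bytes)
batch.size = 32768 // 32 KB (example value; the Kafka default is 16384, i.e. 16 KB)
// Time to wait for batch to fill (milliseconds)
linger.ms = 0 // Send as soon as the sender thread is free (default in Kafka clients before 4.0)
linger.ms = 20 // Wait up to 20ms for more messages
// Total memory for all batches
buffer.memory = 67108864 // 64 MB (example value; the Kafka default is 33554432, i.e. 32 MB)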
Why Batching Dramatically Improves Performance
Network Overhead Analysis:
SINGLE MESSAGE SEND:
  TCP/IP Header:            40 bytes
  Kafka Protocol Header:   100 bytes
  Message Overhead:         50 bytes
  Actual Message Payload:  200 bytes
  ───────────────────────────────────
  Total:                   390 bytes
  Efficiency: 200/390 = 51%

BATCHED SEND (100 messages):
  TCP/IP Header:            40 bytes (1x)
  Kafka Protocol Header:   100 bytes (1x)
  Message Overhead:         50 × 100 =  5,000 bytes
  Actual Message Payload:  200 × 100 = 20,000 bytes
  ───────────────────────────────────
  Total:                   25,140 bytes
  Efficiency: 20000/25140 = 80%
  Network Savings: 100x fewer requests

Result: 100 requests become 1, and bytes on the wire drop from 39,000 (100 × 390) to 25,140, about 36% fewer. The header sizes are illustrative round numbers.
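The byte counts in the two boxes can be reproduced with a few lines of arithmetic. The sketch below uses the same illustrative header sizes as the analysis above; the class and method names are made up for this example.

```java
/** Reproduces the overhead arithmetic above; header sizes are the article's illustrative values. */
final class WireEfficiency {
    static final int TCP_IP_HEADER = 40;        // bytes, once per request
    static final int PROTOCOL_HEADER = 100;     // bytes, once per request
    static final int PER_MESSAGE_OVERHEAD = 50; // bytes, once per message

    /** Total bytes on the wire for one request carrying `messages` messages. */
    static long requestBytes(int messages, int payloadBytes) {
        return TCP_IP_HEADER + PROTOCOL_HEADER
                + (long) messages * (PER_MESSAGE_OVERHEAD + payloadBytes);
    }

    /** Fraction of the request that is actual payload. */
    static double efficiency(int messages, int payloadBytes) {
        return (double) messages * payloadBytes / requestBytes(messages, payloadBytes);
    }

    public static void main(String[] args) {
        System.out.printf("single:  %d bytes, %.1f%% payload%n",
                requestBytes(1, 200), 100 * efficiency(1, 200));
        System.out.printf("batched: %d bytes, %.1f%% payload%n",
                requestBytes(100, 200), 100 * efficiency(100, 200));
        System.out.printf("100 single sends: %d bytes%n", 100 * requestBytes(1, 200));
    }
}
```

Changing the payload size shows why batching matters most for small messages: the fixed per-request headers dominate when the payload is tiny.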
Throughput Impact:
Scenario: Send 100,000 messages (200 bytes each)

NO BATCHING:
  ├── Network RTT: 1ms per request
  ├── Total time: 100,000 × 1ms = 100 seconds
  └── Throughput: 1,000 messages/sec

WITH BATCHING (100 msg/batch):
  ├── Network RTT: 1ms per batch
  ├── Total time: 1,000 batches × 1ms = 1 second
  └── Throughput: 100,000 messages/sec

100x improvement!
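The scenario above is a simple round-trip model. A minimal sketch of it follows, assuming requests are sent strictly one after another (one in flight at a time) and that the round trip is the only cost; the names are hypothetical.

```java
/** Round-trip-bound throughput model; assumes one request in flight at a time. */
final class ThroughputModel {
    /** Seconds needed to send all messages when every request costs one round trip. */
    static double totalSeconds(long messages, int messagesPerBatch, double rttMs) {
        long requests = (messages + messagesPerBatch - 1) / messagesPerBatch; // ceiling division
        return requests * rttMs / 1000.0;
    }

    /** Resulting throughput in messages per second. */
    static double messagesPerSecond(long messages, int messagesPerBatch, double rttMs) {
        return messages / totalSeconds(messages, messagesPerBatch, rttMs);
    }

    public static void main(String[] args) {
        System.out.println("no batching:  " + messagesPerSecond(100_000, 1, 1.0) + " msg/sec");
        System.out.println("100 per batch: " + messagesPerSecond(100_000, 100, 1.0) + " msg/sec");
    }
}
```

Real producers pipeline several requests per connection, so the unbatched case is usually less bad than this model suggests, but the ratio between the two cases still scales with the batch size.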
Batching Triggers and Tradeoffs
Batch Completion Triggers:
TRIGGER 1: SIZE THRESHOLD REACHED
─────────────────────────────────────
  Current batch: 31 KB
  New message: 2 KB
  Total: 33 KB > batch.size (32 KB)
  Action: Send batch immediately

TRIGGER 2: TIME THRESHOLD REACHED
─────────────────────────────────────
  Batch started: T=0ms
  Current time: T=20ms >= linger.ms (20ms)
  Action: Send batch (even if not full)

TRIGGER 3: MEMORY PRESSURE
─────────────────────────────────────
  Buffer memory: 64 MB
  Used: 62 MB (97% full)
  Action: Send oldest batches to free memory

TRIGGER 4: EXPLICIT FLUSH
─────────────────────────────────────
  Application calls: producer.flush()
  Action: Send all pending batches immediately
The Latency-Throughput Tradeoff:
Low Latency (Real-time Systems):
  linger.ms = 0               ← Send immediately
  batch.size = 16384 (16 KB)
  Latency: ~1-2ms
  Throughput: ~10K msg/sec
  Use case: Trading, alerts

Balanced (Most Applications):
  linger.ms = 10-20           ← Small wait window
  batch.size = 32768 (32 KB)
  Latency: ~15-25ms
  Throughput: ~50K msg/sec
  Use case: Event streaming

High Throughput (Analytics):
  linger.ms = 50-100          ← Longer wait
  batch.size = 131072 (128 KB)
  Latency: ~60-120ms
  Throughput: ~200K msg/sec
  Use case: Log aggregation
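Which trigger fires first depends on how quickly one partition receives data. The following sketch estimates that under a steady-arrival assumption; the class `BatchFillEstimate` and its methods are hypothetical helpers, and the model ignores compression and the fact that a busy sender thread can let batches grow even when linger.ms is 0.

```java
/** Rough estimate of batch fill behaviour for one partition under a steady message rate. */
final class BatchFillEstimate {
    /** Milliseconds until batchSizeBytes have arrived at the given rate. */
    static double msToFill(int batchSizeBytes, double msgsPerSec, int msgBytes) {
        return 1000.0 * batchSizeBytes / (msgsPerSec * msgBytes);
    }

    /** Expected wait before the batch is sent: whichever of the size or time trigger fires first. */
    static double expectedWaitMs(int batchSizeBytes, long lingerMs, double msgsPerSec, int msgBytes) {
        return Math.min(lingerMs, msToFill(batchSizeBytes, msgsPerSec, msgBytes));
    }

    /** Approximate number of messages per batch under the same assumptions (at least 1). */
    static long messagesPerBatch(int batchSizeBytes, long lingerMs, double msgsPerSec, int msgBytes) {
        long bySize = batchSizeBytes / msgBytes;
        long byTime = (long) Math.floor(msgsPerSec * lingerMs / 1000.0);
        return Math.max(1, Math.min(bySize, byTime));
    }

    public static void main(String[] args) {
        // 200-byte messages, batch.size = 32 KB, linger.ms = 20
        System.out.println("1K msg/sec:  " + messagesPerBatch(32768, 20, 1_000, 200) + " msgs/batch");
        System.out.println("50K msg/sec: " + messagesPerBatch(32768, 20, 50_000, 200) + " msgs/batch");
    }
}
```

At low rates the time trigger dominates and batches stay small, so raising batch.size alone changes nothing; at high rates the size trigger dominates and linger.ms is rarely reached.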
Production Configuration Examples
Example 1: High-Throughput Log Ingestion
Properties props = new Properties();
// Optimize for throughput
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072); // 128 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50); // Wait 50ms
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 268435456); // 256 MB
// Enable compression for better batching
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
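// Keep up to 5 in-flight requests per connection (the client default)
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
// Result: up to ~10x throughput improvement, depending on message size and broker capacity
// Tradeoff: ~60ms added latency (up to 50ms of linger plus send time)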
Example 2: Low-Latency Real-Time Events
Properties props = new Properties();
// Optimize for latency
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384); // 16 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 0); // No wait
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432); // 32 MB
// Minimal compression overhead
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
// Limit in-flight for ordering
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
// Result: <5ms p99 latency
// Tradeoff: Lower throughput (~20K msg/sec)
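The two examples differ only in a handful of values, so they can be kept side by side as named profiles. The sketch below uses only `java.util.Properties` with the string configuration keys that the `ProducerConfig` constants stand for; the class `ProducerProfiles` is a hypothetical helper, and a real producer would still need `bootstrap.servers` and serializer settings added.

```java
import java.util.Properties;

/** Hypothetical helper that builds the two tuning profiles shown above. */
final class ProducerProfiles {
    /** Throughput-oriented settings (Example 1). */
    static Properties highThroughput() {
        Properties p = new Properties();
        p.setProperty("batch.size", "131072");        // 128 KB
        p.setProperty("linger.ms", "50");             // wait up to 50 ms for fuller batches
        p.setProperty("buffer.memory", "268435456");  // 256 MB
        p.setProperty("compression.type", "lz4");
        p.setProperty("max.in.flight.requests.per.connection", "5");
        return p;
    }

    /** Latency-oriented settings (Example 2). */
    static Properties lowLatency() {
        Properties p = new Properties();
        p.setProperty("batch.size", "16384");         // 16 KB
        p.setProperty("linger.ms", "0");              // do not wait for more messages
        p.setProperty("buffer.memory", "33554432");   // 32 MB
        p.setProperty("compression.type", "lz4");
        p.setProperty("max.in.flight.requests.per.connection", "1");
        return p;
    }
}
```

Keeping the profiles in one place makes it harder to change linger.ms in one service and forget the matching batch.size.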
Batch Compression and Efficiency
Compression with Batching:
WHY BATCHING IMPROVES COMPRESSION:

Single Message Compression:
  Message 1: {"user_id": 123, "event": "click", "timestamp": 1234567890}
  Compressed: 58 bytes → 52 bytes (10% savings)

Batched Messages Compression (100 messages):
  Original: 5,800 bytes
  Compressed (lz4): 1,200 bytes (about 79% savings)

Why better compression?
  ├── Repeated keys: "user_id", "event", "timestamp" appear 100x
  ├── Similar values: Timestamps are sequential
  ├── Pattern recognition: Better with larger data sets
  └── Compression dictionary: More effective context

Combined Batching + Compression:
  ├── Network requests: 100x fewer (100 messages per batch)
  ├── Payload size: ~5x smaller (compression)
  └── These gains apply to different costs (request count vs. bytes sent), so they should not be multiplied into a single figure
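The effect is easy to observe with the JDK alone. The sketch below uses `java.util.zip.Deflater` (DEFLATE, the algorithm behind gzip) as a stand-in, since the JDK ships no LZ4 codec; the message format mirrors the example above, and the class and method names are invented for illustration. Exact byte counts will differ from the figures quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Compares compressing 100 similar messages one by one against compressing them as one batch. */
final class BatchCompressionDemo {
    /** Returns the DEFLATE-compressed size of the input in bytes. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Builds a small JSON event similar to the one in the text. */
    static String event(int i) {
        return "{\"user_id\": " + (100 + i) + ", \"event\": \"click\", \"timestamp\": "
                + (1234567890L + i) + "}";
    }

    /** Sum of compressed sizes when each message is compressed on its own. */
    static int individualTotal(int count) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            total += deflatedSize(event(i).getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    /** Compressed size when the same messages are concatenated and compressed together. */
    static int batchedTotal(int count) {
        StringBuilder batch = new StringBuilder();
        for (int i = 0; i < count; i++) {
            batch.append(event(i));
        }
        return deflatedSize(batch.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("compressed individually: " + individualTotal(100) + " bytes");
        System.out.println("compressed as one batch: " + batchedTotal(100) + " bytes");
    }
}
```

Compressed one at a time, each message pays the codec's fixed framing cost and has no earlier data to refer back to, which is why the batched total comes out far smaller.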
Production Compression Strategy:
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class CompressionStrategy {
    // Ratios and speeds below are rough, workload-dependent figures.
    // Pick exactly one codec: a later put() overwrites an earlier one.

    // LZ4: Fast compression, low CPU
    // Best for: High-throughput systems with large batches
    // Compression: ~2:1 ratio, ~300 MB/sec
    public static final String LZ4 = "lz4";

    // Snappy: Balanced
    // Best for: Moderate throughput, balanced CPU usage
    // Compression: ~2.3:1 ratio, ~250 MB/sec
    public static final String SNAPPY = "snappy";

    // GZIP: Best compression
    // Best for: Network-limited systems, low volume
    // Compression: ~3.2:1 ratio, ~50 MB/sec (high CPU)
    public static final String GZIP = "gzip";

    // None: No compression
    // Best for: Already-compressed data (images, video)
    public static final String NONE = "none";

    // Applies the chosen codec to an existing producer configuration.
    public static void apply(Properties props, String codec) {
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, codec);
    }
}
Memory Management and Buffer Pool
Buffer Pool Architecture:
Total Buffer: 64 MB (buffer.memory)
  Partition 0 Batch: 32 KB (ready)     ← Full
  Partition 1 Batch: 28 KB (building)  ← Accumulating
  Partition 2 Batch: 31 KB (ready)     ← Full
  Partition 3 Batch: 15 KB (building)
  ...
  Free Memory: 10 MB

Memory Exhaustion Behavior:
  1. Buffer full (free < new batch size)
  2. send() blocks for up to max.block.ms (default 60s)
  3. If still full, the send fails with BufferExhaustedException (a subclass of TimeoutException in recent clients; older clients report a plain TimeoutException)
  4. As batches send, memory is freed for new batches

Monitoring: MBean kafka.producer:type=producer-metrics,client-id=<client-id>, attribute buffer-available-bytes
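The block-then-fail behaviour in steps 1 to 4 can be sketched with a counting semaphore in which each permit represents one byte of buffer. This is a simplified model under that assumption, not Kafka's buffer pool; `BoundedBufferPool` and its methods are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Simplified bounded buffer: allocation waits up to maxBlockMs, then gives up. */
final class BoundedBufferPool {
    private final Semaphore freeBytes; // one permit per free byte

    BoundedBufferPool(int totalBytes) {
        this.freeBytes = new Semaphore(totalBytes, true); // fair: waiters are served in order
    }

    /**
     * Tries to reserve `bytes`, waiting up to maxBlockMs (the role played by max.block.ms).
     * Returns false on timeout or interruption; a real client would raise an exception instead.
     */
    boolean tryAllocate(int bytes, long maxBlockMs) {
        try {
            return freeBytes.tryAcquire(bytes, maxBlockMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns memory to the pool once a batch has been sent. */
    void release(int bytes) {
        freeBytes.release(bytes);
    }

    /** Analogous to the buffer-available-bytes metric. */
    int availableBytes() {
        return freeBytes.availablePermits();
    }
}
```

A caller that sees `tryAllocate` return false is in the same position as an application whose send timed out on a full buffer: it must retry later, slow down, or drop the message deliberately.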
Tradeoffs
Advantages:
- ✓ Massively improved throughput (10-100x)
- ✓ Reduced network overhead (90%+ fewer requests)
- ✓ Better compression efficiency with larger batches
- ✓ Lower CPU usage per message (amortized overhead)
- ✓ Reduced server-side processing load
Disadvantages:
- ✕ Increased latency (messages wait in batch)
- ✕ Higher memory usage (buffering messages)
- ✕ Complexity in tuning (batch.size vs linger.ms)
- ✕ Risk of data loss if producer crashes before send
- ✕ Larger failure blast radius (entire batch fails together)
Real Systems Using This
Apache Kafka
- Implementation: Per-partition batching with configurable size and time thresholds
- Scale: 7+ trillion messages/day at LinkedIn with aggressive batching
- Default Config: 16 KB batch.size, 0ms linger.ms in clients before 4.0 (conservative)
- Production Config: 64-128 KB batch.size, 20-50ms linger.ms (optimized)
AWS Kinesis
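- Implementation: Explicit batching via the PutRecords API (up to 500 records per request)
- Limits: 1 MB/sec (or 1,000 records/sec) of writes per shard; PutRecords requests are size-capped (historically 5 MB per request and 1 MB per record)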
- SDK Behavior: KPL (Kinesis Producer Library) batches automatically
Google Cloud Pub/Sub
- Implementation: Client library batches messages automatically
- Config: Max batch size (1000 messages), max batch bytes (10 MB)
- Optimization: Batching + request compression for efficiency
RabbitMQ
- Implementation: Optional publisher confirms batching
- Config: Manual batching via application-level buffering
- Performance: 10x improvement with batching enabled
When to Use Producer Batching
✓ Perfect Use Cases
High-Volume Event Streaming
Scenario: Ingesting millions of events per second
Why batching: Maximizes network and disk efficiency
Example: Clickstream analytics, IoT sensor data
Config: Large batches (128 KB), medium linger (20-50ms)
Log Aggregation
Scenario: Centralized logging from 1000s of services
Why batching: Reduces load on logging infrastructure
Example: ELK stack ingestion, Splunk forwarding
Config: Large batches (128 KB), high linger (50-100ms)
Bulk Data Migration
Scenario: Moving large datasets between systems
Why batching: Maximum throughput, latency not critical
Example: Database CDC, ETL pipelines
Config: Maximum batches (256 KB), high linger (100ms)
✕ When NOT to Use (or Use Minimal Batching)
Real-Time Alerting
Problem: Critical alerts delayed by batching
Solution: linger.ms=0, small batches (16 KB)
Example: Security alerts, system monitoring
Trading Systems
Problem: Milliseconds matter, batching adds latency
Solution: No batching (linger.ms=0) or very small windows
Example: High-frequency trading, order execution
Request-Response Patterns
Problem: User waiting for immediate response
Solution: Minimal batching, sync sends
Example: API calls, user-facing operations
Interview Application
Common Interview Question 1
Q: “How would you optimize a producer that’s sending 100,000 small messages per second, causing high CPU and network usage?”
Strong Answer:
“The issue is likely excessive network overhead from sending each message individually. I’d implement producer batching:
Diagnosis:
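- Current: 100K messages/sec at ~1 KB each, one request per message = 100K network requests/sec
- Network overhead: a large share of bandwidth goes to per-request headers (close to 50% for messages of a few hundred bytes)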
- CPU overhead: 100K serialize/send operations
Solution:
// Enable aggressive batching
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);        // 64 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);            // 20ms window
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
Result:
- Batching: 100K messages → ~2K batches (50x reduction)
- Compression: 64 KB → ~15 KB per batch (4x savings)
- Network: 98% reduction in requests
- CPU: 95% reduction in overhead
- Added latency: ~20ms (acceptable for most use cases)
Tradeoff: 20ms added latency vs 50x throughput improvement. For log/event streaming, this is optimal.”
Why this is good:
- Quantifies the problem
- Provides specific configuration
- Explains each parameter choice
- Analyzes tradeoffs explicitly
- Gives measurable results
Common Interview Question 2
Q: “Your Kafka producer is dropping messages under high load. How would you debug and fix this?”
Strong Answer:
“Message drops under load suggest buffer memory exhaustion. Here’s my approach:
Diagnosis Steps:
- Check JMX metric: buffer-available-bytes → Likely near 0
- Check logs for BufferExhaustedException
- Check max.block.ms timeout (default 60s)
Root Cause Analysis:
- Batches accumulating faster than sender thread can send
- Possible causes:
- Network slowness (broker response time)
- Too small buffer.memory for traffic volume
- Inefficient batching (small batches = more sends)
Solutions (in order):
1. Increase buffer memory:
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 268435456); // 256 MB
2. Optimize batching for throughput:
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 131072);       // 128 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 50);            // Wait for fuller batches
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
3. Application-level backpressure:
producer.send(record, (metadata, exception) -> {
    if (exception instanceof BufferExhaustedException) {
        // Implement retry with exponential backoff
        // Or shed load (return 503 to clients)
    }
});
Result: Larger buffer + more efficient batching = 10x capacity improvement”
Why this is good:
- Systematic debugging approach
- Multiple solution layers
- Specific metrics to check
- Code examples
- Explains root cause clearly
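The "retry with exponential backoff" step in the answer can be as small as a delay calculation. The sketch below is one possible shape, assuming a doubling delay with an upper cap; the class `Backoff` and its parameters are hypothetical, and production code would normally add random jitter so that many producers do not retry in lockstep.

```java
/** Capped exponential backoff: base, 2x base, 4x base, ... up to maxMs. */
final class Backoff {
    /**
     * Delay in milliseconds before retry number `attempt` (0-based).
     * Assumes baseMs > 0 and maxMs is small enough that doubling cannot overflow a long.
     */
    static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs;
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2; // double on every failed attempt
        }
        return Math.min(delay, maxMs);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + delayMs(attempt, 100, 5_000) + " ms");
        }
    }
}
```

The cap matters here because the buffer usually drains within a few seconds once the broker recovers, so waiting much longer than that only adds latency.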
Red Flags to Avoid
- ✕ Not understanding latency tradeoff of batching
- ✕ Setting linger.ms without understanding batch.size
- ✕ Not considering memory implications
- ✕ Ignoring compression benefits with batching
- ✕ Not knowing how to measure batching efficiency
Quick Self-Check
Before moving on, can you:
- Explain producer batching in 60 seconds?
- Draw the batching flow from send() to network?
- List all 4 batch trigger conditions?
- Explain the latency-throughput tradeoff?
- Calculate network savings from batching?
- Configure producer for high-throughput vs low-latency?
Related Content
Prerequisites
None - this is a foundational performance concept
Related Concepts
- Log-Based Storage - Batching works well with sequential writes
- Topic Partitioning - Batching is per-partition
Used In Systems
- Distributed Message Queues - Core performance technique
- Event-Driven Architectures - Essential for high throughput
Explained In Detail
- Kafka Producer Mechanics - Implementation details (35 minutes)
Next Recommended: Producer Acknowledgments - Understand reliability guarantees
Production signal