TL;DR
Consumer groups enable multiple consumer instances to work together to process partitions from a topic in parallel. Each partition is assigned to exactly one consumer within a group, which provides parallel processing while preserving ordering within each partition. Automatic rebalancing handles failures and scaling.
Visual Overview
Topic: user-events (4 partitions)
┌──────────┬──────────┬──────────┬──────────┐
│  Part 0  │  Part 1  │  Part 2  │  Part 3  │
└──────────┴──────────┴──────────┴──────────┘
     │          │          │          │
     │          │          │          │
     ▼          ▼          ▼          ▼
┌──────────┬──────────┬──────────┬──────────┐
│ Consumer │ Consumer │ Consumer │ Consumer │
│    A     │    B     │    C     │    D     │
└──────────┴──────────┴──────────┴──────────┘
Group: "analytics-processors"

KEY GUARANTEES:
├── Each partition assigned to exactly ONE consumer in group
├── Each consumer can handle multiple partitions
├── Automatic rebalancing on member changes
└── Fault tolerance through coordinator failover

REBALANCING SCENARIOS:
1. Consumer joins: 4 partitions → 5 consumers (rebalance)
2. Consumer crashes: 4 partitions → 3 consumers (rebalance)
3. Partition added: New partition needs assignment (rebalance)
Core Explanation
What is a Consumer Group?
A consumer group is a logical collection of consumer instances that work together to consume messages from a topic. The group provides:
- Load distribution: Partitions spread across consumers
- Fault tolerance: A failed consumer's partitions are automatically reassigned to the remaining members
- Scaling: Add/remove consumers dynamically
- Coordination: Group coordinator tracks membership and distributes partition assignments
Partition Assignment Guarantee
The Golden Rule:
Each partition is assigned to exactly one consumer within a consumer group at any given time.
VALID ASSIGNMENT (4 partitions, 3 consumers):
Consumer A: [Partition 0, Partition 1]
Consumer B: [Partition 2]
Consumer C: [Partition 3]
✓ Each partition assigned exactly once

INVALID ASSIGNMENT:
Consumer A: [Partition 0]
Consumer B: [Partition 0]
✕ Partition 0 assigned twice!
This guarantee ensures:
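- No two consumers in a group process the same partition at the same time (duplicates are still possible after a rebalance if offsets were not committed)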
- Ordering maintained per partition
- Clear ownership of each partition
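The golden rule above can be checked mechanically. The following is a minimal sketch, not part of any Kafka API: `AssignmentCheck` and `isValid` are hypothetical names, and partitions are assumed to be numbered from 0 to `numPartitions - 1`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not a Kafka class): verifies the one-owner-per-partition rule.
final class AssignmentCheck {

    /** Returns true when partitions 0..numPartitions-1 each appear exactly once across all consumers. */
    static boolean isValid(Map<String, List<Integer>> assignment, int numPartitions) {
        Set<Integer> seen = new HashSet<>();
        for (List<Integer> owned : assignment.values()) {
            for (int partition : owned) {
                if (partition < 0 || partition >= numPartitions) {
                    return false; // partition does not exist in the topic
                }
                if (!seen.add(partition)) {
                    return false; // partition already has an owner
                }
            }
        }
        return seen.size() == numPartitions; // no partition left unassigned
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> valid =
                Map.of("A", List.of(0, 1), "B", List.of(2), "C", List.of(3));
        Map<String, List<Integer>> invalid =
                Map.of("A", List.of(0), "B", List.of(0));

        System.out.println(isValid(valid, 4));   // prints true
        System.out.println(isValid(invalid, 4)); // prints false
    }
}
```

The check also rejects assignments that leave a partition without an owner, which the invalid example would fail even if partition 0 were listed only once.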
How Partition Assignment Works
Assignment Strategies:
// 1. RANGE STRATEGY (default)
// Assigns consecutive partitions to consumers
Topic: user-events (6 partitions)
Consumer A: [0, 1]
Consumer B: [2, 3]
Consumer C: [4, 5]
// Pro: Simple, predictable
// Con: Uneven if partition count doesn't divide evenly
// 2. ROUND-ROBIN STRATEGY
// Distributes partitions one-by-one in round-robin
Topic: user-events (6 partitions)
Consumer A: [0, 3]
Consumer B: [1, 4]
Consumer C: [2, 5]
// Pro: Even distribution
// Con: Less predictable, more partition movement on rebalance
// 3. STICKY STRATEGY
// Minimizes partition movement during rebalance
// Keeps existing assignments when possible
// Pro: Reduces rebalancing overhead
// Con: Slightly more complex
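To make the difference between the first two strategies concrete, here is a simplified sketch of how range and round-robin distribution could be computed for a single topic. `AssignorSketch` is a hypothetical class, consumers are identified only by their index, and the real Kafka assignors also handle multiple topics and member ordering, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-topic illustration; not the Kafka assignor implementations.
final class AssignorSketch {

    /** Range: each consumer gets a consecutive block; the first (partitions % consumers) get one extra. */
    static List<List<Integer>> range(int partitions, int consumers) {
        int base = partitions / consumers;
        int extra = partitions % consumers;
        List<List<Integer>> result = new ArrayList<>();
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    /** Round-robin: partition p goes to consumer (p % consumers). */
    static List<List<Integer>> roundRobin(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            result.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(p % consumers).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(range(6, 3));      // [[0, 1], [2, 3], [4, 5]]
        System.out.println(roundRobin(6, 3)); // [[0, 3], [1, 4], [2, 5]]
        System.out.println(range(7, 3));      // [[0, 1, 2], [3, 4], [5, 6]]
    }
}
```

The last call shows the unevenness noted for the range strategy: with 7 partitions and 3 consumers, the first consumer carries one more partition than the others.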
Configuration:
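import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

// Join the "analytics-processors" group and select the range assignor.
Properties props = new Properties();
props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-processors");
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
    "org.apache.kafka.clients.consumer.RangeAssignor");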
Group Coordinator and Rebalancing
Coordinator Selection:
Group ID: "analytics-processors"
        ↓
hash(groupId) % num_partitions(__consumer_offsets)
        ↓
Partition 23 in __consumer_offsets
        ↓
Broker 2 (leader of partition 23)
        ↓
Broker 2 becomes Group Coordinator
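The hash step in the coordinator lookup can be sketched in a few lines. This is an illustration of the formula above, not broker code: `CoordinatorLookup` is a hypothetical class, masking the sign bit is an assumption made here to keep the index non-negative, and 50 is used because it is the default value of `offsets.topic.num.partitions`.

```java
// Hypothetical illustration of hash(groupId) % num_partitions(__consumer_offsets).
final class CoordinatorLookup {

    /** Maps a group ID to a partition index of the __consumer_offsets topic. */
    static int coordinatorPartition(String groupId, int offsetsTopicPartitions) {
        // Clear the sign bit so a negative hashCode cannot produce a negative index
        // (an assumption of this sketch).
        int nonNegativeHash = groupId.hashCode() & 0x7fffffff;
        return nonNegativeHash % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        int partition = coordinatorPartition("analytics-processors", 50);
        // The broker leading this partition would act as the group's coordinator.
        System.out.println("__consumer_offsets partition: " + partition);
    }
}
```

Because the mapping depends only on the group ID and the partition count, every client resolves the same coordinator for a given group without any extra coordination.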
Rebalancing Protocol (Simplified):
REBALANCING FLOW:

1. TRIGGER EVENT
├── Consumer joins group
├── Consumer leaves/crashes
├── Consumer heartbeat timeout
└── Partition count changes

2. COORDINATOR INITIATES REBALANCE
├── Signals REBALANCE_IN_PROGRESS to all consumers (in heartbeat responses)
└── Consumers stop processing, commit offsets

3. JOIN GROUP PHASE
├── All consumers re-join group
├── Send their supported partition assignment strategies
└── Coordinator collects member info and picks one consumer as group leader

4. ASSIGNMENT PHASE
├── Group leader runs the assignment strategy (classic protocol)
├── Calculates new partition assignments
└── Coordinator sends assignments to consumers

5. RESUME PROCESSING
└── Consumers start consuming from new assignments

TOTAL REBALANCE TIME: ~500ms to several seconds
Scaling Patterns
Under-Subscribed (Fewer Consumers than Partitions):
4 Partitions, 2 Consumers:
Consumer A: [P0, P1]
Consumer B: [P2, P3]

Throughput: 2x (2 parallel consumers)
Utilization: 100% (all consumers busy)
Fully-Subscribed (Equal Consumers and Partitions):
4 Partitions, 4 Consumers:
Consumer A: [P0]
Consumer B: [P1]
Consumer C: [P2]
Consumer D: [P3]

Throughput: 4x (4 parallel consumers)
Utilization: 100% (optimal)
Over-Subscribed (More Consumers than Partitions):
4 Partitions, 6 Consumers:
Consumer A: [P0]
Consumer B: [P1]
Consumer C: [P2]
Consumer D: [P3]
Consumer E: [] ⚠️ IDLE
Consumer F: [] ⚠️ IDLE

Throughput: 4x (limited by partitions)
Utilization: 67% (2 consumers wasted)
✕ Cannot scale beyond partition count!
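The arithmetic behind the three scaling patterns is small enough to write down. The sketch below uses a hypothetical `GroupCapacity` class and assumes a single topic where every consumer can take at least one partition if one is available.

```java
// Hypothetical helper for the scaling arithmetic; not part of any Kafka API.
final class GroupCapacity {

    /** Consumers that receive at least one partition: capped by the partition count. */
    static int activeConsumers(int partitions, int consumers) {
        return Math.min(partitions, consumers);
    }

    /** Consumers left without any partition. */
    static int idleConsumers(int partitions, int consumers) {
        return Math.max(0, consumers - partitions);
    }

    /** Share of consumers doing work, rounded to a whole percent. */
    static long utilizationPercent(int partitions, int consumers) {
        return Math.round(100.0 * activeConsumers(partitions, consumers) / consumers);
    }

    public static void main(String[] args) {
        System.out.println(activeConsumers(4, 2) + " active, " + utilizationPercent(4, 2) + "%"); // 2 active, 100%
        System.out.println(activeConsumers(4, 4) + " active, " + utilizationPercent(4, 4) + "%"); // 4 active, 100%
        System.out.println(activeConsumers(4, 6) + " active, " + idleConsumers(4, 6) + " idle, "
                + utilizationPercent(4, 6) + "%"); // 4 active, 2 idle, 67%
    }
}
```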
Multiple Consumer Groups
Independent Processing:
Topic: user-events (4 partitions)
│
├───► Group: "analytics" (processes all events)
│       Consumer A: [P0, P1]
│       Consumer B: [P2, P3]
│
└───► Group: "fraud-detection" (also processes all events)
        Consumer X: [P0, P1, P2, P3]

Each group independently consumes ALL messages.
Groups do NOT affect each other.
Use Case - Multiple Processing Pipelines:
Topic: "user-actions"

Group 1: "real-time-analytics" → Processes events for live dashboards
Group 2: "ml-feature-pipeline" → Extracts features for ML models
Group 3: "audit-logger" → Archives events for compliance

All three groups consume the SAME messages independently.
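Groups stay independent because each one tracks its own position in the log. The toy model below illustrates that idea for a single partition; `IndependentGroups` is a hypothetical in-memory stand-in, and it simplifies real Kafka by committing the offset as soon as records are returned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one partition read by several groups.
final class IndependentGroups {
    private final List<String> log = new ArrayList<>();             // records of one partition
    private final Map<String, Integer> committed = new HashMap<>(); // group ID -> next offset to read

    void append(String record) {
        log.add(record);
    }

    /** Returns up to max unread records for the group and advances only that group's offset. */
    List<String> poll(String groupId, int max) {
        int from = committed.getOrDefault(groupId, 0);
        int to = Math.min(log.size(), from + max);
        committed.put(groupId, to);
        return new ArrayList<>(log.subList(from, to));
    }

    public static void main(String[] args) {
        IndependentGroups partition = new IndependentGroups();
        partition.append("click");
        partition.append("view");
        partition.append("purchase");

        System.out.println(partition.poll("analytics", 2));       // [click, view]
        System.out.println(partition.poll("fraud-detection", 3)); // [click, view, purchase]
        System.out.println(partition.poll("analytics", 2));       // [purchase]
    }
}
```

Reading by the analytics group does not move the fraud-detection offset, so both groups end up seeing every record.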
Tradeoffs
Advantages:
- ✓ Horizontal scalability (add more consumers)
- ✓ Automatic fault tolerance (consumer failures handled)
- ✓ Load balancing across consumers
- ✓ Multiple independent processing pipelines (multiple groups)
Disadvantages:
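- ✕ Rebalancing causes processing pause (stop-the-world with the eager protocol; cooperative rebalancing in Kafka 2.4+ pauses only the partitions that move)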
- ✕ Cannot scale beyond partition count
- ✕ Partition assignment may be uneven
- ✕ Rebalancing overhead on frequent consumer changes
Real Systems Using This
Kafka (Apache)
- Implementation: The broker that leads each __consumer_offsets partition acts as group coordinator for the groups that hash to it
- Scale: Thousands of consumer groups processing trillions of messages
- Typical Setup: 10-50 consumers per group for high-throughput topics
Amazon Kinesis
- Implementation: Kinesis Client Library (KCL) provides similar consumer group semantics
- Scale: Auto-scaling consumer groups based on shard count
- Typical Setup: 1 worker per shard, auto-scaling with shard splits/merges
Apache Pulsar
- Implementation: Shared subscription model (similar to consumer groups)
- Scale: Automatic load rebalancing without stop-the-world pauses
- Typical Setup: Dynamic consumer scaling with minimal disruption
When to Use Consumer Groups
✓ Perfect Use Cases
High-Throughput Event Processing
Scenario: Processing 1M events/sec from user activity stream
Solution: Consumer group with 100 consumers (10K events/sec each) on a topic with at least 100 partitions
Result: Linear scaling, automatic fault tolerance
Parallel Data Pipeline
Scenario: Real-time ETL from Kafka to data warehouse
Solution: Consumer group on a topic whose partition count matches the number of available cores
Result: Maximize parallelism while maintaining ordering per partition
Multiple Processing Pipelines
Scenario: Same events need processing by analytics, ML, and audit systems
Solution: Three separate consumer groups on same topic
Result: Independent processing without interfering with each other
✕ When NOT to Use
Need Broadcast to All Consumers
Problem: Every consumer must receive ALL messages
Issue: Consumer groups distribute messages (each consumer gets a subset)
Alternative: Give each consumer its own consumer group, or use a pub-sub pattern
Very Low Latency Requirements
Problem: Sub-millisecond latency critical
Issue: Rebalancing causes temporary processing pause
Alternative: Single consumer or fixed partition assignment
More Consumers than Partitions Long-Term
Problem: Want to run 100 consumers with only 10 partitions
Issue: 90 consumers will be idle, wasting resources
Alternative: Increase partition count or reduce consumers
Interview Application
Common Interview Question 1
Q: “You have a topic with 10 partitions. If you deploy 15 consumers in the same consumer group, what happens?”
Strong Answer:
“Only 10 consumers will be active, one per partition. The remaining 5 consumers will sit idle, since each partition can only be assigned to one consumer in a group. To utilize all 15 consumers, I’d either increase the partition count to 15+, or split the workload across multiple topics. If further scaling is anticipated, I’d over-provision partitions upfront: Kafka can add partitions to an existing topic but cannot remove them, and adding partitions changes the key-to-partition mapping, which breaks per-key ordering for existing keys.”
Why this is good:
- Shows understanding of partition assignment constraint
- Identifies the inefficiency
- Provides multiple solutions
- Considers future scaling
Common Interview Question 2
Q: “What happens during a consumer group rebalance? How does it affect processing?”
Strong Answer:
“Rebalancing occurs when consumers join, leave, or crash. The process:
- Coordinator detects the change (heartbeat timeout or explicit notification)
- Sends REBALANCE_IN_PROGRESS to all group members
- Consumers stop processing and commit their offsets
- All consumers re-join the group
- The group leader (one of the consumers, chosen by the coordinator) calculates new partition assignments using the configured strategy, and the coordinator distributes them
- Consumers receive new assignments and resume processing
Impact: Processing pauses for ~500ms to several seconds. In production, we minimize rebalances by:
- Using static membership (Kafka 2.3+) to avoid rebalances on restarts
- Tuning session.timeout.ms and heartbeat.interval.ms
- Using sticky assignor to minimize partition movement
- Graceful shutdowns with proper leave group notifications”
Why this is good:
- Detailed step-by-step understanding
- Quantifies the impact
- Shows production awareness
- Provides optimization strategies
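The rebalance-reduction options named in the answer above map to a handful of consumer settings. The sketch below builds them with plain string keys so it runs without the Kafka client on the classpath; `RebalanceTuning` and `lowChurnConsumerProps` are hypothetical names, and the timeout values are illustrative assumptions rather than recommendations.

```java
import java.util.Properties;

// Hypothetical helper that collects rebalance-related consumer settings.
final class RebalanceTuning {

    static Properties lowChurnConsumerProps(String groupId, String instanceId) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        // Static membership (Kafka 2.3+): a stable ID lets a restarted instance
        // rejoin without triggering a rebalance, as long as it returns within the session timeout.
        props.setProperty("group.instance.id", instanceId);
        // Illustrative values (assumptions); the heartbeat interval is usually kept
        // at no more than one third of the session timeout.
        props.setProperty("session.timeout.ms", "30000");
        props.setProperty("heartbeat.interval.ms", "10000");
        // Sticky assignor: keeps existing assignments where possible.
        props.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.StickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        Properties props = lowChurnConsumerProps("analytics-processors", "worker-1");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Each instance needs its own `group.instance.id`; reusing one ID across two running consumers would defeat the purpose of static membership.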
Red Flags to Avoid
- ✕ Confusing consumer groups with partition replicas
- ✕ Claiming you can assign same partition to multiple consumers in one group
- ✕ Not knowing about rebalancing and its impact
- ✕ Forgetting that consumers beyond the partition count sit idle
Quick Self-Check
Before moving on, can you:
- Explain consumer groups in 60 seconds?
- Draw a diagram showing partition-to-consumer assignment?
- Explain what triggers a rebalance?
- Calculate optimal consumer count given partition count?
- Identify when to use multiple consumer groups?
- Explain the partition assignment guarantee?
Related Content
See It In Action
- Consumer Group Rebalancing - ~100 second animated visual showing partition redistribution
- Kafka Topic Partitioning - ~100 second animated visual explanation of partitioning and consumer assignment
Prerequisites
- Topic Partitioning - Understanding partitions is essential for consumer groups
Related Concepts
- Offset Management - How consumers track their position
- Load Balancing - Distribution strategies
- Sharding - Similar concept for databases
Used In Systems
- Real-Time Analytics Pipelines - Consumer groups for parallel processing
- Event-Driven Microservices - Multiple consumer groups per service
Explained In Detail
- Kafka Architecture - Consumer Groups & Rebalancing section (30 minutes)
- Deep dive into rebalancing protocols, partition assignment strategies, and coordinator mechanics
Next Recommended: Offset Management - Learn how consumers track their position in partitions