Consumer Groups | Concepts

TL;DR

Consumer groups enable multiple consumer instances to work together to process partitions from a topic in parallel. Each partition is assigned to exactly one consumer within a group, providing parallel processing while maintaining ordering guarantees. Automatic rebalancing handles failures and scaling.

Visual Overview

Consumer Group Architecture

Topic: user-events (4 partitions)
┌──────────┬──────────┬──────────┬──────────┐
│  Part 0  │  Part 1  │  Part 2  │  Part 3  │
└──────────┴──────────┴──────────┴──────────┘
    │          │          │          │
    │          │          │          │
    ▼          ▼          ▼          ▼
┌──────────┬──────────┬──────────┬──────────┐
│ Consumer │ Consumer │ Consumer │ Consumer │
│    A     │    B     │    C     │    D     │
└──────────┴──────────┴──────────┴──────────┘

Group: "analytics-processors"

KEY GUARANTEES:
├── Each partition assigned to exactly ONE consumer in group
├── Each consumer can handle multiple partitions
├── Automatic rebalancing on member changes
└── Fault tolerance through coordinator failover

REBALANCING SCENARIOS:

1. Consumer joins: 4 partitions redistributed across 5 consumers (one stays idle)
2. Consumer crashes: 4 partitions redistributed across 3 consumers
3. Partition added: the new partition needs an owner

Core Explanation

What is a Consumer Group?

A consumer group is a logical collection of consumer instances that work together to consume messages from a topic. The group provides:

Load distribution: Partitions spread across consumers
Fault tolerance: A failed consumer's partitions are reassigned to the remaining members
Scaling: Add/remove consumers dynamically
Coordination: Group coordinator manages partition assignments

Partition Assignment Guarantee

The Golden Rule:

Each partition is assigned to exactly one consumer within a consumer group at any given time.

VALID ASSIGNMENT (4 partitions, 3 consumers):
Consumer A: [Partition 0, Partition 1]
Consumer B: [Partition 2]
Consumer C: [Partition 3]
✓ Each partition assigned exactly once

INVALID ASSIGNMENT:
Consumer A: [Partition 0]
Consumer B: [Partition 0] ✕ Partition 0 assigned twice!

This guarantee ensures:

No two consumers in a group read the same partition at the same time
Ordering maintained per partition
Clear ownership of each partition
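
The rule is easy to check mechanically. The sketch below is a hypothetical helper (the class and method names are made up here and are not part of any Kafka API) that reports every partition owned by zero consumers or by more than one, using the valid and invalid assignments shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any Kafka API: checks the "exactly one owner" rule.
class AssignmentCheck {

    // Returns the partitions that break the rule: owned by zero or by several consumers.
    static List<Integer> violations(Map<String, List<Integer>> assignment, int partitionCount) {
        Map<Integer, Integer> owners = new HashMap<>();
        for (List<Integer> partitions : assignment.values()) {
            for (int p : partitions) {
                owners.merge(p, 1, Integer::sum);
            }
        }
        List<Integer> bad = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            if (owners.getOrDefault(p, 0) != 1) {
                bad.add(p);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> valid = Map.of(
                "A", List.of(0, 1), "B", List.of(2), "C", List.of(3));
        Map<String, List<Integer>> invalid = Map.of(
                "A", List.of(0), "B", List.of(0));
        System.out.println(violations(valid, 4));   // prints []
        System.out.println(violations(invalid, 1)); // prints [0]
    }
}
```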

How Partition Assignment Works

Assignment Strategies:

// 1. RANGE STRATEGY (default)
// Assigns consecutive partitions to consumers
Topic: user-events (6 partitions)
Consumer A: [0, 1]
Consumer B: [2, 3]
Consumer C: [4, 5]
// Pro: Simple, predictable
// Con: Uneven if partition count doesn't divide evenly

// 2. ROUND-ROBIN STRATEGY
// Distributes partitions one-by-one in round-robin
Topic: user-events (6 partitions)
Consumer A: [0, 3]
Consumer B: [1, 4]
Consumer C: [2, 5]
// Pro: Even distribution
// Con: Less predictable, more partition movement on rebalance

// 3. STICKY STRATEGY
// Minimizes partition movement during rebalance
// Keeps existing assignments when possible
// Pro: Reduces rebalancing overhead
// Con: Slightly more complex
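
To make the first two strategies concrete, here is a simplified single-topic model of each. It is a sketch under stated assumptions: the class and method names are invented for illustration, the consumer list is assumed to be non-empty and already sorted, and the real Kafka assignor classes handle multiple topics and subscriptions that this code ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified single-topic models of the two strategies; not the Kafka assignor classes.
class AssignmentStrategies {

    // Range: contiguous blocks; the first (partitions % consumers) consumers get one extra.
    static Map<String, List<Integer>> range(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int base = partitions / consumers.size();
        int extra = partitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    // Round-robin: partition p goes to the consumer at index (p % consumers).
    static Map<String, List<Integer>> roundRobin(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String c : consumers) {
            result.put(c, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(consumers.get(p % consumers.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> group = List.of("A", "B", "C");
        System.out.println(range(group, 6));      // prints {A=[0, 1], B=[2, 3], C=[4, 5]}
        System.out.println(roundRobin(group, 6)); // prints {A=[0, 3], B=[1, 4], C=[2, 5]}
        System.out.println(range(group, 7));      // prints {A=[0, 1, 2], B=[3, 4], C=[5, 6]}
    }
}
```

With 7 partitions the range model gives the first consumer one more partition than the others. Across many topics that extra partition always lands on the same consumers, which is where the unevenness comes from.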

Configuration:

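import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

// Join the group and pick the assignment strategy by class name
Properties props = new Properties();
props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-processors");
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
    "org.apache.kafka.clients.consumer.RangeAssignor");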

Group Coordinator and Rebalancing

Coordinator Selection:

Group ID: "analytics-processors"
              ↓
  hash(groupId) % num_partitions(__consumer_offsets)
              ↓
  Partition 23 in __consumer_offsets
              ↓
  Broker 2 (leader of partition 23)
              ↓
  Broker 2 becomes Group Coordinator
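
The mapping in the diagram can be sketched in a few lines. This assumes the hash is Java's String.hashCode() made non-negative and that __consumer_offsets has the default 50 partitions (the broker setting offsets.topic.num.partitions); the class and method names are hypothetical, and the partition printed for a given group id is not claimed to match the "23" used in the illustration above.

```java
// Sketch of the mapping in the diagram; class and method names are made up for illustration.
class CoordinatorLookup {

    // 50 is the default for the broker setting offsets.topic.num.partitions.
    static final int DEFAULT_OFFSETS_PARTITIONS = 50;

    // Maps a group id to a partition of __consumer_offsets.
    static int offsetsPartitionFor(String groupId, int offsetsPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so handle that case explicitly.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % offsetsPartitions;
    }

    public static void main(String[] args) {
        int partition = offsetsPartitionFor("analytics-processors", DEFAULT_OFFSETS_PARTITIONS);
        System.out.println("__consumer_offsets partition: " + partition);
    }
}
```

The broker that leads the resulting partition is the group's coordinator, so every member of the same group always talks to the same broker until leadership moves.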

Rebalancing Protocol (Simplified):

REBALANCING FLOW:

1. TRIGGER EVENT
 ├── Consumer joins group
 ├── Consumer leaves/crashes
 ├── Consumer heartbeat timeout
 └── Partition count changes

2. COORDINATOR INITIATES REBALANCE
 ├── Returns REBALANCE_IN_PROGRESS in heartbeat responses to all consumers
 └── Consumers stop processing, commit offsets

3. JOIN GROUP PHASE
 ├── All consumers re-join group
 ├── Send their supported partition assignment strategies
 └── Coordinator collects member info

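4. ASSIGNMENT PHASE
 ├── Coordinator picks a strategy all members support and names a group leader
 ├── Leader (one of the consumers) calculates new partition assignments
 └── Coordinator sends assignments to consumers (SyncGroup response)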

5. RESUME PROCESSING
 └── Consumers start consuming from new assignments

TOTAL REBALANCE TIME: ~500ms to several seconds
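
The flow above can be imitated with a toy in-memory model. This is only a sketch: the class is hypothetical, it collapses the coordinator and the assignment logic into one object, and it uses a plain round-robin hand-out. It does show two things the real protocol shares: every membership change starts a new round (counted here as a generation), and in an eager rebalance all partitions are handed out again from scratch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of the flow above; it is not how the broker is implemented.
class GroupSimulation {
    private final int partitions;
    private final List<String> members = new ArrayList<>();
    private Map<String, List<Integer>> assignment = new TreeMap<>();
    private int generation = 0;

    GroupSimulation(int partitions) {
        this.partitions = partitions;
    }

    // Trigger events: any membership change starts a new rebalance.
    void join(String member) {
        members.add(member);
        rebalance();
    }

    void leave(String member) {
        members.remove(member);
        rebalance();
    }

    // Eager-style rebalance: drop every assignment, then hand out all partitions again.
    private void rebalance() {
        generation++;
        Map<String, List<Integer>> next = new TreeMap<>();
        for (String m : members) {
            next.put(m, new ArrayList<>());
        }
        for (int p = 0; p < partitions && !members.isEmpty(); p++) {
            next.get(members.get(p % members.size())).add(p);
        }
        assignment = next;
    }

    int generation() { return generation; }

    Map<String, List<Integer>> assignment() { return assignment; }

    public static void main(String[] args) {
        GroupSimulation group = new GroupSimulation(4);
        group.join("A");
        group.join("B");
        System.out.println(group.generation() + " " + group.assignment()); // prints 2 {A=[0, 2], B=[1, 3]}
        group.leave("A");
        System.out.println(group.generation() + " " + group.assignment()); // prints 3 {B=[0, 1, 2, 3]}
    }
}
```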

Scaling Patterns

Under-Subscribed (Fewer Consumers than Partitions):

4 Partitions, 2 Consumers:
Consumer A: [P0, P1]
Consumer B: [P2, P3]

Throughput: 2x (2 parallel consumers)
Utilization: 100% (all consumers busy)

Fully-Subscribed (Equal Consumers and Partitions):

4 Partitions, 4 Consumers:
Consumer A: [P0]
Consumer B: [P1]
Consumer C: [P2]
Consumer D: [P3]

Throughput: 4x (4 parallel consumers)
Utilization: 100% (optimal)

Over-Subscribed (More Consumers than Partitions):

4 Partitions, 6 Consumers:
Consumer A: [P0]
Consumer B: [P1]
Consumer C: [P2]
Consumer D: [P3]
Consumer E: []  ⚠️ IDLE
Consumer F: []  ⚠️ IDLE

Throughput: 4x (limited by partitions)
Utilization: 67% (2 consumers wasted)

✕ Cannot scale beyond partition count!
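
The numbers in the three patterns above follow from one rule: the count of working consumers is the smaller of the partition count and the consumer count. A minimal sketch of that arithmetic, with hypothetical class and method names:

```java
// Hypothetical helper that reproduces the arithmetic in the three patterns above.
class GroupSizing {

    // A partition has one owner per group, so at most `partitions` consumers do work.
    static int activeConsumers(int partitions, int consumers) {
        return Math.min(partitions, consumers);
    }

    static int idleConsumers(int partitions, int consumers) {
        return consumers - activeConsumers(partitions, consumers);
    }

    // Share of consumers that own at least one partition, as a rounded percentage.
    static long utilizationPercent(int partitions, int consumers) {
        return Math.round(100.0 * activeConsumers(partitions, consumers) / consumers);
    }

    public static void main(String[] args) {
        System.out.println(activeConsumers(4, 2) + " " + utilizationPercent(4, 2)); // prints 2 100
        System.out.println(activeConsumers(4, 4) + " " + utilizationPercent(4, 4)); // prints 4 100
        System.out.println(idleConsumers(4, 6) + " " + utilizationPercent(4, 6));   // prints 2 67
    }
}
```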

Multiple Consumer Groups

Independent Processing:

Topic: user-events (4 partitions)
      │
      ├───► Group: "analytics" (processes all events)
      │     Consumer A: [P0, P1]
      │     Consumer B: [P2, P3]
      │
      └───► Group: "fraud-detection" (also processes all events)
            Consumer X: [P0, P1, P2, P3]

Each group independently consumes ALL messages.
Groups do NOT affect each other.

Use Case - Multiple Processing Pipelines:

Topic: "user-actions"

Group 1: "real-time-analytics"
→ Processes events for live dashboards

Group 2: "ml-feature-pipeline"
→ Extracts features for ML models

Group 3: "audit-logger"
→ Archives events for compliance

All three groups consume the SAME messages independently.
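
Groups stay independent because each one tracks its own committed position in the log. The sketch below models a single partition as a list and keeps one position per group id. The class and its poll method are illustrative only and do not mirror the Kafka consumer API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: one shared log, one committed position per group. Names are illustrative.
class IndependentGroups {
    private final List<String> log;
    private final Map<String, Integer> committed = new HashMap<>();

    IndependentGroups(List<String> log) {
        this.log = log;
    }

    // Returns up to `max` unread records for the group and advances only that group's position.
    List<String> poll(String groupId, int max) {
        int from = committed.getOrDefault(groupId, 0);
        int to = Math.min(log.size(), from + max);
        committed.put(groupId, to);
        return log.subList(from, to);
    }

    public static void main(String[] args) {
        IndependentGroups topic = new IndependentGroups(List.of("login", "click", "purchase"));
        System.out.println(topic.poll("real-time-analytics", 2)); // prints [login, click]
        System.out.println(topic.poll("audit-logger", 3));        // prints [login, click, purchase]
        System.out.println(topic.poll("real-time-analytics", 2)); // prints [purchase]
    }
}
```

Reading as "audit-logger" does not move the position of "real-time-analytics", which is why a slow group never holds back a fast one.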

Tradeoffs

Advantages:

✓ Horizontal scalability (add more consumers)
✓ Automatic fault tolerance (consumer failures handled)
✓ Load balancing across consumers
✓ Multiple independent processing pipelines (multiple groups)

Disadvantages:

✕ Rebalancing pauses processing (stop-the-world with the eager protocol; cooperative rebalancing pauses only the partitions that move)
✕ Cannot scale beyond partition count
✕ Partition assignment may be uneven
✕ Rebalancing overhead on frequent consumer changes

Real Systems Using This

Kafka (Apache)

Implementation: The broker leading a group's partition of __consumer_offsets acts as that group's coordinator
Scale: Thousands of consumer groups processing trillions of messages
Typical Setup: 10-50 consumers per group for high-throughput topics

Amazon Kinesis

Implementation: Kinesis Client Library (KCL) provides similar consumer group semantics
Scale: Auto-scaling consumer groups based on shard count
Typical Setup: 1 worker per shard, auto-scaling with shard splits/merges

Apache Pulsar

Implementation: Shared subscription model (similar to consumer groups, but messages are dispatched per message rather than per partition, so ordering is not preserved)
Scale: Automatic load rebalancing without stop-the-world pauses
Typical Setup: Dynamic consumer scaling with minimal disruption

When to Use Consumer Groups

✓ Perfect Use Cases

Use Case	Scenario	Solution	Result
High-Throughput Event Processing	Processing 1M events/sec from user activity stream	Consumer group with 100 consumers (10K events/sec each) on a topic with at least 100 partitions	Near-linear scaling, automatic fault tolerance
Parallel Data Pipeline	Real-time ETL from Kafka to data warehouse	Consumer group with one consumer per partition, with the partition count sized to the available cores	Maximize parallelism while maintaining ordering per partition
Multiple Processing Pipelines	Same events need processing by analytics, ML, and audit systems	Three separate consumer groups on same topic	Independent processing without interfering with each other

✕ When NOT to Use

Anti-Pattern	Problem	Issue	Alternative
Need Broadcast to All Consumers	Every consumer must receive ALL messages	Consumer groups distribute messages (each gets subset)	Use separate consumer groups or pub-sub pattern
Very Low Latency Requirements	Sub-millisecond latency critical	Rebalancing causes temporary processing pause	Single consumer or fixed partition assignment
More Consumers than Partitions Long-Term	Want to run 100 consumers with only 10 partitions	90 consumers will be idle, wasting resources	Increase partition count or reduce consumers

Interview Application

Common Interview Question 1

Q: “You have a topic with 10 partitions. If you deploy 15 consumers in the same consumer group, what happens?”

Strong Answer:

“Only 10 consumers will be active - one per partition. The remaining 5 consumers will be idle since each partition can only be assigned to one consumer in a group. This is inefficient. To utilize all 15 consumers, I’d either increase the partition count to 15+, or split the workload across multiple topics. If scaling further is anticipated, I’d over-provision partitions upfront, since partitions can be added later but never removed, and adding them changes which partition each key maps to.”

Why this is good:

Shows understanding of partition assignment constraint
Identifies the inefficiency
Provides multiple solutions
Considers future scaling

Common Interview Question 2

Q: “What happens during a consumer group rebalance? How does it affect processing?”

Strong Answer:

“Rebalancing occurs when consumers join, leave, or crash. The process:

1. Coordinator detects the change (heartbeat timeout or explicit notification)
2. Coordinator signals REBALANCE_IN_PROGRESS to all group members in their heartbeat responses
3. Consumers stop processing and commit their offsets
4. All consumers re-join the group
5. The group leader (one of the consumers) calculates new partition assignments using the configured strategy, and the coordinator distributes them
6. Consumers receive new assignments and resume processing

Impact: Processing pauses for ~500ms to several seconds. In production, we minimize rebalances by:

Using static membership (Kafka 2.3+) to avoid rebalances on restarts
Tuning session.timeout.ms and heartbeat.interval.ms
Using sticky assignor to minimize partition movement
Graceful shutdowns with proper leave group notifications”

Why this is good:

Detailed step-by-step understanding
Quantifies the impact
Shows production awareness
Provides optimization strategies
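
The tuning options named in the answer above map to consumer configuration keys. The sketch below builds them with plain string keys so it runs without the Kafka client on the classpath. The class name, the instance id argument, and the timeout values are illustrative assumptions, not recommendations.

```java
import java.util.Properties;

// Illustrative values only; tune them for your own workload.
class LowChurnConsumerConfig {

    static Properties build(String instanceId) {
        Properties props = new Properties();
        props.put("group.id", "analytics-processors");
        // Static membership: a stable id per instance, so a quick restart rejoins as the same member.
        props.put("group.instance.id", instanceId);
        // How long the coordinator waits without heartbeats before declaring the member dead.
        props.put("session.timeout.ms", "45000");
        // Commonly kept at no more than one third of the session timeout.
        props.put("heartbeat.interval.ms", "15000");
        // Cooperative sticky assignment keeps existing owners where it can.
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }
}
```

A call such as LowChurnConsumerConfig.build("analytics-processor-1") would give each deployed instance its own stable identity; the id must be unique within the group.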

Red Flags to Avoid

✕ Confusing consumer groups with partition replicas
✕ Claiming you can assign same partition to multiple consumers in one group
✕ Not knowing about rebalancing and its impact
✕ Forgetting that consumers beyond the partition count sit idle

Quick Self-Check

Before moving on, can you:

Explain consumer groups in 60 seconds?
Draw a diagram showing partition-to-consumer assignment?
Explain what triggers a rebalance?
Calculate optimal consumer count given partition count?
Identify when to use multiple consumer groups?
Explain the partition assignment guarantee?

See It In Action

Consumer Group Rebalancing - ~100 second animated visual showing partition redistribution
Kafka Topic Partitioning - ~100 second animated visual explanation of partitioning and consumer assignment

Prerequisites

Topic Partitioning - Understanding partitions is essential for consumer groups

Related Concepts

Offset Management - How consumers track their position
Load Balancing - Distribution strategies
Sharding - Similar concept for databases

Used In Systems

Real-Time Analytics Pipelines - Consumer groups for parallel processing
Event-Driven Microservices - Multiple consumer groups per service

Explained In Detail

Kafka Architecture - Consumer Groups & Rebalancing section (30 minutes)
Deep dive into rebalancing protocols, partition assignment strategies, and coordinator mechanics

Next Recommended: Offset Management - Learn how consumers track their position in partitions
