I/D/E · Visual explainer

Log Compaction

Summary

How Kafka's log compaction turns a topic into a key-value table by keeping only the latest value per key.

Read This As

Which latest value should survive for each key?

Failure Trap
Forgetting tombstones, compaction lag, and consumers that read before cleanup catches up.
Decision Rule
Use compaction for keyed state topics, and design retention plus tombstones around recovery expectations.
Traditional: Delete Old Segments time Segment 1 7+ days DELETE Segment 2 8 days DELETE Segment 3 3 days Segment 4 1 day Active writing... cleanup.policy=delete retention.ms=7 days What if you only need the latest value per key? Compaction: Keep Latest Per Key Kafka Log (before compaction) K:A v=1 K:B v=1 K:A v=2 K:C v=1 K:B v=2 K:A v=3 superseded by newer values Only the latest value per key matters K:A → keep v=3 | K:B → keep v=2 | K:C → keep v=1 The Compaction Cleaner Closed Segment A:1 B:1 A:2 C:1 B:2 A:3 Cleaner Thread Scan compact Compacted Segment C:1 B:2 A:3 Background process Only closed segments 6 records → 3 records One per unique key Deleting Keys: Tombstones Before A:5 latest B:3 latest C:2 latest A:null tombstone compact After Compaction B:3 C:2 Key A removed entirely null value = delete the key Tombstones retained briefly (delete.retention.ms) to propagate deletion to consumers Log Becomes a Table Compacted Topic user:1 Alice user:2 Bob user:3 Carol = Key-Value Table Key Value user:1 Alice user:2 Bob user:3 Carol Use Cases User profiles • Configuration • CDC changelog Consumer offsets • KTable in Kafka Streams
1 / ?

Traditional: Delete Old Segments

By default, Kafka deletes log segments after a retention period (e.g., 7 days). This works for event streams where historical completeness matters.

But what if you only care about the latest value for each key?

  • cleanup.policy=delete (default)
  • Deletes by time or size
  • Loses historical values after retention

Compaction: Keep Latest Per Key

Log compaction retains only the most recent record for each key. If Key A has values v1, v2, v3 — after compaction, only v3 remains.

This turns the log into a changelog table.

  • cleanup.policy=compact
  • One record per unique key
  • Older values removed

The Compaction Cleaner

A background thread (the cleaner) periodically scans closed segments. It builds a new segment containing only the latest value per key.

The active segment is never compacted — only closed segments.

  • Background process, not blocking
  • Only closed segments compacted
  • Configurable cleaner threads

Deleting Keys: Tombstones

To delete a key, produce a record with that key and a null value. This is called a tombstone. After compaction, the key is removed entirely.

Tombstones are retained briefly to propagate the delete to consumers.

  • null value = tombstone
  • delete.retention.ms controls tombstone life
  • Key fully removed after propagation

Log Becomes a Table

After compaction, the topic is effectively a key-value table. Consumers can rebuild state by reading from offset 0.

Use cases: User profiles, configuration, CDC changelog, consumer offset storage.

  • Compacted topic = key-value store
  • Full state from offset 0
  • KTable in Kafka Streams

What's Next?

Now that you understand log compaction, explore related patterns: Write-Ahead Log (WAL) for durability, Kafka Topic Partitioning for message distribution, and Log-Based Storage for the underlying principles.