Concepts | Intentional / Deliberate / Engineering

Concepts

Small ideas, arranged by the work they unlock.

These are the short reference pages behind the longer essays and explainers: coordination, storage, messaging, reliability, AI, and the patterns that make systems easier to reason about.

Good entry points

Start with foundations: Agreement, ordering, partitions, and failure detection.

Follow failure: Move through detection, isolation, overload, and recovery.

Trace AI systems: Connect model behavior to memory, metrics, and inference limits.

Browse

Follow curated paths when you want sequence. Switch to categories when you already know the area you need.

Foundations and coordination

Start with the distributed-systems primitives behind agreement, ordering, partitions, and failure detection.

8 pages Path

Distributed Systems Basics The fundamental concepts of distributed computing: how multiple machines coordinate to appear as a single coherent system, navigating network partitions, failures, and the CAP theorem

Intermediate Foundations 15 min read

CAP Theorem The fundamental trade-off in distributed systems: during a network partition, you must choose between consistency and availability

Intermediate Patterns 10 min read

Logical Clocks Mechanisms for ordering events in distributed systems based on causality rather than physical time

Intermediate Foundations 9 min read

Lamport Clock A logical clock that captures the happens-before relationship between events in distributed systems using simple integer counters

Intermediate Foundations 9 min read
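
As a rough illustration of the integer-counter idea in the Lamport Clock entry above, here is a minimal sketch. The class and method names are hypothetical, chosen for this example rather than taken from the linked page or any library. It assumes the usual rules: increment on every local event, and on receive take the maximum of the local and message timestamps plus one.

```python
class LamportClock:
    """Integer logical clock. Names are illustrative, not from any library."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending counts as an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Move past the sender's timestamp, then count the receive event itself.
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()             # a.time is now 1
stamp = a.send()     # a.time is now 2, and the message carries 2
b.receive(stamp)     # b.time becomes max(0, 2) + 1 = 3
```

The receive step is what makes the timestamp of a send strictly smaller than the timestamp of the matching receive, so the counters respect happens-before. The converse does not hold: a smaller timestamp alone does not prove causality.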

Quorum The minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful, ensuring consistency despite failures

Intermediate Patterns 7 min read
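
The arithmetic behind the Quorum entry above can be sketched in a few lines. The helper names and the parameters `n`, `r`, and `w` (replica count, read quorum size, write quorum size) are assumptions made for this example. It relies on the common rule that a read set and a write set must intersect whenever `r + w > n`.

```python
def majority(n: int) -> int:
    """Smallest group size that is more than half of n nodes."""
    return n // 2 + 1


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one node with every write quorum."""
    return r + w > n


# With 5 replicas a majority is 3, so up to 2 nodes can fail.
print(majority(5))
# Reading from 2 and writing to 2 out of 3 replicas always overlaps.
print(quorums_overlap(3, 2, 2))
# Reading from 1 and writing to 1 out of 3 replicas may miss the latest write.
print(quorums_overlap(3, 1, 1))
```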

Consensus How distributed systems agree on a single value or state across multiple nodes, enabling coordination despite failures and network partitions

Advanced Patterns 12 min read

Gossip Protocol An epidemic-style protocol for disseminating information across a distributed cluster, typically converging in a number of rounds that grows logarithmically with cluster size

Intermediate Patterns 9 min read

Failure Detection Mechanisms to identify when nodes in a distributed system have failed, enabling recovery and fault tolerance

Intermediate Patterns 9 min read

Data, storage, and distribution

Follow state through durability, replication, consistency, partitioning, and routing.

8 pages Path

ACID Transactions The four guarantees that database transactions provide: Atomicity, Consistency, Isolation, and Durability—and how they enable reliable data operations

Intermediate Storage 12 min read

Write-Ahead Log (WAL) A technique where changes are written to a durable log before being applied to the database, enabling crash recovery and replication in database systems

Intermediate Storage 8 min read

Log-Based Storage How distributed systems use append-only logs for durable, ordered, and high-throughput data storage with time-travel and replay capabilities

Intermediate Storage 10 min read

Replication How distributed systems copy data across multiple nodes to achieve high availability, fault tolerance, and geographic distribution—and the fundamental trade-offs involved

Intermediate Patterns 12 min read

Leader-Follower Replication How distributed systems achieve fault tolerance and high availability by replicating data from a leader node to multiple follower nodes

Intermediate Patterns 12 min read

Eventual Consistency A consistency model where updates eventually propagate to all replicas, prioritizing availability over immediate consistency in distributed systems

Intermediate Patterns 9 min read

Sharding How databases horizontally partition data across multiple servers for scalability, using partition keys to distribute and route data efficiently

Intermediate Storage 10 min read

Consistent Hashing A distributed hashing scheme that minimizes key redistribution when nodes are added or removed from a cluster

Intermediate Patterns 9 min read

Streams and replay

Build the mental model for ordered logs, consumer progress, replay, and processing guarantees.

8 pages Path

Topic Partitioning How distributed systems divide data into partitions for parallel processing, ordering guarantees, and horizontal scalability

Intermediate Messaging 8 min read

Producer Acknowledgments Mechanisms by which message producers receive confirmation that their messages were successfully persisted, enabling reliability tradeoffs between latency and durability

Intermediate Messaging 7 min read

Producer Batching How message producers batch records to achieve high throughput by amortizing network overhead and maximizing sequential I/O

Intermediate Messaging 8 min read

Consumer Groups How multiple consumers coordinate to process partitions in parallel with fault tolerance, automatic rebalancing, and exactly-once guarantees

Intermediate Messaging 10 min read

Offset Management How distributed messaging systems track consumer progress through partitions using offsets, enabling fault tolerance, exactly-once processing, and replay capabilities

Intermediate Messaging 9 min read

Exactly-Once Semantics How distributed messaging systems guarantee each message is processed exactly once, eliminating duplicates while ensuring atomicity across multiple operations

Advanced Messaging 12 min read

Checkpointing Periodically saving processing state to enable recovery from failures without reprocessing all data from the beginning

Intermediate Patterns 7 min read

Event Sourcing An architectural pattern that stores all changes to application state as a sequence of events, enabling complete audit trails and time-travel capabilities

Advanced Patterns 10 min read

Reliability and traffic

Move through liveness signals, failover, overload protection, balancing, and rate-control algorithms.

8 pages Path

Circuit Breaker A resilience pattern that prevents cascade failures by failing fast when a downstream service is unhealthy

Intermediate Patterns 9 min read

Heartbeat Periodic signals sent between nodes to indicate liveness, enabling failure detection in distributed systems

Beginner Patterns 8 min read

Health Checks Failure detection mechanisms in distributed systems: how to determine if a node is alive, dead, or just slow, enabling automatic failover and self-healing systems

Intermediate Observability 10 min read

Failover Automatic switching to a backup system or replica when the primary fails, ensuring service continuity with minimal downtime

Intermediate Patterns 7 min read

Load Balancing Distributing incoming requests across multiple servers to optimize resource utilization, minimize latency, and prevent any single server from becoming a bottleneck

Intermediate Patterns 8 min read

Rate Limiting Controlling the rate of requests to a service to prevent overload, ensure fair usage, and protect against abuse

Intermediate Patterns 8 min read

Token Bucket A rate limiting algorithm that allows controlled bursts of traffic while enforcing an average rate limit over time

Intermediate Patterns 8 min read
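
To make the "controlled bursts, average rate" behavior of the Token Bucket entry above concrete, here is a minimal single-process sketch. The class name, the `rate` and `capacity` parameters, and the injectable `now` argument (added so the example is deterministic) are all assumptions for illustration, not an API from the linked page.

```python
import time
from typing import Optional


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        now = time.monotonic() if now is None else now
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=1.0, capacity=2.0, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(3)]  # two pass, the third is rejected
later = bucket.allow(now=1.0)                      # one token has refilled after 1 second
```

The capacity bounds the burst size, while the refill rate bounds the long-run average. A production limiter would also need locking or an atomic store if it is shared across threads or processes.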

Sliding Window A rate limiting algorithm that smoothly enforces request limits by weighting the previous time window, preventing boundary burst problems

Intermediate Patterns 8 min read
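
The "weighting the previous time window" idea in the Sliding Window entry above can be sketched as a sliding window counter. This is one common variant, shown under assumptions: the class name, the `limit` and `window` parameters, and the explicit `now` argument are hypothetical and chosen for this example. The estimate is the previous window's count scaled by how much of it still overlaps the sliding window, plus the current window's count.

```python
class SlidingWindowCounter:
    """Approximate sliding-window limiter using two fixed-window counters."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.current_index = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window)
        if index != self.current_index:
            # Roll forward. The old count only matters if it belongs to the adjacent window.
            adjacent = index == self.current_index + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the current fixed window that has already elapsed.
        elapsed_fraction = (now % self.window) / self.window
        estimated = self.previous_count * (1.0 - elapsed_fraction) + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False


limiter = SlidingWindowCounter(limit=2, window=10.0)
first_window = [limiter.allow(t) for t in (0.0, 1.0, 2.0)]  # third request is rejected
just_after = limiter.allow(11.0)  # estimate is 2 * 0.9 = 1.8, so one request fits
too_soon = limiter.allow(12.0)    # estimate is 2 * 0.8 + 1 = 2.6, so it is rejected
```

Because the previous window keeps counting against the limit in proportion to its overlap, a client cannot send a full quota at the end of one window and another full quota at the start of the next.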

AI and model fundamentals

Build the model vocabulary, math intuition, training loop, and evaluation basics.

10 pages Path

What Is a Model? Foundation vocabulary for machine learning: parameters, weights, logits, training vs inference, and why neural networks work

Beginner Gen AI 12 min read

Probability Basics Foundation for softmax, cross-entropy, temperature scaling, and sampling in AI systems

Beginner Gen AI 12 min read

Math Intuitions Geometric intuitions for vectors, cosine similarity, dot products, and matrix multiplication in AI

Beginner Gen AI 15 min read

Loss Functions Reference for cross-entropy, MSE, perplexity, and contrastive loss in training and evaluation

Intermediate Gen AI 12 min read

Backpropagation How neural networks learn: gradients, chain rule, vanishing gradients, and residual connections

Intermediate Gen AI 15 min read

Activation Functions ReLU, GELU, SwiGLU, softmax, and sigmoid: what they do and when to use them

Intermediate Gen AI 10 min read

ML Metrics Classification and retrieval metrics: precision, recall, F1, perplexity, MRR, and NDCG

Intermediate Gen AI 12 min read

Normalization LayerNorm, BatchNorm, RMSNorm: what they do, when to use them, and Pre-Norm vs Post-Norm

Intermediate Gen AI 12 min read

Regularization Dropout, weight decay, early stopping, and label smoothing to prevent overfitting

Intermediate Gen AI 10 min read

Optimization SGD, Adam, AdamW, learning rate schedules, warmup, and gradient clipping for training

Intermediate Gen AI 12 min read

State and architecture patterns

Close the loop with state-shaping patterns, idempotent operations, virtual distribution, and AI memory constraints.

6 pages Path

CQRS (Command Query Responsibility Segregation) An architectural pattern that separates read and write operations into distinct models, optimizing each for its specific use case

Advanced Patterns 11 min read

Idempotence Operations that produce the same result when applied multiple times, critical for reliable distributed systems with retries and duplicate message handling

Intermediate Patterns 8 min read

Immutability Design principle where data structures cannot be modified after creation, simplifying distributed systems by eliminating update conflicts and race conditions

Beginner Patterns 6 min read

Virtual Nodes Multiple hash ring positions per physical node, improving load distribution in consistent hashing

Intermediate Patterns 8 min read

Memory & Compute GPU memory, precision formats, quantization (INT4/INT8), and practical GPU selection for LLMs

Intermediate Gen AI 12 min read

Memory Systems for AI Session, short-term, long-term, and episodic memory for AI agents and chatbots

Intermediate Gen AI 15 min read