Lamport Clock A logical clock that captures the happens-before relationship between events in distributed systems using simple integer counters
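As a rough illustration of the definition above, here is a minimal sketch of a Lamport clock in Python. The class name `LamportClock` and its methods `tick`, `send`, and `receive` are hypothetical names chosen for this example, not an API from any particular library. It assumes the usual rules: increment the counter on every local event, attach the counter to outgoing messages, and on receipt take the maximum of the local and received values plus one.

```python
class LamportClock:
    """Illustrative Lamport clock: a single integer counter per process."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        # Any local event advances the counter by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is itself an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # Jump past both the local counter and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


# Two processes, a and b (hypothetical example run).
a, b = LamportClock(), LamportClock()
a.tick()                     # local event on a
t_send = a.send()            # a sends a message
t_recv = b.receive(t_send)   # b receives it
```

Because the receive timestamp is always larger than the send timestamp, an event that happens before another always carries the smaller counter value. The converse does not hold: a smaller counter alone does not prove causality.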
Concepts
Look up the idea, then keep reading.
These are the small pages that unlock the longer work: quorum, write-ahead logs, embeddings, circuit breakers, and the terms that otherwise interrupt your reading.
Good entry points
Start with foundations Consensus, clocks, consistency, and the ideas everything else leans on.
Build the AI base layer Models, loss, embeddings, attention, and the math you actually use.
Understand streaming Partitions, producers, offsets, and the failure modes behind event systems.
Foundations
Core principles and building blocks
3 pages
Logical Clocks Mechanisms for ordering events in distributed systems based on causality rather than physical time
Distributed Systems Basics The fundamental concepts of distributed computing: how multiple machines coordinate to appear as a single coherent system, navigating network partitions, failures, and the CAP theorem
Generative AI
Machine learning and AI engineering fundamentals
12 pages
Guided path
1. What Is a Model? Foundation vocabulary for machine learning: parameters, weights, logits, training vs inference, and why neural networks work
2. Probability Basics Foundation for softmax, cross-entropy, temperature scaling, and sampling in AI systems
3. Math Intuitions Geometric intuitions for vectors, cosine similarity, dot products, and matrix multiplication in AI
4. Backpropagation How neural networks learn: gradients, chain rule, vanishing gradients, and residual connections
5. Loss Functions Reference for cross-entropy, MSE, perplexity, and contrastive loss in training and evaluation
6. Activation Functions ReLU, GELU, SwiGLU, softmax, and sigmoid: what they do and when to use them
7. ML Metrics Classification and retrieval metrics: precision, recall, F1, perplexity, MRR, and NDCG
8. Normalization LayerNorm, BatchNorm, RMSNorm: what they do, when to use them, and Pre-Norm vs Post-Norm
9. Regularization Dropout, weight decay, early stopping, and label smoothing to prevent overfitting
10. Optimization SGD, Adam, AdamW, learning rate schedules, warmup, and gradient clipping for training
11. Memory & Compute GPU memory, precision formats, quantization (INT4/INT8), and practical GPU selection for LLMs
12. Memory Systems for AI Session, short-term, long-term, and episodic memory for AI agents and chatbots
Messaging
Event streaming and communication
6 pages
Producer Acknowledgments Mechanisms by which message producers receive confirmation that their messages were successfully persisted, enabling reliability tradeoffs between latency and durability
Producer Batching How message producers batch records to achieve high throughput by amortizing network overhead and maximizing sequential I/O
Topic Partitioning How distributed systems divide data into partitions for parallel processing, ordering guarantees, and horizontal scalability
Offset Management How distributed messaging systems track consumer progress through partitions using offsets, enabling fault tolerance, exactly-once processing, and replay capabilities
Consumer Groups How multiple consumers coordinate to process partitions in parallel with fault tolerance, automatic rebalancing, and exactly-once guarantees
Exactly-Once Semantics How distributed messaging systems guarantee each message is processed exactly once, eliminating duplicates while ensuring atomicity across multiple operations
Storage
Data persistence and retrieval
4 pages
Write-Ahead Log (WAL) A technique where changes are written to a durable log before being applied to the database, enabling crash recovery and replication in database systems
Log-Based Storage How distributed systems use append-only logs for durable, ordered, and high-throughput data storage with time-travel and replay capabilities
Sharding How databases horizontally partition data across multiple servers for scalability, using partition keys to distribute and route data efficiently
ACID Transactions The four guarantees that database transactions provide: Atomicity, Consistency, Isolation, and Durability—and how they enable reliable data operations
Patterns
Design patterns and architectures
22 pages
Immutability Design principle where data structures cannot be modified after creation, simplifying distributed systems by eliminating update conflicts and race conditions
Heartbeat Periodic signals sent between nodes to indicate liveness, enabling failure detection in distributed systems
Checkpointing Periodically saving processing state to enable recovery from failures without reprocessing all data from the beginning
Failover Automatic switching to a backup system or replica when the primary fails, ensuring service continuity with minimal downtime
Quorum The minimum number of nodes in a distributed system that must agree on an operation for it to be considered successful, ensuring consistency despite failures
Idempotence Operations that produce the same result when applied multiple times, critical for reliable distributed systems with retries and duplicate message handling
Load Balancing Distributing incoming requests across multiple servers to optimize resource utilization, minimize latency, and prevent any single server from becoming a bottleneck
Rate Limiting Controlling the rate of requests to a service to prevent overload, ensure fair usage, and protect against abuse
Token Bucket A rate limiting algorithm that allows controlled bursts of traffic while enforcing an average rate limit over time
Sliding Window A rate limiting algorithm that smoothly enforces request limits by weighting the previous time window, preventing boundary burst problems
Virtual Nodes Multiple hash ring positions per physical node, improving load distribution in consistent hashing
Circuit Breaker A resilience pattern that prevents cascade failures by failing fast when a downstream service is unhealthy
Consistent Hashing A distributed hashing scheme that minimizes key redistribution when nodes are added or removed from a cluster
Eventual Consistency A consistency model where updates eventually propagate to all replicas, prioritizing availability over immediate consistency in distributed systems
Failure Detection Mechanisms to identify when nodes in a distributed system have failed, enabling recovery and fault tolerance
Gossip Protocol An epidemic-style protocol for disseminating information across a distributed cluster with logarithmic convergence
CAP Theorem The fundamental trade-off in distributed systems: during a network partition, you must choose between consistency and availability
Leader-Follower Replication How distributed systems achieve fault tolerance and high availability by replicating data from a leader node to multiple follower nodes
Replication How distributed systems copy data across multiple nodes to achieve high availability, fault tolerance, and geographic distribution—and the fundamental trade-offs involved
Event Sourcing An architectural pattern that stores all changes to application state as a sequence of events, enabling complete audit trails and time-travel capabilities
CQRS (Command Query Responsibility Segregation) An architectural pattern that separates read and write operations into distinct models, optimizing each for its specific use case
Consensus How distributed systems agree on a single value or state across multiple nodes, enabling coordination despite failures and network partitions
Observability
Monitoring and debugging
1 page