
What Is a Model?

Foundational vocabulary for machine learning: parameters, weights, logits, training vs inference, and why neural networks work

TL;DR

A machine learning model is a mathematical function that maps inputs to outputs, with learnable parameters that are adjusted during training. Understanding parameters, logits, training vs inference, and the bias-variance tradeoff is essential vocabulary for any AI engineering work.

Visual Overview

[Diagram: Model As Function]
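To make "model as function" concrete, here is a minimal sketch (the numbers are made up): a linear model is just a function of its input plus two learnable parameters, a weight and a bias.

```python
# A model is a function with learnable parameters.
# Minimal sketch: a linear model y = w * x + b (illustrative values only).

def linear_model(x, w, b):
    """Map input x to an output using parameters w (weight) and b (bias)."""
    return w * x + b

# The same function behaves differently with different parameters.
print(linear_model(2.0, w=3.0, b=1.0))   # 7.0
print(linear_model(2.0, w=-1.0, b=0.5))  # -1.5
```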

Parameters, Weights, and Biases

Parameters are the learnable values inside a model. Weights are the parameters on connections between neurons, and biases are offsets added to each weighted sum.

[Diagram: Parameters and Scale]
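A rough sketch of why parameter counts grow so quickly (the layer sizes below are assumed, not from the original): a fully connected layer with n inputs and m outputs contributes n × m weights plus m biases.

```python
# Parameter count of one fully connected layer: n*m weights + m biases.
def dense_layer_params(n_inputs: int, n_outputs: int) -> int:
    return n_inputs * n_outputs + n_outputs

# Illustrative sizes (assumed): a 768-wide layer projecting to 3072 units.
print(dense_layer_params(768, 3072))  # 2,362,368 parameters in a single layer
```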

Logits

Logits are the raw, unnormalized scores a model outputs before they are converted to probabilities (typically via softmax).

[Diagram: Logits to Probabilities]

Why logits matter:

  • LLMs output logits over their vocabulary: one raw score per token, often 50,000+
  • Temperature and sampling operate on logits (see the sketch below)
  • Understanding logits helps debug generation issues
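A minimal sketch of the logits-to-probabilities step (toy logits and a three-token vocabulary are assumed): softmax turns raw scores into probabilities that sum to 1, and dividing the logits by a temperature first flattens or sharpens the result.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities that sum to 1.
    Dividing by temperature flattens (T > 1) or sharpens (T < 1) the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of 3 tokens with made-up logits (illustration only).
logits = [2.0, 1.0, 0.1]
print(softmax(logits))                  # roughly [0.66, 0.24, 0.10]
print(softmax(logits, temperature=0.5)) # sharper: the top token dominates
print(softmax(logits, temperature=2.0)) # flatter: choices become more even
```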

Deterministic vs Probabilistic vs Statistical

[Diagram: Three Types of Systems]

ML models are statistical systems that often use probabilistic methods:

  • They learn from data (statistical)
  • They may sample from distributions (probabilistic)
  • Given the same input and the same random seed, they’re deterministic

LLMs with temperature > 0 are probabilistic. With temperature = 0 (greedy decoding), they’re deterministic.
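A small sketch of that distinction (the toy vocabulary and probabilities are assumed): sampling is probabilistic, but a fixed random seed makes the draws repeatable, and temperature 0 reduces to greedy decoding.

```python
import random

tokens = ["cat", "dog", "fish"]   # toy vocabulary (assumed)
probs  = [0.5, 0.3, 0.2]          # toy probabilities (assumed)

def greedy_decode():
    """Temperature = 0: always pick the most likely token (deterministic)."""
    return tokens[probs.index(max(probs))]

def sample(seed):
    """Temperature > 0: draw from the distribution; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.choices(tokens, weights=probs, k=1)[0]

print(greedy_decode(), greedy_decode())  # always the same: cat cat
print(sample(42), sample(42))            # same seed -> same draw
```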


Training vs Inference

[Diagram: Training vs Inference]

You will mostly do inference: training LLMs from scratch requires massive compute, and fine-tuning is more accessible but still expensive.
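A minimal sketch of both phases on a one-parameter model (the data and learning rate are made up): training repeats forward pass -> loss -> gradient -> update; inference just runs the forward pass with the parameters frozen.

```python
# Minimal sketch of training vs inference on a 1-parameter model y = w * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]        # true relationship: y = 2x (made-up data)

w = 0.0                           # parameter starts at a guess
lr = 0.01                         # learning rate (assumed)

# Training: forward pass -> loss -> gradient -> update, repeated over epochs.
for epoch in range(200):
    grad = 0.0
    for x, y in zip(xs, ys):
        pred = w * x              # forward pass
        error = pred - y          # contributes to the squared-error loss
        grad += 2 * error * x     # gradient of (pred - y)^2 with respect to w
    w -= lr * grad / len(xs)      # update step

# Inference: parameters are frozen; just run the forward pass.
print(round(w, 3))                # close to 2.0 after training
print(round(w * 5.0, 2))          # prediction for a new input
```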


Why Neural Networks Work

Neural networks are function approximators: given enough parameters and training data, they can approximate essentially any mapping from inputs to outputs.

[Diagram: Why Neural Networks Work]
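As a sketch of "function approximator" (the weights here are picked by hand, not learned): one hidden layer with a ReLU nonlinearity can already represent the absolute-value function, which no single linear function can; more parameters allow more such bends.

```python
# Sketch: a tiny 2-layer network with ReLU and hand-picked weights (not learned),
# representing the absolute-value function |x|.
def relu(z):
    return max(0.0, z)

def tiny_net(x):
    h1 = relu(1.0 * x)            # hidden unit 1: active for positive x
    h2 = relu(-1.0 * x)           # hidden unit 2: active for negative x
    return 1.0 * h1 + 1.0 * h2    # output layer combines them into |x|

for x in (-2.0, -0.5, 0.0, 1.5):
    print(x, tiny_net(x))         # matches abs(x)
```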

The Bias-Variance Tradeoff

Two failure modes when learning from data: underfitting (high bias: the model is too simple to capture the underlying pattern) and overfitting (high variance: the model fits noise in the training data and fails to generalize).

[Diagram: Bias-Variance Tradeoff]
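One common way to see both failure modes (the synthetic noisy data and polynomial degrees below are assumed for illustration): fit models of increasing complexity and compare training error against error on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up for illustration): a sine curve plus noise.
def target(x):
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 12)
y_train = target(x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = target(x_test)           # noise-free ground truth for evaluation

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)             # fit the model
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
# Degree 1 tends to have high error everywhere (underfitting / high bias);
# degree 9 drives training error near zero but usually does worse on the
# held-out curve (overfitting / high variance); a middle degree generalizes best.
```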

Vocabulary Reference

Term | Definition
Model | Mathematical function mapping inputs to outputs
Parameters | Learnable values inside the model (weights + biases)
Weights | Parameters on the connections between neurons
Bias | Offset parameter added to a weighted sum
Features | Measurable properties of the input data
Logits | Raw, unnormalized output scores
Softmax | Converts logits to probabilities that sum to 1
Training | Adjusting parameters to minimize error
Inference | Using a trained model to make predictions
Loss | Measure of how wrong predictions are
Gradient | Direction to adjust parameters to reduce the loss
Epoch | One pass through the entire training dataset

When This Matters

Situation | What to know
Discussing model size | Parameters = capacity; larger models need more memory
Debugging generation | Temperature affects how logits are sampled
Understanding training | Forward pass -> loss -> backward pass -> update
Production deployment | Inference only, no training overhead
Model selection | The bias-variance tradeoff guides complexity choice
Interview Notes

  • 💼 Interview relevance: 80% of ML interviews
  • 🏭 Production impact: every ML system
  • Performance: foundation for all AI work