Model Mechanics

Softmax

Explore how logits become a probability distribution.

\sigma_i(z) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}

where z = (z_1, ..., z_K) is the vector of input scores (logits), i is the class index, and K is the number of classes
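The definition translates almost line for line into code. The following is a minimal sketch in plain Python; the function name `softmax` and the list-of-floats input are illustrative assumptions, not part of the original visual.

```python
import math

def softmax(scores):
    """Map a list of raw scores (logits) to a list of probabilities."""
    exps = [math.exp(z) for z in scores]  # numerators e^{z_i}, one per class
    total = sum(exps)                     # denominator: sum over all K classes
    return [e / total for e in exps]

# Equal scores carry no preference, so every class gets the same probability.
uniform = softmax([0.0, 0.0, 0.0, 0.0])
print(uniform)  # [0.25, 0.25, 0.25, 0.25]
```

This version follows the formula literally and is meant for reading, not for very large scores, where `math.exp` can overflow.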

Logits to Probability

Adjust raw class scores and watch the distribution move.

Controls

Class 1: exp(4.00) = 54.60

Score 4.00 · P 84.4%

Class 2: exp(2.00) = 7.39

Score 2.00 · P 11.4%

Class 3: exp(1.00) = 2.72

Score 1.00 · P 4.2%

Calculation

denominator = sum_j exp(z_j) = 64.705
P(class 1) = exp(4.00) / 64.705 = 54.598 / 64.705 = 0.844
P(class 2) = exp(2.00) / 64.705 = 7.389 / 64.705 = 0.114
P(class 3) = exp(1.00) / 64.705 = 2.718 / 64.705 = 0.042
sum P = 100.0%
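The same arithmetic can be reproduced step by step. This sketch uses the three scores shown in the controls (4.00, 2.00, 1.00); the dictionary layout and variable names are assumptions made for illustration.

```python
import math

# Scores taken from the controls above.
scores = {"class 1": 4.0, "class 2": 2.0, "class 3": 1.0}

# Step 1: exponentiate every score.
exps = {name: math.exp(z) for name, z in scores.items()}

# Step 2: sum the exponentials to get the shared denominator.
denominator = sum(exps.values())
print(f"denominator = {denominator:.3f}")

# Step 3: divide each exponential by the denominator.
probs = {name: e / denominator for name, e in exps.items()}
for name, e in exps.items():
    print(f"P({name}) = {e:.3f} / {denominator:.3f} = {probs[name]:.3f}")

print(f"sum P = {sum(probs.values()):.1%}")
```

Run as written, this should print the denominator 64.705 and the probabilities 0.844, 0.114, and 0.042, matching the calculation panel.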

Technical Notes


Key Properties of Softmax

The softmax function converts raw scores into a probability distribution, ensuring all outputs are between 0 and 1 and sum to 1.

All output probabilities are guaranteed to be positive because each score is exponentiated.
The sum of all probabilities always equals 1, creating a valid distribution over the classes.
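Both properties follow directly from the definition. A short derivation, using the same symbols as the formula at the top of the page:

```latex
\sum_{i=1}^{K} \sigma_i(z)
  = \sum_{i=1}^{K} \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
  = \frac{\sum_{i=1}^{K} e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
  = 1,
\qquad
0 < e^{z_i} \le \sum_{j=1}^{K} e^{z_j}
  \;\Rightarrow\; 0 < \sigma_i(z) \le 1
```

The upper bound is reached only when K = 1; with two or more classes every output lies strictly between 0 and 1.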

Score Sensitivity

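Larger differences between input scores produce more peaked probability distributions. Because each score is exponentiated, a gap of d between two scores makes the ratio of their probabilities e^d. This lets a model express strong preferences when there are clear distinctions between classes.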
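To see the effect numerically, the sketch below widens the gap between one class and two equally scored competitors. The helper `softmax` and the chosen gap values are illustrative assumptions.

```python
import math

def softmax(scores):
    """Plain softmax over a list of raw scores."""
    exps = [math.exp(z) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One class leads the other two by `gap`; the other two stay at 0.
top_probs = []
for gap in [0.0, 1.0, 2.0, 5.0]:
    probs = softmax([gap, 0.0, 0.0])
    top_probs.append(probs[0])
    print(f"gap {gap:.1f} -> P(top class) = {probs[0]:.3f}")
```

With no gap the leading class gets one third of the mass; by a gap of 5 it holds nearly all of it.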

Historical Terminology

The more precise term is soft(arg)max: the function is a smooth, differentiable approximation of argmax (expressed as a one-hot vector), not of max. Softmax nevertheless became the standard shorthand in machine learning frameworks and literature.

Implementation note

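Softmax depends only on the differences between logits: adding the same constant to every score leaves the output unchanged. Changing a single logit, however, can noticeably shift the probability mass assigned to its class, since each unit added to one logit multiplies that class's odds against every other class by e ≈ 2.72.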
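Because only the differences between scores matter, a common implementation trick is to subtract the largest score before exponentiating, which avoids overflow without changing the result. This technique is added here as an assumption about typical practice, not something the original notes spell out; the name `softmax_stable` is hypothetical.

```python
import math

def softmax_stable(scores):
    """Softmax with the maximum score subtracted first to avoid overflow."""
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]  # largest exponent is exp(0) = 1
    total = sum(exps)
    return [e / total for e in exps]

# Shifting every score by the same constant leaves the probabilities unchanged.
base = softmax_stable([4.0, 2.0, 1.0])
shifted = softmax_stable([1004.0, 1002.0, 1001.0])
print(all(math.isclose(a, b) for a, b in zip(base, shifted)))

# Raising only the class 2 score by 0.5 moves mass toward class 2.
nudged = softmax_stable([4.0, 2.5, 1.0])
print(f"P(class 2): {base[1]:.3f} -> {nudged[1]:.3f}")
```

A literal `math.exp(1004.0)` would raise an `OverflowError`, which is why the shifted example relies on the subtraction step.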