Model Mechanics
Softmax
Explore how logits become a probability distribution.
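$$\mathrm{softmax}(z)_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}$$
where z is the vector of input scores (logits), i is the class index, and j runs over all classes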
Logits to Probability
Adjust raw class scores and watch the distribution move.
Controls
Class 1: exp(4.00) = 54.60
Score 4.00 · P 84.4%
Class 2: exp(2.00) = 7.39
Score 2.00 · P 11.4%
Class 3: exp(1.00) = 2.72
Score 1.00 · P 4.2%
Calculation
denominator = sum_j exp(z_j) = 64.705
P(class 1) = exp(4.00) / 64.705 = 54.598 / 64.705 = 0.844
P(class 2) = exp(2.00) / 64.705 = 7.389 / 64.705 = 0.114
P(class 3) = exp(1.00) / 64.705 = 2.718 / 64.705 = 0.042
sum P = 100.0%
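The calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library. The function name `softmax` and the variable names are chosen here for illustration and are not taken from the original page.

```python
import math

def softmax(scores):
    """Return softmax probabilities for a list of raw scores (logits)."""
    exps = [math.exp(z) for z in scores]    # exp(z_j) for every class
    denominator = sum(exps)                 # sum_j exp(z_j)
    return [e / denominator for e in exps]  # exp(z_i) / denominator

# Same scores as the worked example: 4.00, 2.00, 1.00
probs = softmax([4.0, 2.0, 1.0])
for i, p in enumerate(probs, start=1):
    print(f"P(class {i}) = {p:.3f}")
# P(class 1) = 0.844
# P(class 2) = 0.114
# P(class 3) = 0.042
```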
Technical Notes
Notes carried over from the original visual, tuned for the new site.
Key Properties of Softmax
The softmax function converts raw scores into a probability distribution, ensuring all outputs are between 0 and 1 and sum to 1.
- All output probabilities are guaranteed to be positive because each score is exponentiated.
- The sum of all probabilities always equals 1, creating a valid distribution over the classes.
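Both properties can be checked numerically. The sketch below is illustrative only: `softmax` and the helper `is_valid_distribution` are hypothetical names introduced here, and the test inputs (including negative scores) are arbitrary choices.

```python
import math

def softmax(scores):
    """Return softmax probabilities for a list of raw scores (logits)."""
    exps = [math.exp(z) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

def is_valid_distribution(probs):
    """True if every entry lies strictly between 0 and 1 and the entries sum to 1."""
    in_range = all(0.0 < p < 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0)
    return in_range and sums_to_one

# Negative and tied scores still produce a valid distribution.
for scores in ([4.0, 2.0, 1.0], [-3.0, 0.0, 3.0], [0.0, 0.0, 0.0, 0.0]):
    print(scores, is_valid_distribution(softmax(scores)))
```

With extremely large gaps between scores, floating-point rounding can make a probability come out as exactly 0 or 1, so the strict bounds hold mathematically but not always numerically.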
Score Sensitivity
Larger differences between input scores result in more extreme probability distributions. This lets a model express strong preferences when there are clear distinctions between classes.
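One way to see this effect is to widen the gaps between a fixed set of scores and watch the largest probability grow. In the sketch below, multiplying the scores by a `scale` factor is an assumption made purely for illustration, and the base scores are arbitrary.

```python
import math

def softmax(scores):
    """Return softmax probabilities for a list of raw scores (logits)."""
    exps = [math.exp(z) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

base_scores = [2.0, 1.0, 0.0]  # arbitrary example scores

# Scaling the scores scales the differences between them.
top_probs = []
for scale in (0.5, 1.0, 2.0, 4.0):
    scaled = [scale * z for z in base_scores]
    probs = softmax(scaled)
    top_probs.append(max(probs))
    print(f"scale {scale}: " + ", ".join(f"{p:.3f}" for p in probs))
```

As the differences grow, the top class takes an increasing share of the probability mass and the distribution moves toward a one-hot vector.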
Historical Terminology
A more precise name is softargmax, because the function is a smooth, differentiable approximation of argmax (expressed as a one-hot vector) rather than of max. The shorter softmax nevertheless became the standard term in machine learning frameworks and literature.
Implementation note
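Softmax depends only on the differences between logits: adding the same constant to every score leaves the probabilities unchanged. Because the ratio of two class probabilities is exp(z_i - z_j), even a small change in one logit can noticeably shift the probability mass assigned to a class.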
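Since only relative differences between scores matter, a common implementation technique is to subtract the largest score before exponentiating. This is not described in the original notes. It is sketched here as an assumption about how one might implement softmax safely, with `stable_softmax` as a hypothetical name.

```python
import math

def stable_softmax(scores):
    """Softmax computed after shifting scores so the largest becomes 0."""
    shift = max(scores)                           # largest raw score
    exps = [math.exp(z - shift) for z in scores]  # every exponent is <= 0, so no overflow
    total = sum(exps)
    return [e / total for e in exps]

small = stable_softmax([4.0, 2.0, 1.0])
# math.exp(1004.0) alone would raise OverflowError, but the shifted form is fine.
large = stable_softmax([1004.0, 1002.0, 1001.0])

print(small)
print(large)
```

Both calls return the same probabilities, because the two inputs differ only by a constant offset of 1000.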