Softmax Function

Interactive visualization of the neural network activation function that converts raw output scores into class probabilities

\[ \sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^K e^{z_j}} \]
where \(\mathbf{z}\) is the vector of raw input scores, K is the number of classes, and i is the index of the class whose probability is being computed
[Interactive chart: raw scores (left axis) vs. softmax probabilities (right axis)]
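
As a minimal sketch (the helper name and example scores are illustrative, not part of the visualization), the formula above can be implemented in a few lines of NumPy; subtracting the maximum score first is a standard trick that leaves the result unchanged but avoids overflow:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of raw scores."""
    # Subtracting max(z) cancels in the ratio but prevents overflow
    # when scores are large.
    exps = np.exp(z - np.max(z))
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])   # illustrative raw scores
print(softmax(scores))               # ~[0.659, 0.242, 0.099]
```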

Technical Notes

Key Properties of Softmax

The softmax function converts raw scores into a probability distribution, ensuring every output lies between 0 and 1 and that the outputs sum to 1; both guarantees are checked numerically in the sketch after this list.

  • All output probabilities are strictly positive, because the exponential function is positive everywhere; no class is ever assigned exactly zero probability.
  • The sum of all probabilities always equals 1, creating a valid probability distribution over all possible classes.
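
Both properties are easy to verify numerically; a quick sketch, assuming NumPy and arbitrary illustrative scores:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
z = rng.normal(size=5)                 # arbitrary illustrative scores
p = np.exp(z) / np.sum(np.exp(z))      # softmax as defined above

assert np.all(p > 0)                   # exp() is positive, so every p_i > 0
assert np.isclose(np.sum(p), 1.0)      # normalization: the p_i sum to 1
```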

Score Sensitivity

Larger differences between input scores result in more extreme probability distributions. This property allows the model to express strong preferences when there are clear distinctions between classes.
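
A brief sketch with made-up scores illustrates this: widening the gap between the top score and the rest pushes the distribution toward a single class.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

close = np.array([2.0, 1.8, 1.5])   # similar scores -> relatively flat output
far   = np.array([6.0, 1.8, 1.5])   # one dominant score -> near-certain output

print(softmax(close))   # ~[0.41, 0.34, 0.25]
print(softmax(far))     # ~[0.97, 0.01, 0.01]
```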

Historical Terminology

While the function is technically a "soft(arg)max", the shortened form "softmax" has become standard terminology in machine learning frameworks and literature. The name reflects its role as a smooth, differentiable approximation of the argmax function: as the score differences grow, its output approaches the one-hot vector that argmax would select.
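
A small illustrative sketch shows this approximation in action (the sharpening factor t is an assumption introduced here, not something from the original text): scaling the scores up drives the softmax output toward the one-hot argmax vector.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

z = np.array([2.0, 1.0, 0.1])
for t in (1, 5, 25):                # t is a made-up sharpening factor
    print(t, np.round(softmax(t * z), 4))
# The output approaches the one-hot argmax vector [1, 0, 0] as t grows,
# while remaining differentiable at every t.
```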