Interactive visualization of the fundamental neural network activation function
The softmax function converts a vector of raw scores (logits) into a probability distribution: each output lies strictly between 0 and 1, and the outputs sum to 1.
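A minimal sketch of this in plain Python (the max-subtraction trick is a standard numerical-stability measure, not part of the mathematical definition):

```python
import math

def softmax(scores):
    # Subtracting the max score avoids overflow in exp() and
    # leaves the result unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # each value lies in (0, 1)
print(sum(probs))  # values sum to 1, up to floating-point error
```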
Larger differences between input scores result in more extreme probability distributions. This property allows the model to express strong preferences when there are clear distinctions between classes.
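The effect of widening the score gap can be seen with two classes (a hypothetical example; the specific numbers are illustrative):

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Same two classes, with the gap between their scores widening:
moderate = softmax([2.0, 1.0])  # gap of 1: moderate preference
sharp = softmax([4.0, 1.0])     # gap of 3: near-certain preference
print(moderate[0])  # ~0.73
print(sharp[0])     # ~0.95
```

As the gap grows, the winning class's probability approaches 1 and the others approach 0.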
Although "soft(arg)max" would be the more precise name, the shortened form "softmax" has become standard in machine learning frameworks and literature. The name reflects its role as a smooth, differentiable approximation of the argmax function: argmax picks a single winner, while softmax spreads probability mass according to how decisively each score wins.