[latexpage]It often comes up in neural networks, generalized linear models, topic models and many other probabilistic models that one wishes to parameterize a discrete distribution in terms of an unconstrained vector of numbers, i.e., a vector that is not confined to the simplex, might be negative, etc. A very common way to address this is to use the “softmax” transformation:
\begin{align*}
\pi_k &= \frac{\exp\{x_k\}}{\sum_{k’=1}^K\exp\{x_{k’}\}}
\end{align*}
where the $x_k$ are unconstrained in $\mathbb{R}$, but the $\pi_k$ live on the simplex, i.e., $\pi_k \geq 0$ and $\sum_{k}\pi_k=1$. The $x_k$ parameterize a discrete distribution (not uniquely) and we can generate data by performing the softmax transformation and then doing the usual thing to draw from a discrete distribution. Interestingly, it turns out that there is an alternative way to arrive at such discrete samples, that doesn’t actually require constructing the discrete distribution.
Continue reading “The Gumbel-Max Trick for Discrete Distributions”