[latexpage]

When implementing algorithms for inference and learning with probabilistic models, it commonly comes up that one needs to sample from a discrete distribution. That is, from a multinomial distribution with parameter $\boldsymbol{\pi}\in\mathbb{R}^K$, such that $\pi_k\geq 0$ and $\sum_{k}\pi_k=1$. A somewhat more common occurrence is that we have a $\boldsymbol{\phi}\in\mathbb{R}^K$ where $\phi_k\geq 0$, but we don’t know the normalization constant. That is, our $\boldsymbol{\phi}$ is only proportional to the multinomial parameter $\boldsymbol{\pi}$. We want to rapidly generate a variate according to $\boldsymbol{\pi}$, given $\boldsymbol{\pi}$, something easily done with (Matlab) code such as this (paraphrased from Tom Minka‘s Lightspeed Toolbox):

cdf = cumsum(phi);

samp_k = sum(cdf < rand()*cdf(end)) + 1;

This is nice and simple, but you'll notice that it has $\mathcal{O}(K)$ time complexity for setup (computing the CDF) and $\mathcal{O}(K)$ time complexity per sample. The per-sample complexity could probably be reduced to $\mathcal{O}(\log K)$ with a better data structure for finding the threshold. It turns out, however, that we can do better and get $\mathcal{O}(1)$ for the sampling, while still being $\mathcal{O}(K)$.

Continue reading "The Alias Method: Efficient Sampling with Many Discrete Outcomes"