The Correct Birth/Death Jacobian for Mixture Models

Nick Foti · March 8, 2013

Reversible jump Markov Chain Monte Carlo (RJMCMC) [1] is an extension of the Metropolis-Hastings algorithm that allows sampling from a distribution over models with potentially different numbers of parameters. In this post we are interested in determining the number of components to use when modeling data with a mixture model. The number of components corresponds to the dimension of the space we are walking through. The point of this post is to clear up a common error seen in the literature involving computing a Jacobian that arises in the algorithm.

Unfortunately, simply adding and removing dimensions does not produce a Markov Chain with the correct stationary distribution. The key insight of RJMCMC [1] is dimension matching the models we are jumping between so that the number of dimensions is always the same. This means that we pad the model with fewer parameters with extra variables drawn from some distribution until the dimensions match and perform any necessary transformations of these variables to create the model we are interested in. An important aspect of RJMCMC is that the moves must be reversible, that is if one can move from one component to another, then with positive probability they can make the reverse move.

The simplest type of RJMCMC moves for mixture models are called birth and death moves. Birth moves consist of adding a component and the corresponding parameters and a death moves involves removing the component and the corresponding parameters. A death move is the reverse of a birth move and so the pair of moves will define a valid RJMCMC algorithm. I will focus on the case of a birth move and death moves can be derived similarly. See [1] for details of the algorithm, but we note that in order to compute the Metropolis-Hastings acceptance ratio the probabilities need to be computed on the same space. This requires transforming a probability distribution on one space into another and involves the Jacobian determinant of the parameters of one of the models with respect to those of the other.

For a birth move we introduce a new component and generate its parameters from the prior distribution. Say we are jumping from $K$ components to $K+1$ components. Complications arise when we generate the probability that this component is selected by observations ($w_k$ in the notation of [1]). First we generate a beta random variable for the new component, $w_*$, and rescale the existing component probabilities so that the new set of probabilities sums to $1$. The Jacobian of the transformation from the smaller model to the model with an extra component has $(1-w_*)$ on the diagonal for the $K$ original components and $1$'s on the diagonal for the new component probability and model parameters. Naively, it seems that the determinant is then $(1-w_*)^K$, and many papers and books have been published using this as the determinant. In fact, the original paper had this error [1], which was later noted in [2].

Since the original $w_k$'s formed a probability measure, the last component, $w_K$, is completely determined by all the others. Therefore, it does not show up as a variable that is differentiated with respect to in the Jacobian. The correct Jacobian is therefore $(1-w_*)^{K-1}$. This may seem obvious, but the mistake has been made enough in published papers and books that I think it's worth clearing up.

In practice this change does not seem to affect the results very much [2], though it could be significant for some models. Another issue with using the incorrect Jacobian is that technically the chain converges to the wrong stationary distribution. I think the moral to this post is that mistakes propagate when methods are used as presented without completely understanding them. Don't believe everything you read, and if you're going to use a method make sure you can derive all of the equations.$^*$.

There is much more to RJMCMC including what to actually do with the Jacobian determinant described above. The underlying theory is interesting as well, in particular the general RJMCMC algorithm explains how to do Metropolis-Hastings on general measures. Additionally, the types of moves can be much more clever (and complicated), for instance splitting and merging components. See [1] and [3] for detailed treatments. Both [4] and [5] contain very readable presentations as well (and show the correct Jacobian for birth and death moves).

[1] Richardson and Green, On Bayesian Analysis of Mixtures with an Unknown Number of Components, 1997.
[2] Richardson and Green, Corrigendum: On Bayesian analysis of mixtures with an unknown number of components, 1998.
[3] Handbook of Markov Chain Monte Carlo, Section 1.17.
[4] Marin and Robert, Bayesian Core.
[5] Robert and Casella, Monte Carlo Statistical Methods.

$^*$ As another quick example of mistakes propagating, one of the authors of a paper that used the wrong Jacobian has another paper where he used the reciprocal of the Metropolis-Hastings acceptance probability for his MCMC algorithm. I contacted him about this error and he pointed me to a paper on MCMC that presented the acceptance probability incorrectly. I tried to convince him that this was an error but because of this one reference with a typo I could not sway him.

Twitter, Facebook