Pseudo-marginal MCMC

Robert NishiharaComputation, Probability, Statistics

This post gives a brief introduction to the pseudo-marginal approach to MCMC. A very nice explanation, with examples, is available here. Frequently, we are given a density function \pi({\bf x}), with {\bf x} \in \mathcal X, and we use Markov chain Monte Carlo (MCMC) to generate samples from the corresponding probability distribution. For simplicity, suppose we are performing Metropolis-Hastings with a spherical proposal distribution. Then, we move from the current state {\bf x} to a proposed state {\bf x}' with probability \min(1, \pi({\bf x}')/\pi({\bf x})) .

But what if we cannot evaluate \pi({\bf x}) exactly? Such a situation might arise if we are given a joint density function \pi({\bf x}, {\bf z}), with {\bf z} \in \mathcal Z, and we must marginalize out {\bf z} in order to compute \pi({\bf x}). In this situation, we may only be able to approximate

    \[ \pi({\bf x}) = \int \pi({\bf x},{\bf z}) \, \mathrm d{\bf z} ,\]

for instance with importance sampling. If we draw i.i.d. variables \{{\bf z}_1, \ldots, {\bf z}_N\} from the distribution the density function q({\bf z}), then our importance sampling estimate will be

    \[ \tilde{\pi}({\bf x}) = \frac{1}{N} \sum_{n=1}^{N} \frac{\pi({\bf x}, {\bf z}_n)}{q({\bf z}_n)} . \]

What happens if we go ahead and run Metropolis-Hastings on the estimated density function \tilde{\pi}({\bf x})? If we generate new variables \{{\bf z}_1, \ldots, {\bf z}_N\} once for each proposal, then we can view this as performing a random walk through \mathcal X \times \mathcal Z^N. To be clear, if we are at the current state ({\bf x},{\bf Z}) = ({\bf x}, {\bf z}_1, \ldots, {\bf z}_N), we propose the state ({\bf x}',{\bf Z}') = ({\bf x'}, {\bf z}_1', \ldots, {\bf z}_N') by drawing {\bf x'} from the original spherical proposal distribution centered on {\bf x} and by drawing each {\bf z}_n' from q({\bf z}). We then accept the proposed state with probability

    \begin{eqnarray*} \frac{\sum_n \frac{\pi({\bf x}', {\bf z}_n')}{q({\bf z}_n')}}{\sum_n \frac{\pi({\bf x}, {\bf z}_n)}{q({\bf z}_n)}} & = & \frac{\left(\sum_n \frac{\pi({\bf x}', {\bf z}_n') }{q({\bf z}_n')}\right)\prod_n q({\bf z}_n')}{\left(\sum_n \frac{\pi({\bf x}, {\bf z}_n)}{q({\bf z}_n)}\right)\prod_n q({\bf z}_n)} \cdot \frac{\prod_n q({\bf z}_n)}{\prod_n q({\bf z}_n')} \end{eqnarray*}

Since the ratio of transition probabilities is given by

    \[ \frac{P[({\bf x}',{\bf Z}') \to ({\bf x},{\bf Z})]}{P[({\bf x},{\bf Z}) \to ({\bf x}',{\bf Z}')]} = \frac{\prod_n q({\bf z}_n)}{\prod_n q({\bf z}_n')} , \]

we can view this approximate Metropolis-Hastings algorithm on \mathcal X as an exact Metropolis-Hastings algorithm on \mathcal X \times \mathcal Z^N with stationary distribution given by

    \[ p({\bf x}, {\bf Z}) \propto \left( \sum_n \frac{\pi({\bf x}, {\bf z}_n) }{q({\bf z}_n)}\right) \prod_n q({\bf z}_n) .\]

For each x \in \mathcal X, define the function w_x \colon \mathcal Z^N \to \mathbb R by

    \[ w_{{\bf x}}({\bf z}_1, \ldots, {\bf z}_N) = \frac{\frac{1}{N}\sum_n \frac{\pi({\bf x}, {\bf z}_n)}{q({\bf z}_n)}}{\pi({\bf x})} .\]

This is essentially the random variable \tilde{\pi}({\bf x})/\pi({\bf x}). It follows that

    \[ p({\bf x}, {\bf Z}) \propto \pi({\bf x}) \, w_{{\bf x}}({\bf z}_1, \ldots, {\bf z}_N) \, \prod_n q({\bf z}_n) .\]

But

(1)   \begin{equation*} \int p({\bf x}, {\bf Z}) \, \mathrm d{\bf Z} \propto \pi({\bf x}) \, \mathbb E[w_{{\bf x}}]  = \pi({\bf x}) . \end{equation*}

Indeed,

    \begin{eqnarray*} \mathbb E[w_{{\bf x}}] & = & \frac{1}{N} \sum_{n = 1}^N \int \frac{\pi({\bf x}, {\bf z}_n)}{\pi({\bf x}) \, q({\bf z}_n)} \prod_{m=1}^N q({\bf z}_m) \, \mathrm d{\bf Z} \\ & = & \frac{1}{N} \sum_{n = 1}^N \int \pi({\bf z}_n \,|\, {\bf x})  \, \mathrm d{\bf z}_n \\ & = & 1. \end{eqnarray*}

The important part is that \mathbb E[w_{{\bf x}}] is independent of {\bf x}. Therefore, equation (1) tells us that the stationary distribution of our Markov chain has the desired marginal distribution. In other words, we can run Metropolis-Hastings on the approximation \tilde{\pi}({\bf x}) of the true density function and still get the correct outcome.