# Pseudo-marginal MCMC

Robert Nishihara · March 31, 2013

This post gives a brief introduction to the pseudo-marginal approach to MCMC. A very nice explanation, with examples, is available here. Frequently, we are given a density function $\pi(\mathbf{x})$, with $\mathbf{x} \in \mathcal X$, and we use Markov chain Monte Carlo (MCMC) to generate samples from the corresponding probability distribution. For simplicity, suppose we are performing Metropolis-Hastings with a spherical proposal distribution. Then, we move from the current state $\mathbf{x}$ to a proposed state $\mathbf{x}'$ with probability $\min(1, \pi(\mathbf{x}')/\pi(\mathbf{x}))$ .

But what if we cannot evaluate $\pi(\mathbf{x})$ exactly? Such a situation might arise if we are given a joint density function $\pi(\mathbf{x}, \mathbf{z})$, with $\mathbf{z} \in \mathcal Z$, and we must marginalize out $\mathbf{z}$ in order to compute $\pi(\mathbf{x})$. In this situation, we may only be able to approximate

$$\pi(\mathbf{x}) = \int \pi(\mathbf{x},\mathbf{z}) \, \mathrm d\mathbf{z} ,$$

for instance with importance sampling. If we draw i.i.d. variables $\{\mathbf{z}_1, \ldots, \mathbf{z}_N\}$ from the distribution the density function $q(\mathbf{z})$, then our importance sampling estimate will be

$$\tilde{\pi}(\mathbf{x}) = \frac{1}{N} \sum_{n=1}^{N} \frac{\pi(\mathbf{x}, \mathbf{z}_n)}{q(\mathbf{z}_n)} .$$

What happens if we go ahead and run Metropolis-Hastings on the estimated density function $\tilde{\pi}(\mathbf{x})$? If we generate new variables $\{\mathbf{z}_1, \ldots, \mathbf{z}_N\}$ once for each proposal, then we can view this as performing a random walk through $\mathcal X \times \mathcal Z^N$. To be clear, if we are at the current state $(\mathbf{x},\mathbf{z}) = (\mathbf{x}, \mathbf{z}_1, \ldots, \mathbf{z}_N)$, we propose the state $(\mathbf{x}',\mathbf{z}') = ({\bf x'}, \mathbf{z}_1', \ldots, \mathbf{z}_N')$ by drawing ${\bf x'}$ from the original spherical proposal distribution centered on $\mathbf{x}$ and by drawing each $\mathbf{z}_n'$ from $q(\mathbf{z})$. We then accept the proposed state with probability

$\begin{eqnarray*} \frac{\sum_n \frac{\pi(\mathbf{x}', \mathbf{z}_n')}{q(\mathbf{z}_n')}}{\sum_n \frac{\pi(\mathbf{x}, \mathbf{z}_n)}{q(\mathbf{z}_n)}} & = & \frac{\left(\sum_n \frac{\pi(\mathbf{x}', \mathbf{z}_n') }{q(\mathbf{z}_n')}\right)\prod_n q(\mathbf{z}_n')}{\left(\sum_n \frac{\pi(\mathbf{x}, \mathbf{z}_n)}{q(\mathbf{z}_n)}\right)\prod_n q(\mathbf{z}_n)} \cdot \frac{\prod_n q(\mathbf{z}_n)}{\prod_n q(\mathbf{z}_n')} \end{eqnarray*}$

Since the ratio of transition probabilities is given by

$$\frac{P[(\mathbf{x}',\mathbf{z}') \to (\mathbf{x},\mathbf{z})]}{P[(\mathbf{x},\mathbf{z}) \to (\mathbf{x}',\mathbf{z}')]} = \frac{\prod_n q(\mathbf{z}_n)}{\prod_n q(\mathbf{z}_n')} ,$$

we can view this approximate Metropolis-Hastings algorithm on $\mathcal X$ as an exact Metropolis-Hastings algorithm on $\mathcal X \times \mathcal Z^N$ with stationary distribution given by

$$p(\mathbf{x}, \mathbf{z}) \propto \left( \sum_n \frac{\pi(\mathbf{x}, \mathbf{z}_n) }{q(\mathbf{z}_n)}\right) \prod_n q(\mathbf{z}_n) .$$

For each $x \in \mathcal X$, define the function $w_x \colon \mathcal Z^N \to \mathbb R$ by

$$w_{\mathbf{x}}(\mathbf{z}_1, \ldots, \mathbf{z}_N) = \frac{\frac{1}{N}\sum_n \frac{\pi(\mathbf{x}, \mathbf{z}_n)}{q(\mathbf{z}_n)}}{\pi(\mathbf{x})} .$$

This is essentially the random variable $\tilde{\pi}(\mathbf{x})/\pi(\mathbf{x})$. It follows that

$$p(\mathbf{x}, \mathbf{z}) \propto \pi(\mathbf{x}) \, w_{\mathbf{x}}(\mathbf{z}_1, \ldots, \mathbf{z}_N) \, \prod_n q(\mathbf{z}_n) .$$

But

$$$\int p(\mathbf{x}, \mathbf{z}) \, \mathrm d\mathbf{z} \propto \pi(\mathbf{x}) \, \mathbb E[w_{\mathbf{x}}] = \pi(\mathbf{x}) .$$$

Indeed,

$\begin{eqnarray*} \mathbb E[w_{\mathbf{x}}] & = & \frac{1}{N} \sum_{n = 1}^N \int \frac{\pi(\mathbf{x}, \mathbf{z}_n)}{\pi(\mathbf{x}) \, q(\mathbf{z}_n)} \prod_{m=1}^N q(\mathbf{z}_m) \, \mathrm d\mathbf{z} \\ & = & \frac{1}{N} \sum_{n = 1}^N \int \pi(\mathbf{z}_n \,|\, \mathbf{x}) \, \mathrm d\mathbf{z}_n \\ & = & 1. \end{eqnarray*}$

The important part is that $\mathbb E[w_{\mathbf{x}}]$ is independent of $\mathbf{x}$. Therefore, equation (1) tells us that the stationary distribution of our Markov chain has the desired marginal distribution. In other words, we can run Metropolis-Hastings on the approximation $\tilde{\pi}(\mathbf{x})$ of the true density function and still get the correct outcome.