Mutual information quantifies the dependence between random variables. It is sometimes contrasted with linear correlation, since mutual information also captures nonlinear dependence. In this short note I will discuss the relationship between these quantities in the case of a bivariate Gaussian distribution, and I will explore two implications of that relationship.

As shown below, in the case of $X$ and $Y$ having a bivariate Normal distribution, mutual information is a monotonic transformation of correlation,

$$
I(X,Y) = -\frac{1}{2} \ln \left(1 - Corr(X,Y)^2\right).
$$

This relationship has a couple of implications that I would like to highlight. First, it proves that lack of correlation in the bivariate Normal distribution implies independence, i.e., when the correlation between $X$ and $Y$ is zero, the mutual information will also be zero. More interestingly, if you are willing to assume that the marginal distributions of $X$ and $Y$ are Normal but you are not willing to assume joint normality, this result provides a lower bound on the mutual information between $X$ and $Y$ in terms of their correlation. This lower bound follows from the maximum entropy property of the Normal distribution.

Formally, the entropy of a univariate Gaussian random variable $X$ with variance $\sigma_x^2$ is

$$H(X) = \frac{1}{2} \ln (2\pi e \sigma_x^2).$$

For a bivariate Gaussian random variable $(X,Y)$, the joint entropy is

$$H(X,Y) = \frac{1}{2} \ln \left((2\pi e)^2 (\sigma_x^2 \sigma_y^2 - \sigma_{xy}^2)\right),$$

where $\sigma_{xy}$ is the covariance of $X$ and $Y$.

Then the mutual information is

\begin{align*}
I(X,Y) &= H(X) + H(Y) - H(X,Y) \\
&= \frac{1}{2} \ln \left( \frac{\sigma_x^2\sigma_y^2}{\sigma_x^2 \sigma_y^2 - \sigma_{xy}^2} \right) \\
&= \frac{1}{2} \ln \left( \frac{\sigma_y^2}{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2} \right) \\
&= -\frac{1}{2} \ln \left( \frac{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2}{\sigma_y^2} \right) \\
&= -\frac{1}{2} \ln \left(1 - \left(\frac{\sigma_{xy}}{\sigma_x \sigma_y}\right)^2 \right) \\
&= -\frac{1}{2} \ln \left(1 - Corr(X,Y)^2 \right).
\end{align*}
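As a quick numerical sanity check of the derivation above, the following sketch computes the Gaussian mutual information two ways: from the three entropies, and from the correlation alone. The function names and the example covariance matrix are illustrative choices of mine, and the code assumes NumPy is available.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`.

    `cov` may be a scalar variance or a k-by-k covariance matrix.
    """
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    k = cov.shape[0]
    return 0.5 * np.log((2.0 * np.pi * np.e) ** k * np.linalg.det(cov))

def mi_from_entropies(cov):
    """I(X,Y) = H(X) + H(Y) - H(X,Y) for a bivariate Gaussian."""
    cov = np.asarray(cov, dtype=float)
    h_x = gaussian_entropy(cov[0, 0])
    h_y = gaussian_entropy(cov[1, 1])
    h_xy = gaussian_entropy(cov)
    return h_x + h_y - h_xy

def mi_from_correlation(cov):
    """I(X,Y) = -1/2 ln(1 - Corr(X,Y)^2) for a bivariate Gaussian."""
    cov = np.asarray(cov, dtype=float)
    rho = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return -0.5 * np.log(1.0 - rho**2)

# Illustrative covariance matrix: variances 2 and 1, covariance 0.6.
cov = np.array([[2.0, 0.6],
                [0.6, 1.0]])
print(mi_from_entropies(cov), mi_from_correlation(cov))
```

The two printed values should agree up to floating-point error, and both are in nats because the formulas use the natural logarithm.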

The lower bound follows from the following argument. Consider two other random variables, $X'$ and $Y'$, that have the same covariance matrix as $X$ and $Y$ (and hence the same correlation) but are jointly normally distributed. Since we have assumed that the marginal distributions of $X$ and $Y$ are Normal, we have $H(X) = H(X')$ and $H(Y) = H(Y')$. By the maximum entropy property of the Normal distribution, which says that among all distributions with a given covariance matrix the Normal has the largest entropy, we have $H(X,Y) \leq H(X',Y')$. The result is then straightforward:

\begin{align*}
I(X,Y) &= H(X) + H(Y) - H(X,Y) \\
&\geq H(X') + H(Y') - H(X',Y') \\
&= -\frac{1}{2} \ln \left(1 - Corr(X',Y')^2 \right) \\
&= -\frac{1}{2} \ln \left(1 - Corr(X,Y)^2 \right).
\end{align*}
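To see the bound at work, here is a minimal sketch using an example of my own choosing rather than anything from the argument above: an equal-weight mixture of two standard bivariate Normals with correlations `rho_a` and `rho_b`. Both marginals of the mixture are standard Normal, its correlation is the average of the two component correlations, and the joint distribution is not Normal. The mutual information is approximated by a Riemann sum on a grid; the grid width and resolution are assumptions that should be adequate for the correlations used here.

```python
import numpy as np

def gaussian_mi(rho):
    """Lower bound: MI (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log1p(-rho**2)

def bivariate_normal_pdf(x, y, rho):
    """Density of a bivariate Normal with zero means, unit variances, correlation rho."""
    z = (x**2 - 2.0 * rho * x * y + y**2) / (1.0 - rho**2)
    return np.exp(-0.5 * z) / (2.0 * np.pi * np.sqrt(1.0 - rho**2))

def mixture_mi(rho_a, rho_b, half_width=6.0, n=601):
    """Approximate MI (in nats) of a 50/50 mixture of two bivariate Normals.

    Integrates p(x,y) * ln(p(x,y) / (p(x) p(y))) with a Riemann sum over
    the square [-half_width, half_width]^2 using n grid points per axis.
    """
    grid = np.linspace(-half_width, half_width, n)
    step = grid[1] - grid[0]
    x, y = np.meshgrid(grid, grid, indexing="ij")
    joint = 0.5 * bivariate_normal_pdf(x, y, rho_a) \
          + 0.5 * bivariate_normal_pdf(x, y, rho_b)
    # Each marginal of the mixture is exactly standard Normal.
    marginal = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
    product = marginal[:, None] * marginal[None, :]
    # Skip any cells where the joint density underflows to zero.
    mask = joint > 0
    integrand = np.zeros_like(joint)
    integrand[mask] = joint[mask] * np.log(joint[mask] / product[mask])
    return integrand.sum() * step**2

rho_a, rho_b = 0.9, -0.9
rho = 0.5 * (rho_a + rho_b)  # correlation of the mixture
print("lower bound: ", gaussian_mi(rho))
print("numerical MI:", mixture_mi(rho_a, rho_b))
```

With `rho_a = 0.9` and `rho_b = -0.9` the mixture is uncorrelated, so the bound is zero, yet the variables are clearly dependent and the numerical mutual information is strictly positive. This also illustrates that zero correlation implies independence only under joint normality, not merely Normal marginals.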

Thanks are due to Vikash Mansinghka for suggesting the first part of this exercise.