# Correlation and Mutual Information


Mutual information quantifies the dependence between random variables. It is often contrasted with linear correlation, since mutual information also captures nonlinear dependence. In this short note I discuss the relationship between the two quantities in the case of a bivariate Gaussian distribution, and I explore two implications of that relationship.

As shown below, in the case of $X$ and $Y$ having a bivariate Normal distribution, mutual information is a monotonic transformation of correlation,
$$I(X,Y) = -\frac{1}{2} \ln \left(1 - Corr(X,Y)^2 \right).$$
This relationship has a couple of implications that I would like to highlight. First, it proves that lack of correlation in the bivariate Normal distribution implies independence: when the correlation between $X$ and $Y$ is zero, the mutual information is also zero, and zero mutual information is equivalent to independence. More interestingly, if you are willing to assume that the marginal distributions of $X$ and $Y$ are Normal but you are not willing to assume joint normality, this result provides a lower bound on the mutual information between $X$ and $Y$. This lower bound follows from the maximum entropy property of the Normal distribution.
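
To make the monotonic relationship concrete, here is a minimal Python sketch of the formula above. The function name `gaussian_mi_from_corr` is my own choice for illustration, and the result is in nats because the formula uses the natural logarithm.

```python
import math


def gaussian_mi_from_corr(rho: float) -> float:
    """Mutual information (in nats) of a bivariate Gaussian with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # log1p(-rho**2) equals ln(1 - rho**2) and is more accurate for small rho.
    return -0.5 * math.log1p(-rho * rho)


# Mutual information grows with |rho| and is zero exactly when rho is zero.
for rho in (0.0, 0.1, 0.5, 0.9, 0.99):
    print(f"rho = {rho:4.2f}  I(X,Y) = {gaussian_mi_from_corr(rho):.4f} nats")
```

The value depends on the correlation only through its square, so the sign of the correlation does not matter, and the mutual information diverges as the correlation approaches plus or minus one.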

Formally, the entropy of a univariate Gaussian random variable $X$ with variance $\sigma_x^2$ is
$$H(X) = \frac{1}{2} \ln (2\pi e \sigma_x^2).$$

For a bivariate Gaussian random vector $(X,Y)$, the joint entropy is
$$H(X,Y) = \frac{1}{2} \ln ((2\pi e)^2 (\sigma_x^2 \sigma_y^2 - \sigma_{xy}^2)),$$
where $\sigma_{xy}$ is the covariance of $X$ and $Y$.
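
As a sanity check on the joint entropy formula, the following sketch estimates $H(X,Y)$ by Monte Carlo as the sample average of $-\ln p(X,Y)$ and compares it with the closed form. The covariance matrix, sample size, and seed are arbitrary values chosen for illustration, and the two numbers should agree only up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative covariance matrix: variances 2.0 and 1.5, covariance 0.9.
cov = np.array([[2.0, 0.9], [0.9, 1.5]])
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200_000)

# Log density of a zero-mean bivariate Gaussian evaluated at each sample.
precision = np.linalg.inv(cov)
det = np.linalg.det(cov)  # equals sigma_x^2 * sigma_y^2 - sigma_xy^2
quad = np.einsum("ni,ij,nj->n", samples, precision, samples)
log_pdf = -0.5 * quad - 0.5 * np.log((2.0 * np.pi) ** 2 * det)

h_monte_carlo = -log_pdf.mean()
h_closed_form = 0.5 * np.log((2.0 * np.pi * np.e) ** 2 * det)

print(f"Monte Carlo estimate: {h_monte_carlo:.4f} nats")
print(f"Closed form:          {h_closed_form:.4f} nats")
```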

The mutual information is then
\begin{align*}
I(X,Y) &= H(X) + H(Y) - H(X,Y)
\\
&= \frac{1}{2} \ln
\left(
\frac{\sigma_x^2\sigma_y^2}{\sigma_x^2 \sigma_y^2 - \sigma_{xy}^2}
\right)
\\
&= \frac{1}{2} \ln
\left(
\frac{\sigma_y^2}{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2 }
\right)
\\
&= -\frac{1}{2} \ln
\left(
\frac{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2 }{\sigma_y^2}
\right)
\\
&= -\frac{1}{2} \ln
\left(1 -
\left(\frac{\sigma_{xy}}{\sigma_x \sigma_y}\right)^2
\right)
\\
&= -\frac{1}{2} \ln
\left(1 -
Corr(X,Y)^2
\right).
\end{align*}
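
The algebra above can be checked numerically. This sketch plugs illustrative variances and a covariance (values of my own choosing) into the three entropy formulas and confirms that $H(X) + H(Y) - H(X,Y)$ matches the correlation form.

```python
import math

# Illustrative parameters: variances of X and Y, and their covariance.
var_x, var_y, cov_xy = 2.0, 1.5, 0.9

# Entropies of the two marginals and of the joint distribution, in nats.
h_x = 0.5 * math.log(2.0 * math.pi * math.e * var_x)
h_y = 0.5 * math.log(2.0 * math.pi * math.e * var_y)
h_xy = 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * (var_x * var_y - cov_xy**2))

mi_from_entropies = h_x + h_y - h_xy

# The same quantity written in terms of the correlation coefficient.
rho = cov_xy / math.sqrt(var_x * var_y)
mi_from_corr = -0.5 * math.log(1.0 - rho**2)

print(mi_from_entropies, mi_from_corr)
```

The $2\pi e$ factors cancel in the difference, which is why the mutual information depends only on the correlation and not on the individual variances.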

The lower bound follows from the following argument. Consider two other random variables, $X'$ and $Y'$, that have the same covariance matrix as $X$ and $Y$ but are jointly normally distributed. Since we have assumed that the marginal distributions of $X$ and $Y$ are Normal, we have $H(X) = H(X')$ and $H(Y) = H(Y')$. By the maximum entropy property of the Normal distribution (among all distributions with a given covariance matrix, the Gaussian has the largest entropy), we have $H(X,Y) \leq H(X',Y')$. Finally, because the covariance matrices agree, $Corr(X',Y') = Corr(X,Y)$. The result is then straightforward:
\begin{align*}
I(X,Y) &= H(X) + H(Y) - H(X,Y)
\\
&\geq
H(X') + H(Y') - H(X',Y')
\\
&= -\frac{1}{2} \ln
\left(1 -
Corr(X',Y')^2
\right)
\\
&= -\frac{1}{2} \ln
\left(1 -
Corr(X,Y)^2
\right).
\end{align*}
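
To illustrate the bound with a case where it is strict, here is a sketch built on an example of my own (it is not from the argument above): an equal mixture of two bivariate Gaussians with correlations $+r$ and $-r$. Both marginals are standard Normal and the overall correlation is zero, so the bound equals zero, yet the variables are dependent. The mutual information is approximated by a Riemann sum on a grid, so the printed value is approximate.

```python
import numpy as np


def bvn_pdf(x, y, r):
    """Density of a zero-mean, unit-variance bivariate Gaussian with correlation r."""
    norm = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - r**2))
    return norm * np.exp(-(x**2 - 2.0 * r * x * y + y**2) / (2.0 * (1.0 - r**2)))


def std_normal_pdf(t):
    """Density of the standard Normal distribution."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)


r = 0.8  # illustrative component correlation
grid = np.linspace(-8.0, 8.0, 801)
step = grid[1] - grid[0]
x, y = np.meshgrid(grid, grid, indexing="ij")
cell = step**2

# Equal mixture of the +r and -r components: Normal marginals, non-Gaussian joint.
joint = 0.5 * bvn_pdf(x, y, r) + 0.5 * bvn_pdf(x, y, -r)

# Both variances are 1, so the covariance is also the correlation.
corr = np.sum(x * y * joint) * cell

# I(X,Y) is the integral of p(x,y) * ln(p(x,y) / (p(x) p(y))), here a Riemann sum.
log_ratio = np.log(joint) - np.log(std_normal_pdf(x)) - np.log(std_normal_pdf(y))
mi = np.sum(joint * log_ratio) * cell

bound = -0.5 * np.log1p(-corr**2)
print(f"correlation = {corr:.2e}, lower bound = {bound:.4f}, mutual information = {mi:.4f}")
```

In this example the correlation-based bound is uninformative, which shows that it is a floor on the dependence rather than an estimate of it.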

Thanks are due to Vikash Mansinghka for suggesting the first part of this exercise.