Correlation and Mutual Information

Peter Krafft

Mutual information quantifies the dependence between random variables. It is sometimes contrasted with linear correlation, since mutual information also captures nonlinear dependence. In this short note I will discuss the relationship between these two quantities in the case of a bivariate Gaussian distribution, and I will explore two implications of that relationship.

As shown below, in the case of X and Y having a bivariate Normal distribution, mutual information is a monotonic transformation of correlation,

    \[I(X,Y) = -\frac{1}{2} \ln  \left(1 -  Corr(X,Y)^2 \right).\]
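
As a quick numerical illustration (a minimal sketch; the function name gaussian_mutual_information and the example correlation values are mine, not from the original note), the closed form can be tabulated in Python to see that I(X,Y) increases monotonically in |Corr(X,Y)| and blows up as the correlation approaches \pm 1:

    # Closed-form mutual information (in nats) of a bivariate Normal as a
    # function of its correlation coefficient rho, valid for |rho| < 1.
    import numpy as np

    def gaussian_mutual_information(rho):
        return -0.5 * np.log(1.0 - rho**2)

    for rho in [0.0, 0.25, 0.5, 0.75, 0.9, 0.99]:
        print(f"rho = {rho:.2f} -> I(X,Y) = {gaussian_mutual_information(rho):.4f} nats")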

This relationship has a couple of implications that I would like to highlight. First, it proves that lack of correlation in the bivariate Normal distribution implies independence, i.e., when the correlation between X and Y is zero, the mutual information will also be zero. More interestingly, if you are willing to assume that the marginal distributions of X and Y are Normal but you are not willing to assume joint normality, this result provides a lower bound on the mutual information between X and Y. This lower bound follows from the maximum entropy property of the Normal distribution.

Formally, the entropy of a univariate Gaussian random variable X is

    \[H(X) = \frac{1}{2} \ln (2\pi e \sigma_x^2).\]
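
As a sanity check (a small sketch; the particular value of \sigma_x is arbitrary), this formula can be compared against SciPy's built-in differential entropy for a Normal distribution, which is also reported in nats:

    # Compare the closed-form Gaussian entropy to SciPy's norm(...).entropy().
    import numpy as np
    from scipy.stats import norm

    sigma_x = 2.5
    closed_form = 0.5 * np.log(2 * np.pi * np.e * sigma_x**2)
    scipy_value = norm(scale=sigma_x).entropy()
    print(closed_form, scipy_value)  # the two numbers should agree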

For a bivariate Gaussian random variable, (X,Y),

    \[H(X,Y) = \frac{1}{2} \ln ((2\pi e)^2 (\sigma_x^2 \sigma_y^2 - \sigma_{xy}^2)),\]

where \sigma_{xy} is the covariance of X and Y.
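
The bivariate formula can be checked the same way (again a sketch; the covariance entries below are arbitrary, chosen only so that the covariance matrix is positive definite), since \sigma_x^2 \sigma_y^2 - \sigma_{xy}^2 is the determinant of the covariance matrix:

    # Compare the closed-form bivariate Gaussian entropy to SciPy's
    # multivariate_normal(...).entropy().
    import numpy as np
    from scipy.stats import multivariate_normal

    sigma_x2, sigma_y2, sigma_xy = 1.0, 4.0, 1.5
    cov = np.array([[sigma_x2, sigma_xy],
                    [sigma_xy, sigma_y2]])
    closed_form = 0.5 * np.log((2 * np.pi * np.e)**2 * (sigma_x2 * sigma_y2 - sigma_xy**2))
    scipy_value = multivariate_normal(mean=[0.0, 0.0], cov=cov).entropy()
    print(closed_form, scipy_value)  # the two numbers should agree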

The mutual information is then

    \begin{align*} I(X,Y) &= H(X) + H(Y) - H(X,Y) \\ &= \frac{1}{2} \ln  \left( \frac{\sigma_x^2\sigma_y^2}{\sigma_x^2 \sigma_y^2 - \sigma_{xy}^2} \right) \\ &= \frac{1}{2} \ln  \left( \frac{\sigma_y^2}{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2 } \right) \\ &= -\frac{1}{2} \ln  \left( \frac{\sigma_y^2 - \sigma_{xy}^2/\sigma_x^2 }{\sigma_y^2} \right) \\ &= -\frac{1}{2} \ln  \left(1 -  \left(\frac{\sigma_{xy}}{\sigma_x \sigma_y}\right)^2 \right) \\ &= -\frac{1}{2} \ln  \left(1 -  Corr(X,Y)^2 \right). \end{align*}
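
The derivation can also be traced numerically (a sketch; the variances and correlation below are arbitrary): computing H(X) + H(Y) - H(X,Y) from the two entropy formulas above gives the same value as the closed form -\frac{1}{2} \ln(1 - Corr(X,Y)^2):

    # Verify H(X) + H(Y) - H(X,Y) == -1/2 ln(1 - rho^2) for a bivariate Normal.
    import numpy as np
    from scipy.stats import norm, multivariate_normal

    sigma_x, sigma_y, rho = 1.3, 0.7, 0.6
    sigma_xy = rho * sigma_x * sigma_y
    cov = np.array([[sigma_x**2, sigma_xy],
                    [sigma_xy, sigma_y**2]])

    h_x = norm(scale=sigma_x).entropy()
    h_y = norm(scale=sigma_y).entropy()
    h_xy = multivariate_normal(mean=[0.0, 0.0], cov=cov).entropy()

    print(h_x + h_y - h_xy, -0.5 * np.log(1 - rho**2))  # should agree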

The lower bound can be seen as follows. Consider two other random variables, X' and Y', that have the same covariance matrix as (X,Y) but are jointly normally distributed. Note that since we have assumed that the marginal distributions of X and Y are Normal, we have H(X) = H(X') and H(Y) = H(Y'), and by the maximum entropy property of the Normal distribution among all distributions with a given covariance matrix, we have H(X,Y) \leq H(X',Y'). The result is then straightforward:

    \begin{align*} I(X,Y) &= H(X) + H(Y) - H(X,Y) \\ &\geq H(X') + H(Y') - H(X',Y') \\ &= -\frac{1}{2} \ln  \left(1 -  Corr(X',Y')^2 \right) \\ &= -\frac{1}{2} \ln  \left(1 -  Corr(X,Y)^2 \right). \end{align*}
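
To see that the bound can be far from tight when joint normality fails, here is an illustrative sketch (the mixture construction and the choice of estimator are mine, not part of the argument above): X and Y are each marginally N(0,1) and uncorrelated, so the lower bound is essentially zero, yet they are strongly dependent, and a nonparametric k-nearest-neighbor estimate of the mutual information, here scikit-learn's mutual_info_regression, comes out well above zero:

    # X and Y below form a 50/50 mixture of bivariate Normals with
    # correlations +rho and -rho: each marginal is exactly N(0, 1), the
    # correlation is 0, but X and Y are clearly dependent.
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(0)
    n, rho = 5000, 0.9
    signs = rng.choice([-1.0, 1.0], size=n)
    x = rng.standard_normal(n)
    y = signs * rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

    corr = np.corrcoef(x, y)[0, 1]
    lower_bound = -0.5 * np.log(1 - corr**2)               # essentially 0
    mi_estimate = mutual_info_regression(x.reshape(-1, 1), y)[0]
    print(f"corr ~ {corr:.3f}, bound ~ {lower_bound:.3f}, estimated MI ~ {mi_estimate:.3f} nats")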

Thanks are due to Vikash Mansinghka for suggesting the first part of this exercise.
