# The Central Limit Theorem

The proof and intuition presented here come from an excellent writeup by Yuval Filmus, which in turn draws on ideas from a book by Fumio Hiai and Dénes Petz. Suppose that we have a sequence of real-valued random variables

X_1, X_2, \ldots .

Define the random variable

A_N = \frac{X_1 + \cdots + X_N}{\sqrt{N}}

to be a scaled sum of the first $N$ variables in the sequence. Now, we would like to make interesting statements about the sequence

A_1, A_2, \ldots .

The central limit theorem is quite general. To simplify this exposition, I will make a number of assumptions. First, I will assume that the $X_i$ are independent and identically distributed. Second, I will assume that each $X_i$ has mean $\mathbb E[X_i] = 0$ and variance $\textnormal{var}(X_i) = \sigma^2$. Finally, I will assume that the moment generating function (to be defined below) converges in a neighborhood of zero (this condition implies, in particular, that all moments of the distribution exist).
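As a quick numerical sanity check on this setup (the choice of distribution here is purely illustrative: $X_i$ uniform on $[-1,1]$, so $\sigma^2 = 1/3$), we can simulate $A_N$ and confirm that the $\sqrt{N}$ scaling leaves the mean at $0$ and the variance at $\sigma^2$:

```python
import math
import random
import statistics

random.seed(0)

def sample_A(N, trials=50_000):
    """Draw `trials` independent samples of A_N = (X_1 + ... + X_N) / sqrt(N),
    where the X_i are uniform on [-1, 1] (mean 0, variance 1/3)."""
    return [sum(random.uniform(-1, 1) for _ in range(N)) / math.sqrt(N)
            for _ in range(trials)]

samples = sample_A(N=30)
mean = statistics.fmean(samples)      # should be close to 0
var = statistics.pvariance(samples)   # should be close to sigma^2 = 1/3

# Fraction of samples within one standard deviation of 0; for a normal
# distribution this would be about 0.683.
sigma = math.sqrt(1 / 3)
frac = sum(abs(s) <= sigma for s in samples) / len(samples)
```

Even at $N = 30$, the within-one-$\sigma$ fraction already sits close to the normal value of $0.683$, which hints at where this is going.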

Under these conditions, the central limit theorem tells us that

A_1, A_2, \ldots \to \mathcal N(0,\sigma^2) ,

where $\mathcal N(\mu, \sigma^2)$ is the normal distribution with density function

\mathcal N(x ; \mu, \sigma^2) = \frac{1}{ \sigma \sqrt{2 \pi}} \exp\left(-\frac{ (x-\mu)^2}{2\sigma^2}\right) ,

and where $P_1 \to P_2$ means that $P_1([a,b]) \to P_2([a,b])$ for all intervals $[a,b] \subset \mathbb R$. It is not immediately obvious that the sequence $\{A_N\}$ should converge, but if it does converge, the normal distribution is the natural candidate. Suppose it converges to some distribution $D$. Now, consider the random variable

B_N = \frac{1}{\sqrt{2}} \left(\frac{X_1 + \cdots + X_N}{\sqrt{N}} + \frac{X_{N+1} + \cdots + X_{2N}}{\sqrt{N}} \right) .

Of course, $B_N$ is just $A_{2N}$, so $B_N \to D$. But the first term on the RHS is $A_N$, and the second term is independent of the first and has the same distribution as $A_N$, so we have

B_N \to \frac{D + D}{\sqrt{2}} ,

where the two $D$’s are independent random variables with the same distribution. So $D$ and $(D + D)/\sqrt{2}$ must have the same distribution. By grouping terms in different proportions, we can derive similar properties of $D$. Since the normal distribution satisfies

\begin{eqnarray}

\mathcal N(\mu_1, \sigma_1^2) + \mathcal N(\mu_2, \sigma_2^2) & = & \mathcal N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2) \\

c \mathcal N(\mu, \sigma^2) & = & \mathcal N(c\mu, c^2\sigma^2) ,

\end{eqnarray}

it is a natural candidate for the distribution $D$.
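This stability property is easy to check by simulation; the sketch below (the sample size and $\sigma$ are arbitrary choices) draws $(Z_1 + Z_2)/\sqrt{2}$ for independent $Z_1, Z_2 \sim \mathcal N(0, \sigma^2)$ and confirms that the variance is unchanged and the fourth moment matches the normal value $3\sigma^4$:

```python
import math
import random
import statistics

random.seed(1)
sigma = 2.0
trials = 100_000

# Draw (Z1 + Z2) / sqrt(2) for independent Z1, Z2 ~ N(0, sigma^2).
combined = [(random.gauss(0, sigma) + random.gauss(0, sigma)) / math.sqrt(2)
            for _ in range(trials)]

var = statistics.pvariance(combined)             # should be close to sigma^2
m4 = statistics.fmean(s ** 4 for s in combined)  # normal: 3 * sigma^4
```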

To prove the central limit theorem, we will make use of the moment generating function

M_{X}(t) = \mathbb E [\exp (tX) ] = 1 + \mathbb E[X] t + \frac{1}{2} \mathbb  E[X^2] t^2 + \cdots

and the cumulant generating function

K_{X}(t) = \log \mathbb E[\exp (tX) ] = \mathbb E[X] t + \frac{\mathbb E[X^2] - \mathbb E[X]^2}{2} t^2 + \cdots .

The coefficient of $t^n$ in the moment generating function, multiplied by $n!$, is called the $n$th “moment”; the analogous coefficients of the cumulant generating function are the “cumulants”. The cumulants and the moments are closely related, and the values of one determine the values of the other. Incidentally, the moment generating function and the cumulant generating function of the normal distribution $\mathcal N(\mu,\sigma^2)$ are given by

\begin{eqnarray}

M(t) & = & \exp\left(\mu t + \frac{\sigma^2}{2} t^2 \right) \\

K(t) & = & \mu t + \frac{\sigma^2}{2} t^2 .

\end{eqnarray}
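For concreteness, expanding $K_X(t) = \log M_X(t)$ to third order expresses the first few cumulants in terms of the moments $m_n = \mathbb E[X^n]$:

\begin{eqnarray}

k_1 & = & m_1 \\

k_2 & = & m_2 - m_1^2 \\

k_3 & = & m_3 - 3 m_1 m_2 + 2 m_1^3 .

\end{eqnarray}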

Note that the moment generating function satisfies

\begin{eqnarray}

M_{X+Y}(t) & = & M_{X}(t) M_{Y}(t) \\

M_{cX}(t) & = & M_{X}(ct) .

\end{eqnarray}

It follows that the cumulant generating function satisfies

\begin{eqnarray}

K_{X+Y}(t) & = & K_{X}(t) + K_{Y}(t) \\

K_{cX}(t) & = & K_{X}(ct) .

\end{eqnarray}
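These two identities can also be checked numerically by estimating $K_X(t) = \log \mathbb E[\exp(tX)]$ from samples. In the sketch below (the uniform distribution and the value $t = 0.5$ are arbitrary illustrative choices), additivity is verified for two independent uniform variables, for which $M(t) = \sinh(t)/t$ is known exactly:

```python
import math
import random
import statistics

random.seed(2)
trials = 200_000
t = 0.5

xs = [random.uniform(-1, 1) for _ in range(trials)]
ys = [random.uniform(-1, 1) for _ in range(trials)]

def K_hat(samples, t):
    """Empirical cumulant generating function log E[exp(t * X)]."""
    return math.log(statistics.fmean(math.exp(t * s) for s in samples))

lhs = K_hat([x + y for x, y in zip(xs, ys)], t)  # K_{X+Y}(t)
rhs = K_hat(xs, t) + K_hat(ys, t)                # K_X(t) + K_Y(t)

# Exact value: for X uniform on [-1, 1], M_X(t) = sinh(t) / t.
exact = 2 * math.log(math.sinh(t) / t)
```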

Now, we are going to use all of these tools. Let’s inspect the cumulant generating function of $A_N$. We have

K_{A_N}(t) = K_{X_1}\left(\frac{t}{\sqrt{N}}\right) + \cdots + K_{X_N}\left(\frac{t}{\sqrt{N}}\right) .

Let $k^m_{X}$ be the coefficient of $t^m$ in $K_{X}(t)$. Equating powers of $t$ above, we get

k^m_{A_N} = \left(k^m_{X_1} + \cdots + k^m_{X_N} \right) N^{-\frac{m}{2}} .

From the case $m=1$, we see that

\mathbb E[A_N] = \frac{\mathbb E[X_1] + \cdots + \mathbb E[X_N]}{\sqrt{N}} = 0,

as expected. From the case $m=2$, we see that

\textnormal{var}(A_N) = \frac{\textnormal{var}(X_1) + \cdots + \textnormal{var}(X_N)}{N} = \sigma^2,

also as expected. But what about higher cumulants? For $m > 2$, since the $X_i$ are identically distributed, we have, as $N \to \infty$,

k^m_{A_N} = k^m_{X_1}  N^{1-\frac{m}{2}} \to 0 .

Therefore, the higher cumulants all vanish in the limit. It follows that the cumulants of the sequence $\{A_N\}$ converge to the cumulants of $\mathcal N(0,\sigma^2)$, and therefore the moments of the sequence $\{A_N\}$ converge to the moments of $\mathcal N(0,\sigma^2)$. Since the normal distribution is uniquely determined by its moments, it follows (via Lévy's continuity theorem) that $A_N \to \mathcal N(0,\sigma^2)$, as desired.
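The $N^{1-\frac{m}{2}}$ decay of the higher cumulants can also be observed directly in simulation. The sketch below (the centered exponential is an arbitrary skewed choice, with third cumulant $k^3_X = 2$) estimates the third cumulant of $A_N$, which should fall off like $2/\sqrt{N}$:

```python
import math
import random
import statistics

random.seed(3)

def third_cumulant_of_A(N, trials=200_000):
    """Estimate the third cumulant of A_N for X_i = Exp(1) - 1 (mean 0,
    variance 1, third cumulant 2). Since the mean is 0, the third
    cumulant equals the third moment."""
    a = [sum(random.expovariate(1.0) - 1.0 for _ in range(N)) / math.sqrt(N)
         for _ in range(trials)]
    return statistics.fmean(s ** 3 for s in a)

k3_1 = third_cumulant_of_A(1)    # theory: 2 / sqrt(1)  = 2
k3_16 = third_cumulant_of_A(16)  # theory: 2 / sqrt(16) = 0.5
```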