The proof and intuition presented here come from this excellent writeup by Yuval Filmus, which in turn draws upon ideas in this book by Fumio Hiai and Denes Petz. Suppose that we have a sequence of real-valued random variables $X_1, X_2, \ldots$. Define the random variable

\[\begin{equation} A_N = \frac{X_1 + \cdots + X_N}{\sqrt{N}} \end{equation}\]to be a scaled sum of the first $N$ variables in the sequence. Now, we would like to make interesting statements about the sequence

\[\begin{equation} A_1, A_2, \ldots . \end{equation}\]The central limit theorem is quite general. To simplify this exposition, I will make a number of assumptions. First, I will assume that the $X_i$ are independent and identically distributed. Second, I will assume that each $X_i$ has mean $\mathbb E[X_n] = 0$ and variance $\textnormal{var}(X_n) = \sigma^2$. Finally, I will assume that the moment generating function (to be defined below) converges (this condition requires all moments of the distribution to exist).

Under these conditions, the central limit theorem tells us that

\[\begin{equation} A_1, A_2, \ldots \to \mathcal N(0,\sigma^2) , \end{equation}\]where $N(\mu, \sigma^2)$ is the normal distribution with density function

\[\begin{equation} \mathcal N(x ; \mu, \sigma^2) = \frac{1}{ \sigma \sqrt{2 \pi}} \exp\left(-\frac{ (x-\mu)^2}{2\sigma^2}\right) , \end{equation}\]and where $P_1 \to P_2$ means that $P_1([a,b]) \to P_2([a,b])$ for all intervals $[a,b] \subset \mathbb R$. It is not immediately obvious that the sequence $\{A_n\}$ should converge, but if it does converge, the normal distribution is the natural candidate. Suppose it converges to some distribution $D$. Now, consider the random variable

\[\begin{equation} B_N = \frac{1}{\sqrt{2}} \left(\frac{X_1 + \cdots + X_N}{\sqrt{N}} + \frac{X_{N+1} + \cdots X_{2N}}{\sqrt{N}} \right) . \end{equation}\]Of course, $B_N$ is just $A_{2N}$, so $B_N \to D$. But the first term on the RHS is $A_N$, and the second term on the RHS has the same distribution as $A_N$, so we have

\[\begin{equation} B_N \to \frac{D + D}{\sqrt{2}} , \end{equation}\]where the two $D$'s are independent random variables with the same distribution. So $D$ and $(D + D)/\sqrt{2}$ must have the same distribution. By grouping terms in different proportions, we can derive similar properties of $D$. Since the normal distribution satisfies

\[\begin{eqnarray} \mathcal N(\mu_1, \sigma_1^2) + \mathcal N(\mu_2, \sigma_2^2) & = & \mathcal N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2) \\ c \mathcal N(\mu, \sigma^2) & = & \mathcal N(c\mu, c^2\sigma^2) , \end{eqnarray}\]it is a natural candidate for the distribution $D$.

To prove the central limit theorem, we will make use of the moment generating function

\[\begin{equation} M_{X}(t) = \mathbb E [\exp (tX) ] = 1 + \mathbb E[X] t + \frac{1}{2} \mathbb E[X^2] t^2 + \cdots \end{equation}\]and the cumulant generating function

\[\begin{equation} K_{X}(t) = \log \mathbb E[\exp (tX) ] = \mathbb E[X] t + \frac{\mathbb E[X^2] - \mathbb E[X]^2}{2} t^2 + \cdots . \end{equation}\]The coefficients of the moment generating function and the cumulant generating function (divided by $n!$) are referred to as "moments" and "cumulants" respectively. The cumulants and the moments are closely related, and the values of one determine the values of the other. Incidentally, the moment generating function and the cumulant generating function of the normal distribution $\mathcal N(\mu,\sigma^2)$ are given by

\[\begin{eqnarray} M(t) & = & \exp\left(\mu t + \frac{\sigma^2}{2} t^2 \right) \\ K(t) & = & \mu t + \frac{\sigma^2}{2} t^2 . \end{eqnarray}\]Note that the moment generating function satisfies

\[\begin{eqnarray} M_{X+Y}(t) & = & M_{X}(t) M_{Y}(t) \\ M_{cX}(t) & = & M_{X}(ct) . \end{eqnarray}\]It follows that the cumulant generating function satisfies

\[\begin{eqnarray} K_{X+Y}(t) & = & K_{X}(t) + K_{Y}(t) \\ K_{cX}(t) & = & K_{X}(ct) . \end{eqnarray}\]Now, we are going to use all of these tools. Let's inspect the cumulant generating function of $A_N$. We have

\[\begin{equation} K_{A_N}(t) = K_{X_1}\left(\frac{t}{\sqrt{N}}\right) + \cdots + K_{X_N}\left(\frac{t}{\sqrt{N}}\right) . \end{equation}\]Let $k^m_{X}$ be the $m$th coefficient of $K_{X}(t)$. Equating powers of $t$ above, we get

\[\begin{equation} k^m_{A_N} = \left(k^m_{X_1} + \cdots + k^m_{X_N} \right) N^{-\frac{m}{2}} . \end{equation}\]From the case $m=1$, we see that

\[\begin{equation} \mathbb E[A_N] = \frac{\mathbb E[X_1] + \cdots \mathbb E[X_N]}{\sqrt{N}} = 0, \end{equation}\]as expected. From the case $m=2$, we see that

\[\begin{equation} \textnormal{var}(A_N) = \frac{\textnormal{var}(X_1) + \cdots \textnormal{var}(X_N)}{N} = \sigma^2, \end{equation}\]also as expected. But what about higher cumulants? For $k > 2$, as $N \to \infty$, we have

\[\begin{equation} k^m_{A_N} = k^m_{X_1} N^{1-\frac{N}{2}} \to 0 . \end{equation}\]Therefore, the higher cumulants all vanish. It follows that the cumulants of the sequence $\{A_N\}$ converge to the cumulants of $\mathcal N(0,\sigma^2)$. Therefore, the moments of the sequence $\{A_N\}$ converge to the moments of $\mathcal N(0,\sigma^2)$. It follows from Levy's continuity theorem, that $A_N \to \mathcal N(0,\sigma^2)$, as desired.