The ELBO without Jensen, Kullback, or Leibler
The log marginal likelihood is a central object for Bayesian inference with latent variable models:

$$
\log p(x) = \log \int p(x, z)\, dz,
$$

where $x$ are the observed data, $z$ are the latent variables, and $p(x, z)$ is the model’s joint distribution. The evidence lower bound (ELBO) bounds this quantity from below using an arbitrary distribution $q(z)$ over the latent variables:

$$
\log p(x) \;\geq\; \mathbb{E}_{q(z)}\left[\log \frac{p(x, z)}{q(z)}\right].
$$

The hope is that this will be a tight enough bound that we can use it as a proxy for the marginal likelihood when reasoning about the model.

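As a quick sanity check, here is a minimal Python sketch with a made-up three-state discrete latent variable (the probabilities and the choice of $q(z)$ below are arbitrary illustrations, not part of any real model), showing the ELBO sitting below the exact log marginal likelihood:

```python
import numpy as np

# Toy discrete latent-variable model; all numbers are made up for illustration.
prior = np.array([0.5, 0.3, 0.2])   # p(z) for z in {0, 1, 2}
lik = np.array([0.10, 0.40, 0.05])  # p(x | z) evaluated at the single observed x

# Exact log marginal likelihood: log p(x) = log sum_z p(x | z) p(z)
log_px = np.log(np.sum(lik * prior))

# Any distribution q(z) over the latent variable yields an ELBO.
q = np.array([0.2, 0.7, 0.1])
log_joint = np.log(lik * prior)             # log p(x, z) for each z
elbo = np.sum(q * (log_joint - np.log(q)))  # E_q[log p(x, z) - log q(z)]

print(f"log p(x) = {log_px:.4f}, ELBO = {elbo:.4f}")
assert elbo <= log_px  # the bound holds for this (and any) q
```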
ELBO via Jensen’s Inequality
The Jensen’s-inequality approach observes that the expectation of a concave function is always less than or equal to that function evaluated at the expectation of its argument. That is, if $f$ is concave, then

$$
\mathbb{E}\left[f(y)\right] \;\leq\; f\!\left(\mathbb{E}[y]\right).
$$
There are two steps to this approach. First, multiply and divide inside the integral by $q(z)$, which turns the marginal likelihood into an expectation under $q(z)$:

$$
\log p(x) = \log \int p(x, z)\, dz = \log \int q(z)\, \frac{p(x, z)}{q(z)}\, dz = \log \mathbb{E}_{q(z)}\left[\frac{p(x, z)}{q(z)}\right].
$$

Second, apply Jensen’s inequality, using the fact that $\log$ is concave:

$$
\log \mathbb{E}_{q(z)}\left[\frac{p(x, z)}{q(z)}\right] \;\geq\; \mathbb{E}_{q(z)}\left[\log \frac{p(x, z)}{q(z)}\right].
$$
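To see both steps numerically, here is a small Monte Carlo sketch (the conjugate Gaussian model and the particular $q(z)$ are assumptions chosen only so that $\log p(x)$ has a closed form): the average of $\log w$, where $w = p(x, z)/q(z)$, lands below the log of the average of $w$, as Jensen’s inequality requires.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Assumed toy model: z ~ N(0, 1), x | z ~ N(z, 1), so p(x) = N(x; 0, 2) exactly.
x = 1.3
log_px_exact = norm.logpdf(x, loc=0.0, scale=np.sqrt(2.0))

# An arbitrary Gaussian q(z); it need not be the true posterior.
q_mu, q_sigma = 0.4, 0.9
z = rng.normal(q_mu, q_sigma, size=200_000)

# w = p(x, z) / q(z), evaluated in log space at each sample
log_w = (norm.logpdf(z, loc=0.0, scale=1.0)          # log p(z)
         + norm.logpdf(x, loc=z, scale=1.0)          # log p(x | z)
         - norm.logpdf(z, loc=q_mu, scale=q_sigma))  # log q(z)

log_mean_w = np.log(np.mean(np.exp(log_w)))  # Monte Carlo estimate of log E_q[w]
elbo = np.mean(log_w)                        # Monte Carlo estimate of E_q[log w]

print(f"exact log p(x) = {log_px_exact:.4f}")
print(f"log E_q[w]     = {log_mean_w:.4f}")  # approximately log p(x)
print(f"E_q[log w]     = {elbo:.4f}")        # the ELBO, strictly smaller
```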
ELBO via Kullback-Leibler Divergence
Alternatively, we could directly write down the KL divergence between $q(z)$ and the posterior over latent variables, $p(z \mid x)$:

$$
\mathrm{KL}\left(q(z)\,\big\|\,p(z \mid x)\right) = \mathbb{E}_{q(z)}\left[\log \frac{q(z)}{p(z \mid x)}\right] = \mathbb{E}_{q(z)}\left[\log q(z)\right] - \mathbb{E}_{q(z)}\left[\log p(z \mid x)\right].
$$
Now let’s both add and subtract the log marginal likelihood $\log p(x)$:

$$
= \mathbb{E}_{q(z)}\left[\log q(z)\right] - \mathbb{E}_{q(z)}\left[\log p(z \mid x)\right] - \log p(x) + \log p(x).
$$
This log marginal likelihood doesn’t actually depend on $z$, so we can move one copy of it inside the expectation, where it combines with the log posterior to give the log joint:

$$
= \mathbb{E}_{q(z)}\left[\log q(z)\right] - \mathbb{E}_{q(z)}\left[\log p(z \mid x) + \log p(x)\right] + \log p(x)
= \mathbb{E}_{q(z)}\left[\log q(z)\right] - \mathbb{E}_{q(z)}\left[\log p(x, z)\right] + \log p(x).
$$
Now turn the first two terms into a single expectation under $q(z)$:

$$
\mathrm{KL}\left(q(z)\,\big\|\,p(z \mid x)\right) = -\,\mathbb{E}_{q(z)}\left[\log \frac{p(x, z)}{q(z)}\right] + \log p(x).
$$
Rearrange things slightly:

$$
\log p(x) = \mathbb{E}_{q(z)}\left[\log \frac{p(x, z)}{q(z)}\right] + \mathrm{KL}\left(q(z)\,\big\|\,p(z \mid x)\right).
$$
We know that K-L divergences have to be non-negative, so this shows that the expectation term is a lower bound on the log marginal likelihood.
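This decomposition can be checked exactly in a small discrete example (the model and $q(z)$ below are again made up for illustration): the ELBO and the K-L term sum to the log marginal likelihood to numerical precision.

```python
import numpy as np

# Made-up discrete model: three latent states, one observed x.
prior = np.array([0.5, 0.3, 0.2])   # p(z)
lik = np.array([0.10, 0.40, 0.05])  # p(x | z)
q = np.array([0.2, 0.7, 0.1])       # an arbitrary q(z)

joint = lik * prior                 # p(x, z)
px = joint.sum()                    # p(x)
posterior = joint / px              # p(z | x)

elbo = np.sum(q * np.log(joint / q))    # E_q[log p(x, z) / q(z)]
kl = np.sum(q * np.log(q / posterior))  # KL(q(z) || p(z | x))

# log p(x) = ELBO + KL holds exactly, not just as a bound.
print(np.log(px), elbo + kl)
assert np.isclose(np.log(px), elbo + kl)
```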
Alternative Derivation
Start with Bayes’ rule:

$$
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)},
$$

and observe that we can rearrange it to give us an expression for the marginal likelihood:

$$
p(x) = \frac{p(x \mid z)\, p(z)}{p(z \mid x)} = \frac{p(x, z)}{p(z \mid x)}.
$$
This is true for any choice of $z$ with nonzero posterior probability. Take the log of both sides, and multiply and divide by an arbitrary distribution $q(z)$ inside each term:

$$
\log p(x) = \log \frac{p(x, z)}{q(z)} - \log \frac{p(z \mid x)}{q(z)}.
$$
Remember, this is true for all $z$, so we can take the expectation of both sides under $q(z)$ without changing the left-hand side, which doesn’t depend on $z$:

$$
\log p(x) = \mathbb{E}_{q(z)}\left[\log \frac{p(x, z)}{q(z)}\right] - \mathbb{E}_{q(z)}\left[\log \frac{p(z \mid x)}{q(z)}\right].
$$
Now, recall that $\log u \leq u - 1$ for all $u > 0$, so that $-\log u \geq 1 - u$. Applying this to the second term:

$$
-\,\mathbb{E}_{q(z)}\left[\log \frac{p(z \mid x)}{q(z)}\right] \;\geq\; \mathbb{E}_{q(z)}\left[1 - \frac{p(z \mid x)}{q(z)}\right] = 1 - \int p(z \mid x)\, dz = 0.
$$
Since this is true for all $q(z)$, we get a lower bound on the log marginal likelihood for any choice of variational distribution:

$$
\log p(x) \;\geq\; \mathbb{E}_{q(z)}\left[\log \frac{p(x, z)}{q(z)}\right].
$$
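Here is a short numerical check of this route (again with a made-up discrete model): the identity holds pointwise for every value of $z$, the elementary log inequality holds, and dropping the second term after averaging under $q(z)$ leaves a quantity below $\log p(x)$.

```python
import numpy as np

# Made-up discrete model: three latent states, one observed x.
prior = np.array([0.5, 0.3, 0.2])   # p(z)
lik = np.array([0.10, 0.40, 0.05])  # p(x | z)
q = np.array([0.2, 0.7, 0.1])       # an arbitrary q(z)

joint = lik * prior                 # p(x, z)
px = joint.sum()                    # p(x)
posterior = joint / px              # p(z | x)

# log p(x) = log p(x,z)/q(z) + log q(z)/p(z|x) holds for EVERY z separately.
pointwise = np.log(joint / q) + np.log(q / posterior)
assert np.allclose(pointwise, np.log(px))

# The elementary inequality: log u <= u - 1 for u > 0.
u = posterior / q
assert np.all(np.log(u) <= u - 1)

# Averaging under q(z) and dropping the second term (whose expectation is
# non-negative) leaves the ELBO, which is indeed below log p(x).
elbo = np.sum(q * np.log(joint / q))
assert elbo <= np.log(px)
```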
No reference here to Jensen’s inequality or K-L divergence. One caveat, however, is that the log inequality I used here is itself one way to prove the non-negativity of the K-L divergence, so doing these steps in a different order would look like directly taking advantage of that non-negativity to get the lower bound.