Exponential Families and Maximum Entropy
An exponential family parametrized by $\boldsymbol\theta \in \mathbb{R}^d$ is the set of probability distributions that can be expressed as
$$p(\mathbf{x} \mid \boldsymbol\theta) = \frac{1}{Z(\boldsymbol\theta)} \, h(\mathbf{x}) \exp\left( \boldsymbol\theta^{\mathsf{T}} \boldsymbol\phi(\mathbf{x}) \right)$$
for given functions $Z(\boldsymbol\theta)$ (the partition function), $h(\mathbf{x})$, and $\boldsymbol\phi(\mathbf{x})$ (the vector of sufficient statistics). Exponential families can be discrete or continuous, and examples include Gaussian distributions, Poisson distributions, and gamma distributions. Exponential families have a number of desirable properties. For instance, they have conjugate priors and they can summarize arbitrary amounts of data using a fixed-size vector of sufficient statistics. But in addition to their convenience, their use is theoretically justified.
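As a concrete illustration, the Poisson distribution can be written in this form. The sketch below assumes the usual notation for the pieces (natural parameter `theta = log(rate)`, sufficient statistic `phi(x) = x`, base function `h(x) = 1/x!`, partition function `Z(theta) = exp(e^theta)`); the function names are hypothetical and chosen only for this example. It checks numerically that the exponential-family form agrees with the textbook pmf.

```python
import math

def poisson_pmf(x, rate):
    """Textbook Poisson pmf: rate**x * exp(-rate) / x!."""
    return rate ** x * math.exp(-rate) / math.factorial(x)

def poisson_exp_family(x, theta):
    """Same pmf written as (1 / Z(theta)) * h(x) * exp(theta * phi(x))."""
    h = 1.0 / math.factorial(x)        # base function h(x)
    phi = x                            # sufficient statistic phi(x)
    Z = math.exp(math.exp(theta))      # partition function Z(theta) = exp(e^theta)
    return h * math.exp(theta * phi) / Z

rate = 3.5                             # illustrative value, not from the text
theta = math.log(rate)                 # natural parameter

# Largest disagreement between the two forms over x = 0, ..., 19.
max_gap = max(
    abs(poisson_pmf(x, rate) - poisson_exp_family(x, theta)) for x in range(20)
)

# The exponential-family form should sum to (approximately) one.
total = sum(poisson_exp_family(x, theta) for x in range(100))
```

Because `phi(x) = x`, a data set of any size enters the likelihood only through the sum of the observations and their count, which is the fixed-size summary mentioned above.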
Suppose we would like to find a particular probability distribution $p(x)$. All we know about $p(x)$ is that
$$\mathbb{E}_{p}[\phi_i(x)] = c_i$$
for a specific collection of functions $\phi_1, \dots, \phi_d$. Which distribution should we use? The principle of maximum entropy asserts that when choosing among all valid distributions, we should choose the one with maximum entropy. In other words, we should choose the distribution which has the greatest uncertainty consistent with the constraints.
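For simplicity, consider the case in which $p$ is a distribution over a finite set $\{x_1, \dots, x_n\}$. Then $p$ can be written as a vector $(p_1, \dots, p_n)$. We would like to maximize
$$-\sum_{j=1}^{n} p_j \log p_j$$
subject to the constraints $\sum_{j=1}^{n} p_j = 1$ and $\sum_{j=1}^{n} p_j \phi_i(x_j) = c_i$ for $i = 1, \dots, d$. Defining the quantity
$$J = -\sum_{j=1}^{n} p_j \log p_j + \lambda_0 \left( \sum_{j=1}^{n} p_j - 1 \right) + \sum_{i=1}^{d} \lambda_i \left( \sum_{j=1}^{n} p_j \phi_i(x_j) - c_i \right)$$
and using Lagrange multipliers in the usual way, we end up with the constraints
$$\frac{\partial J}{\partial p_j} = -\log p_j - 1 + \lambda_0 + \sum_{i=1}^{d} \lambda_i \phi_i(x_j) = 0, \qquad \sum_{j=1}^{n} p_j = 1, \qquad \sum_{j=1}^{n} p_j \phi_i(x_j) = c_i.$$
Using only the first equality, we see that
$$p_j = \frac{1}{Z} \exp\left( \sum_{i=1}^{d} \lambda_i \phi_i(x_j) \right) = \frac{1}{Z} \exp\left( \boldsymbol\theta^{\mathsf{T}} \boldsymbol\phi(x_j) \right)$$
where $Z = e^{1 - \lambda_0}$ and $\boldsymbol\theta = (\lambda_1, \dots, \lambda_d)$. This derivation shows that $p$ belongs to an exponential family whose natural parameters are the Lagrange multipliers and whose sufficient statistics are the constraint functions $\phi_1, \dots, \phi_d$.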
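The finite-set result can be checked numerically. The sketch below assumes an illustrative setup that is not part of the text: a six-sided die with the single constraint that its mean is 4.5, so the only constraint function is phi(x) = x. It uses bisection (one simple choice among many) to find the multiplier `theta` for which the exponential-form distribution satisfies the constraint, then compares its entropy with that of a second distribution satisfying the same constraints. All function and variable names are hypothetical.

```python
import math

def tilted(values, theta):
    """Exponential-form distribution p_j = exp(theta * x_j) / Z on a finite set."""
    weights = [math.exp(theta * x) for x in values]
    Z = sum(weights)                      # normalizing constant
    return [w / Z for w in weights]

def mean(p, values):
    return sum(pj * x for pj, x in zip(p, values))

def entropy(p):
    return -sum(pj * math.log(pj) for pj in p if pj > 0)

def maxent_distribution(values, target_mean, lo=-20.0, hi=20.0, iters=200):
    """Bisection on theta; the mean of tilted(values, theta) increases with theta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilted(values, mid), values) < target_mean:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return theta, tilted(values, theta)

values = [1, 2, 3, 4, 5, 6]
theta, p = maxent_distribution(values, target_mean=4.5)

# A perturbation that changes neither the total probability nor the mean:
# its entries sum to 0, and so does sum(d_j * x_j) = 1 - 4 + 3.
direction = [1, -2, 1, 0, 0, 0]
eps = 0.01
q = [pj + eps * dj for pj, dj in zip(p, direction)]

entropy_p = entropy(p)   # entropy of the exponential-form solution
entropy_q = entropy(q)   # entropy of the competing feasible distribution
```

In this setup `q` satisfies the same two constraints as `p` but has strictly lower entropy, consistent with the exponential-form distribution being the maximizer.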