[latexpage]

Bayesian nonparametrics allows the construction of statistical models whose complexity is determined by the observed data. This is accomplished by specifying priors over infinite-dimensional distributions. The most widely used Bayesian nonparametric priors in machine learning are the Dirichlet process and the beta process, together with their corresponding marginal processes, the Chinese restaurant process and the Indian buffet process, respectively. The Dirichlet process provides a prior for the mixing measure of an infinite mixture model, and the beta process can be used as a prior for feature popularity in a latent feature model. The hierarchical Dirichlet process (HDP) also appears frequently in machine learning, where it underlies topic models with an infinite number of topics. A main selling point of Bayesian nonparametrics has been that the complexity of the model can grow as more data is observed. While this is true in theory, current inference algorithms for Bayesian nonparametric models fail to scale to the massive data sets now available, which makes the models impractical. In this post I review traditional inference algorithms for Bayesian nonparametric models and discuss a new approach for handling massive data sets using stochastic variational inference.
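To make the claim that model complexity grows with the data a little more concrete, here is a minimal sketch of the Chinese restaurant process, the marginal process of the Dirichlet process mentioned above. The function name `sample_crp`, the concentration parameter name `alpha`, and the fixed `seed` are my own illustrative choices, not taken from any particular library. Each new customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to `alpha`.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    Illustrative sketch only: `alpha` is the concentration parameter and
    `seed` makes the draw reproducible.
    """
    rng = random.Random(seed)
    table_sizes = []
    for _ in range(n_customers):
        # Existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)   # open a new table (a new mixture component)
        else:
            table_sizes[k] += 1     # join an existing table
    return table_sizes


# The number of occupied tables (clusters) is not fixed in advance;
# it tends to increase slowly as more customers (data points) arrive.
for n in (10, 100, 1000, 10000):
    print(n, len(sample_crp(n, alpha=1.0)))
```

Under these assumptions the number of occupied tables grows roughly logarithmically in the number of customers, which is the sense in which the number of mixture components adapts to the amount of data.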