Galvanized by the accelerated pace and ease of data collection, researchers in more and more disciplines are turning to large, heterogeneous datasets to answer scientific questions. Divining insight from massive and complex data, however, requires flexible models and efficient inference of meaningful factors of variation. This thesis develops new statistical models and methods to help practitioners answer quantitative questions and more efficiently explore their data.

The first part of this thesis presents three applied probabilistic modeling case studies in a diverse set of domains: astronomy, healthcare, and sports analytics. For each application we develop a probabilistic model for high-dimensional observations to address a particular goal: to make robust, portable predictions, to find latent structure, or to organize and visualize interpretable factors of variation in the data. Guided by these examples, we discuss the common challenges of specifying interpretable-yet-flexible probabilistic models in applied settings.

Motivated by the challenges of applying probabilistic models to large datasets, the second part of this thesis develops new algorithms for approximate Bayesian inference. We focus on improving variational inference, a widely used class of approximation algorithms, and develop two new techniques that improve its accuracy and computational efficiency. We further generalize one of these techniques into a class of computationally efficient Monte Carlo estimators.
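As background (standard notation, not a formulation specific to this thesis), variational inference posits an approximating family $q_\lambda(z)$ over latent variables $z$ and maximizes the evidence lower bound (ELBO) on the log marginal likelihood of data $x$ under a model $p(x, z)$; Monte Carlo variational inference estimates the ELBO gradient by sampling, e.g. via a reparameterization $z = t(\epsilon; \lambda)$ with $\epsilon \sim p(\epsilon)$ and $S$ samples:

\[
\mathcal{L}(\lambda) = \mathbb{E}_{q_\lambda(z)}\!\left[\log p(x, z) - \log q_\lambda(z)\right] \le \log p(x),
\]
\[
\nabla_\lambda \mathcal{L}(\lambda) \approx \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \left[\log p\big(x, t(\epsilon_s; \lambda)\big) - \log q_\lambda\big(t(\epsilon_s; \lambda)\big)\right], \qquad \epsilon_s \sim p(\epsilon).
\]

Reducing the variance and cost of such estimators is the kind of improvement the second part of the thesis targets.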
@phdthesis{miller2018thesis,
  year    = {2018},
  author  = {Miller, Andrew},
  title   = {Advances in Monte Carlo Variational Inference and Applied Probabilistic Modeling},
  month   = may,
  school  = {Harvard University},
  address = {Cambridge, MA}
}