Modeling, Inference and Optimization with Composable Differentiable Procedures

Maclaurin, D. (2016). Modeling, Inference and Optimization with Composable Differentiable Procedures [PhD thesis]. Harvard University.
This thesis presents five contributions to machine learning, with themes of differentiability and Bayesian inference. We present Firefly Monte Carlo, an auxiliary-variable Markov chain Monte Carlo algorithm that queries only a potentially small subset of the data at each iteration, yet simulates from the exact posterior distribution. We describe the design and implementation of Autograd, a software package for efficiently computing derivatives of functions written in Python/Numpy using reverse-mode automatic differentiation. Using Autograd, we develop a convolutional neural network that takes arbitrary graphs, such as organic molecules, as input. This generalizes standard molecular feature representations and allows end-to-end adaptation of the feature-extraction pipeline to particular tasks. We show how to compute gradients of cross-validation loss with respect to the hyperparameters of learning algorithms, efficiently in both time and memory, by chaining gradients backwards through an exactly reversed optimization procedure. Finally, by accounting for the entropy destroyed by optimization, we show that early stopping and ensembling, popular tricks for avoiding overfitting, can be interpreted as variational Bayesian inference.
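
Autograd is publicly available, so a minimal sketch of its core API may help ground the abstract. The example below mirrors the style of the package's documentation: grad takes an ordinary Python/Numpy function and returns a new function that computes its derivative by reverse-mode differentiation, and because that result is itself an ordinary function, gradients compose.

  import autograd.numpy as np   # Autograd's thinly wrapped NumPy
  from autograd import grad

  def tanh(x):
      # An ordinary Python/Numpy function; no annotations or graph-building.
      y = np.exp(-2.0 * x)
      return (1.0 - y) / (1.0 + y)

  grad_tanh = grad(tanh)        # a new function computing d tanh / dx
  print(grad_tanh(1.0))         # ~0.4200, via reverse-mode differentiation
  print(grad(grad_tanh)(1.0))   # grads compose: second derivative, ~ -0.6397

The same composability is the seed of the hypergradient idea: differentiating through an optimization loop gives gradients of the final loss with respect to hyperparameters. The sketch below is only illustrative; the toy quadratic objective, the iteration count, and the naive unrolled loop are assumptions made for brevity, whereas the thesis reverses stochastic gradient descent exactly so that the backward pass needs almost no stored trajectory.

  def train(learn_rate):
      # Plain gradient descent on the toy objective (w - 3)^2, unrolled so
      # that Autograd can differentiate straight through the whole loop.
      w = 1.0
      for _ in range(100):
          w = w - learn_rate * grad(lambda w: (w - 3.0) ** 2)(w)
      return (w - 3.0) ** 2     # final training loss

  print(grad(train)(0.01))      # d(final loss) / d(learning rate)
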
@phdthesis{maclaurin2016thesis,
  year = {2016},
  author = {Maclaurin, Dougal},
  title = {Modeling, Inference and Optimization with Composable Differentiable Procedures},
  month = apr,
  school = {Harvard University},
  address = {Cambridge, MA}
}