We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization updates. By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood. We can use this bound to optimize hyperparameters instead of using cross-validation. This Bayesian interpretation of SGD suggests improved, overfitting-resistant optimization procedures, and gives a theoretical foundation for popular tricks such as early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
@conference{duvenaud2016early,
  author    = {Duvenaud, David and Maclaurin, Dougal and Adams, Ryan P.},
  title     = {Early Stopping is Nonparametric Variational Inference},
  booktitle = {Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year      = {2016},
  keywords  = {AISTATS, variational inference, deep learning},
  note      = {arXiv:1504.01344 [stat.ML]}
}
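To make the entropy-tracking idea in the abstract concrete, here is a minimal sketch, not the authors' implementation: it draws one sample from a Gaussian initial distribution, runs full-batch gradient ascent on the log joint of a toy Bayesian linear-regression model, and accumulates the per-step entropy change log|det(I - alpha*H)| given by the Jacobian of each update; adding the running entropy to the final log joint gives a one-sample estimate of the variational lower bound. The model, the closed-form Hessian, and all names (log_joint, grad_log_joint, alpha, init_std) are illustrative assumptions; the paper itself uses stochastic gradients and approximates the log-determinant for large neural networks.

import numpy as np

# Toy setup: Bayesian linear regression with a Gaussian prior, chosen so the
# Hessian of the log joint is constant and the per-step entropy change
# log|det(I - alpha * H)| can be computed exactly. This is only a sketch of the
# entropy-tracking idea, not the stochastic estimator used in the paper.

rng = np.random.default_rng(0)
N, D = 50, 5
X = rng.normal(size=(N, D))
sigma, tau = 0.5, 1.0                         # observation noise and prior scale
y = X @ rng.normal(size=D) + sigma * rng.normal(size=N)

def log_joint(theta):
    # log N(y | X theta, sigma^2 I) + log N(theta | 0, tau^2 I), up to constants
    return (-0.5 * np.sum((y - X @ theta) ** 2) / sigma**2
            - 0.5 * np.sum(theta**2) / tau**2)

def grad_log_joint(theta):
    return X.T @ (y - X @ theta) / sigma**2 - theta / tau**2

H = X.T @ X / sigma**2 + np.eye(D) / tau**2   # Hessian of the negative log joint (constant here)

alpha, T = 1e-3, 200                          # step size and number of updates
init_std = 0.1                                # initial distribution q_0 = N(0, init_std^2 I)
theta = init_std * rng.normal(size=D)         # one sample from q_0

# Entropy of the implicit variational distribution: start with the entropy of q_0,
# then add the log-Jacobian-determinant of each gradient-ascent update.
entropy = 0.5 * D * np.log(2.0 * np.pi * np.e * init_std**2)
for _ in range(T):
    entropy += np.linalg.slogdet(np.eye(D) - alpha * H)[1]
    theta = theta + alpha * grad_log_joint(theta)

# One-sample estimate of the variational lower bound after T unconverged steps:
# E_q[log p(theta, data)] + H[q], with the expectation replaced by a single sample.
elbo_estimate = log_joint(theta) + entropy
print(f"Lower-bound estimate after {T} steps: {elbo_estimate:.2f}")

Stopping the loop earlier (smaller T) keeps the implicit distribution broader and the entropy term larger, which is the sense in which early stopping trades off fit against entropy in this bound.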