Prior knowledge and overfitting

When we talk about priors and regularization, we often motivate them in terms of “incorporating knowledge” or “preventing overfitting.” In a sense, the two are equivalent: any prior or regularizer must favor certain explanations relative to others, so favoring one explanation is equivalent to punishing others. But I’ll argue that these are two very different phenomena, and it’s useful to know which one is going on. Continue reading “Prior knowledge and overfitting”

ICML Highlight: Fast Dropout Training

In this post, I’ll summarize one of my favorite papers from ICML 2013: Fast Dropout Training, by Sida Wang and Christopher Manning. This paper derives an analytic approximation to dropout, a randomized regularization method recently proposed for training deep nets that has allowed big improvements in predictive accuracy.   Their approximation gives a roughly 10-times speedup under certain conditions.  Much more interestingly, the authors also show strong connections to existing regularization methods, shedding light on why dropout works so well. Continue reading “ICML Highlight: Fast Dropout Training”