When we talk about priors and regularization, we often motivate them in terms of “incorporating knowledge” or “preventing overfitting.” In a sense, the two are equivalent: any prior or regularizer must favor certain explanations relative to others, so favoring one explanation is equivalent to punishing others. But I’ll argue that these are two very different phenomena, and it’s useful to know which one is going on. Continue reading “Prior knowledge and overfitting”
This is the second of two posts based on a testing tutorial I’m writing with David Duvenaud.
In my last post, I talked about checking the MCMC updates using unit tests. Most of the mistakes I’ve caught in my own code were ones I caught with unit tests. (Naturally, I have no idea about the ones I haven’t caught.) But no matter how thoroughly we unit test, there are still subtle bugs that slip through the cracks. Integration testing is a more global approach, and tests the overall behavior of the software, which depends on the interaction of multiple components. Continue reading “Testing MCMC code, part 2: integration tests”
I will dedicate the next few posts to variational inference methods as a way to organize my own understanding – this first one will be pretty basic.
The goal of variational inference is to approximate an intractable probability distribution, $p$, with a tractable one, $q$, in a way that makes them as ‘close’ as possible. Let’s unpack that statement a bit.
I thought it would be fun to quickly demonstrate how to write a stateless stochastic memoizer in Haskell. The idea behind stochastic memoization is to take some base generative process and transform it into another process which, on each sample, either draws from the original process or draws from the distribution of previously drawn samples. This allows you, for example, to introduce memory or smoothness into your sampling distribution [1, 2]. Continue reading “Stochastic memoization in Haskell”
Reversible jump Markov Chain Monte Carlo (RJMCMC)  is an extension of the Metropolis-Hastings algorithm that allows sampling from a distribution over models with potentially different numbers of parameters. In this post we are interested in determining the number of components to use when modeling data with a mixture model. The number of components corresponds to the dimension of the space we are walking through. The point of this post is to clear up a common error seen in the literature involving computing a Jacobian that arises in the algorithm.
Continue reading “The Correct Birth/Death Jacobian for Mixture Models”