But it actually never occurred to me that I could use this same trick even for numbers I know exactly, by just adding back in whatever factor I removed. (By “factor” I mean exp(a): each term is divided by exp(a), and at the end the sum is effectively multiplied by exp(a).)
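A quick sketch of what I mean in Python (the function name is mine):

```python
import math

def log_sum_exp(log_vals):
    # Stable log(sum(exp(x) for x in log_vals)): divide every term by exp(a)
    # by subtracting a = max(log_vals), then multiply exp(a) back in by
    # adding a to the log of the sum.
    a = max(log_vals)
    return a + math.log(sum(math.exp(x - a) for x in log_vals))

# Naively, math.exp(-1000) underflows to 0.0 and the log blows up;
# with the trick the answer comes out fine:
print(log_sum_exp([-1000.0, -1000.0]))  # -1000 + log(2), about -999.307
```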

-Matt

http://machineintelligence.tumblr.com/post/4998477107/the-log-sum-exp-trick

Here’s another posterior-inference testing approach that I’ve used before, though only for small models:

http://www.stat.cmu.edu/~acthomas/724/Cook.pdf

(Cook, Gelman, Rubin 2006, Validation of Software for Bayesian Models Using Posterior Quantiles)

It doesn’t have the magnification effect, but it applies more generally than MCMC (though that paper casts it overly narrowly, talking only about sampling methods). You have to do many simulations: simulate theta’ ~ fixedprior, then x’ ~ p(x | theta’), then run your inference algorithm and compute the posterior CDF value of theta’ under p(theta | x’, fixedprior). Over many data simulations, these quantiles should be uniform. Put another way, frequentist coverage is correct: e.g. 50% intervals contain the true theta’ exactly 50% of the time.
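As a sketch of that loop (not their code), here it is on a toy conjugate Beta–Bernoulli model of my own choosing, where the “inference algorithm” is just the exact conjugate update and the posterior CDF is estimated from posterior samples:

```python
import random
from statistics import mean

# Toy model (my choice): theta ~ Beta(2, 2), data = 20 coin flips.
# The conjugate posterior is Beta(2 + heads, 2 + tails), so inference is
# exact here; a real test would plug in your actual sampler instead.
random.seed(0)
n, n_sims, n_post = 20, 400, 200
quantiles = []
for _ in range(n_sims):
    theta_true = random.betavariate(2, 2)                        # theta' ~ fixed prior
    heads = sum(random.random() < theta_true for _ in range(n))  # x' ~ p(x | theta')
    post = [random.betavariate(2 + heads, 2 + n - heads) for _ in range(n_post)]
    # posterior CDF value of theta': fraction of posterior mass below it
    quantiles.append(mean(s < theta_true for s in post))

# If inference is correct, these quantiles are ~Uniform(0, 1); in particular
# the central 50% interval should trap theta' about half the time:
coverage_50 = mean(0.25 <= q <= 0.75 for q in quantiles)
print(round(coverage_50, 2))  # should come out near 0.50
```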

Like Geweke, the Cook et al. paper presents frequentist tests for this, but I think more useful is their plot of the quantile values (or rather, z-score transformations of them, which make it easy to see extreme cases). This probably can’t detect subtle differences like your 1% overestimate example.

However, I’ve found one nice thing: you can make P-P plots (um, or “QQ plots”?) to get some idea of whether the posteriors tend to be too wide or too narrow, which correspond to S-shapes on a P-P plot. Or it’s easier to just check CI coverage (e.g. if your 50% intervals trap theta’ 70% of the time, your posterior variance is too wide).
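To illustrate that coverage check on the same toy Beta–Bernoulli setup (my own example, not from the paper): if the “inference” deliberately returns a too-wide posterior, the 50% interval traps theta’ much more than half the time.

```python
import random
from statistics import mean

# theta ~ Beta(2, 2), 20 coin flips; the correct posterior is
# Beta(2 + heads, 2 + tails). Halving both parameters keeps the posterior
# mean but roughly doubles its variance: a deliberately too-wide posterior.
random.seed(1)
n, n_sims, n_post = 20, 400, 200
quantiles = []
for _ in range(n_sims):
    theta_true = random.betavariate(2, 2)
    heads = sum(random.random() < theta_true for _ in range(n))
    a, b = 2 + heads, 2 + n - heads
    post = [random.betavariate(a / 2, b / 2) for _ in range(n_post)]  # too wide
    quantiles.append(mean(s < theta_true for s in post))

# The quantiles pile up near 0.5 (which is what bends the P-P plot into
# an S), so the central 50% interval over-covers:
coverage_50 = mean(0.25 <= q <= 0.75 for q in quantiles)
print(round(coverage_50, 2))  # well above 0.50: the too-wide signature
```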

I think biased estimates correspond to humps above or below the diagonal on that P-P plot, though I’d have to think it through more. There might also be certain checks you can do with point estimates: your posterior means should be unbiased, maybe, so you can check whether you tend to be too high or too low? I’m less sure about this.

Just one observation about the Python implementation above: when updating the probability mass (line 29), wouldn’t it be better to rearrange it to:

q[large] = (q[large] + q[small]) - 1.0   or   q[large] = (q[large] - 1.0) + q[small]

to minimise the rounding error?

(see http://www.keithschwarz.com/darts-dice-coins/ section: “A Practical Version of Vose’s Algorithm”)
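For what it’s worth, the two groupings really can differ in the last bits; a quick check with made-up values (these are not from the post’s code):

```python
from fractions import Fraction

# Made-up scaled probabilities with 1.0 <= q_large <= 2.0, as in the alias
# method after scaling the probabilities by n.
q_large, q_small = 1.1, 0.3

v1 = (q_large + q_small) - 1.0
v2 = (q_large - 1.0) + q_small

# Exact reference value, computed in rational arithmetic over the same
# float inputs.
exact = Fraction(q_large) + Fraction(q_small) - 1
err1 = abs(Fraction(v1) - exact)
err2 = abs(Fraction(v2) - exact)
print(v1 == v2, err1, err2)
# For these values the second grouping wins: whenever 1.0 <= q_large <= 2.0,
# q_large - 1.0 is computed exactly (Sterbenz's lemma), leaving only one
# rounding in the whole expression.
```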