[latexpage]It often comes up in neural networks, generalized linear models, topic models and many other probabilistic models that one wishes to parameterize a discrete distribution in terms of an unconstrained vector of numbers, i.e., a vector that is not confined to the simplex, might be negative, etc. A very common way to address this is to use the “softmax” transformation: \begin{align*} \pi_k &= \frac{\exp\{x_k\}}{\sum_{k’=1}^K\exp\{x_{k’}\}} \end{align*} where the $x_k$ are unconstrained in $\mathbb{R}$, but the $\pi_k$ live on the simplex, i.e., $\pi_k \geq 0$ and $\sum_{k}\pi_k=1$. The $x_k$ parameterize a discrete distribution (not uniquely) and we can generate data by performing the softmax transformation and then doing the usual thing to draw from a discrete distribution. Interestingly, it turns out that …

## Upcoming Conferences

We’re just about to hit conference season, so I thought I would post a public service announcement identifying various upcoming events for folks into machine learning and Bayesian modeling. International Conference on Artificial Intelligence and Statistics (AISTATS) in Scottsdale, AZ: April 29 – May 1, 2013 New England Machine Learning Day at Microsoft Research New England: May 1, 2013 First International Conference on Learning Representations (ICLR) in Scottsdale, AZ: May 2-4, 2013 Conference on Bayesian Nonparametrics, Amsterdam: June 10-14, 2013 Conference on Learning Theory (COLT), Princeton, NJ: June 12-14, 2013 International Conference on Machine Learning (ICML), Atlanta, GA: June 16-21, 2013 Conference on Uncertainty in Artificial Intelligence (UAI), Bellvue, WA: July 11-15, 2013 Joint Statistical Meetings (JSM), Montreal: August 3-8, …

## The Alias Method: Efficient Sampling with Many Discrete Outcomes

[latexpage] When implementing algorithms for inference and learning with probabilistic models, it commonly comes up that one needs to sample from a discrete distribution. That is, from a multinomial distribution with parameter $\boldsymbol{\pi}\in\mathbb{R}^K$, such that $\pi_k\geq 0$ and $\sum_{k}\pi_k=1$. A somewhat more common occurrence is that we have a $\boldsymbol{\phi}\in\mathbb{R}^K$ where $\phi_k\geq 0$, but we don’t know the normalization constant. That is, our $\boldsymbol{\phi}$ is only proportional to the multinomial parameter $\boldsymbol{\pi}$. We want to rapidly generate a variate according to $\boldsymbol{\pi}$, given $\boldsymbol{\pi}$, something easily done with (Matlab) code such as this (paraphrased from Tom Minka‘s Lightspeed Toolbox): cdf = cumsum(phi); samp_k = sum(cdf < rand()*cdf(end)) + 1; This is nice and simple, but you'll notice that it has ...

## What is the Computational Capacity of the Brain?

One big recent piece of news from across the Atlantic is that the European Commission is funding a brain simulation project to the tune of a billion Euros. Roughly, the objective is to simulate the entire human brain using a supercomputer. Needless to say, many people are skeptical and there are lots of reasons that one might think this project is unlikely to yield useful results. One criticism centers on whether even a supercomputer can simulate the complexity of the brain. A first step towards simulating a brain is thinking about how many FLOP/s (floating point operations per second) would be necessary to implement similar functionality in conventional computer hardware. Here I will discuss two back-of-the-envelope estimates of this computational …

## Computing Log-Sum-Exp

[latexpage]This post is about a computational trick that everyone should know, but that doesn’t tend to be explicitly taught in machine learning courses. Imagine that we have a set of $N$ values, $\{x_n\}^N_{n=1}$ and we want to compute the quantity \begin{align} z = \log \sum_{n=1}^N \exp\{x_n\}. \end{align} This comes up all the time when you want to parameterize a multinomial distribution using a softmax, e.g., when doing logistic regression and you have more than two unordered categories. If you want to compute the log likelihood, you’ll find such an expression due to the normalization constant. Computing this naively can be a recipe for disaster, due to underflow or overflow, depending on the scale of the $x_n$. Consider a simple example, …

## The Poisson Estimator

[latexpage]Much of what we do when we analyze data and invent algorithms is think about estimators for unknown quantities, even when we don’t directly phrase things this way. One type of estimator that we commonly encounter is the Monte Carlo estimator, which approximates expectations via the sample mean. That is, many problems in which we are interested involve a distribution $\pi$ on a space $\mathcal{X}$, where we wish to calculate the expectation of a function $f(x)$: \begin{align*} \hat{f}_{\pi} &= \int_{\mathcal{X}} \pi(x)\,f(x)\,\mathrm{d}x\\ &\approx \frac{1}{N}\sum_{n=1}^N f(x_n) \text{\qquad where } x_n \sim \pi. \end{align*} This is very nice because it gives you an unbiased estimator of $\hat{f}_\pi$. That is, the expectation of this estimator is the desired quantity. However, one issue that comes …

## New Blog

I’m excited to announce a new collaborative blog, written by members of the Harvard Intelligent Probabilistic Systems group. Broadly, our group studies machine learning, statistics, and computational neuroscience, but we’re interested in lots of things outside these areas as well. The idea is to use this as a venue to discuss interesting ideas and results — new and old — about probabilistic modeling, inference, artificial intelligence, theoretical neuroscience, or anything else research-related that strikes our fancy. There will be posts from folks at both Harvard and MIT, in computer science, mathematics, biophysics, and BCS departments, so expect a wide variety of interests. — Ryan Adams