Dealing with Reliability when Crowdsourcing

[latexpage]I recently read the paper “Variational Inference for Crowdsourcing,” by Qiang Liu, Jian Peng, and Alexander Ihler. They present an approach using belief propagation to deal with reliability when using crowdsourcing to collect labeled data. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data. However, the labels are likely to be noisy.

To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are $N$ tasks and $M$ workers, and assume that the possible labels are $\{\pm 1\}$. Define the $N \times M$ matrix $L$ so that $L_{ij}$ is the label given to task $i$ by worker $j$ (or $0$ if no label was provided). Let $z = (z_1, \ldots, z_N)$ be the true labels of the tasks. Given $L$, we wish to come up with an estimator $\hat{z}$ of $z$ so as to minimize the average error


\frac{1}{N} \sum_{i=1}^N \text{prob}[\hat{z}_i \ne z_i] .

\end{align} Continue reading “Dealing with Reliability when Crowdsourcing”

Complexity of Inference in Bayesian Networks


Developing efficient (i.e. polynomial time) algorithms with guaranteed performance is a central goal in computer science (perhaps the central goal). In machine learning, inference algorithms meeting these requirements are much rarer than we would like: often, an algorithm is either efficient but doesn’t perform optimally or vice versa. A number of results from the 1990’s demonstrate the challenges of, but also the potential for, efficient Bayesian inference. These results were carried out in the context of Bayesian networks.

Continue reading “Complexity of Inference in Bayesian Networks”

It Depends on the Model


In my last blog post I wrote about the asymptotic equipartition principle. This week I will write about something completely unrelated.

This blog post evolved from a discussion with Brendan O’Connor about science and evidence. The back story is as follows. Continue reading “It Depends on the Model”