## Dealing with Reliability when Crowdsourcing

I recently read the paper “Variational Inference for Crowdsourcing,” by Qiang Liu, Jian Peng, and Alexander Ihler. They present a belief propagation approach for handling worker reliability when using crowdsourcing to collect labeled data. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data; however, the labels are likely to be noisy.

To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are $N$ tasks and $M$ workers, and assume that the possible labels are $\{\pm 1\}$. Define the $N \times M$ matrix $L$ so that $L_{ij}$ is the label given to task $i$ by worker $j$ (or $0$ if no label was provided). Let $z = (z_1, \ldots, z_N)$ be the true labels of the tasks. Given $L$, we wish to come up with an estimator $\hat{z}$ of $z$ so as to minimize the average error

\begin{align}
\frac{1}{N} \sum_{i=1}^N \text{prob}[\hat{z}_i \ne z_i].
\end{align}
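
To make the setup concrete, here is a minimal sketch (not from the paper) that builds a label matrix $L$ on synthetic data and evaluates a simple majority-vote estimator against the average-error criterion above. The synthetic worker reliabilities and the majority-vote baseline are illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic setup (assumption, not from the paper):
# N tasks, M workers; each worker labels each task with probability p_obs
# and is correct with a worker-specific probability q_j.
N, M = 200, 10
p_obs = 0.5
z = rng.choice([-1, 1], size=N)        # true labels
q = rng.uniform(0.55, 0.9, size=M)     # per-worker reliabilities (assumed)

L = np.zeros((N, M), dtype=int)        # L[i, j] = label of task i by worker j, 0 if missing
for j in range(M):
    observed = rng.random(N) < p_obs
    correct = rng.random(N) < q[j]
    L[observed, j] = np.where(correct[observed], z[observed], -z[observed])

# Majority-vote estimator: sign of the row sum of observed labels.
z_hat = np.sign(L.sum(axis=1))
z_hat[z_hat == 0] = 1                  # break ties arbitrarily

# Empirical average error, mirroring (1/N) * sum_i prob[z_hat_i != z_i].
avg_error = np.mean(z_hat != z)
print(f"majority-vote average error: {avg_error:.3f}")
```

Majority voting weights every worker equally; the belief propagation approach in the paper instead infers worker reliabilities jointly with the labels, which is what lets it do better when workers vary widely in quality.
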
## Complexity of Inference in Bayesian Networks

Developing efficient (i.e., polynomial-time) algorithms with guaranteed performance is a central goal in computer science (perhaps the central goal). In machine learning, inference algorithms meeting these requirements are much rarer than we would like: often, an algorithm is either efficient but doesn’t perform optimally, or vice versa. A number of results from the 1990s demonstrate the challenges of, but also the potential for, efficient Bayesian inference. These results were obtained in the context of Bayesian networks.