Dealing with Reliability when Crowdsourcing

Robert Nishihara Machine Learning, Statistics Leave a Comment

[latexpage]I recently read the paper “Variational Inference for Crowdsourcing,” by Qiang Liu, Jian Peng, and Alexander Ihler. They present an approach using belief propagation to deal with reliability when using crowdsourcing to collect labeled data. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data. However, the labels are likely to be noisy. To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are $N$ tasks and $M$ workers, and assume that the possible labels are $\{\pm 1\}$. Define the $N \times M$ matrix $L$ so that $L_{ij}$ is …