Dealing with Reliability when Crowdsourcing
I recently read the paper “Variational Inference for Crowdsourcing,” by Qiang Liu, Jian Peng, and Alexander Ihler. They present an approach using belief propagation to deal with reliability when using crowdsourcing to collect labeled data. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data. However, the labels are likely to be noisy.
To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are tasks and workers, and assume that the possible labels are . Define the matrix so that is the label given to task by worker (or if no label was provided). Let be the true labels of the tasks. Given , we wish to come up with an estimator of so as to minimize the average error
The simplest approach is to go with the majority. More precisely, we can choose
A more sophisticated approach is to try to model the reliability of the workers. We suppose that worker has underlying probability of accurately labeling a task. So, a worker who is always correct will have and a worker who guesses randomly will have . Placing a Beta prior on the reliabilities , we can come up with a maximum-likelihood estimator of given by
Treating as a hidden variable, we can estimate using expectation maximization (for instance, see Raykar et al., 2010).
Instead of maximizing over , we might hope to integrate it out. Attempting to do so, we can write
where we define
For simplicity of notation, we let denote the total number of tasks labeled by worker , and we let denote the total number of tasks correctly labeled by worker . These values depend on and . Note that only depends on the tasks that worker has labeled. This formulation allows us to reinterpret the problem as inference in a factor graph, where the task labels are the variable nodes and the factors are the functions . Now, we can apply loopy belief propagation to compute the maximum-likelihood configuration .
We chose to place a Beta prior on the reliabilities , but we didn’t have to do this. Depending on our choice of prior, the factors may have closed form integrals or they have to be computed numerically. There is a lot of flexibility in this choice. In fact, it turns out that other algorithms, such as majority voting, are equivalent to using belief propagation on the same factor graph with different priors.