Dealing with Reliability when Crowdsourcing
I recently read the paper “Variational Inference for Crowdsourcing,” by Qiang Liu, Jian Peng, and Alexander Ihler. They present an approach using belief propagation to deal with reliability when using crowdsourcing to collect labeled data. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data. However, the labels are likely to be noisy.
To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are tasks and
workers, and assume that the possible labels are
. Define the
matrix
so that
is the label given to task
by worker
(or
if no label was provided). Let
be the true labels of the tasks. Given
, we wish to come up with an estimator
of
so as to minimize the average error
(1)
The simplest approach is to go with the majority. More precisely, we can choose
(2)
A more sophisticated approach is to try to model the reliability of the workers. We suppose that worker has underlying probability
of accurately labeling a task. So, a worker who is always correct will have
and a worker who guesses randomly will have
. Placing a Beta prior on the reliabilities
, we can come up with a maximum-likelihood estimator
of
given by
(3)
Treating as a hidden variable, we can estimate
using expectation maximization (for instance, see Raykar et al., 2010).
Instead of maximizing over , we might hope to integrate it out. Attempting to do so, we can write
(4)
where we define
(5)
For simplicity of notation, we let denote the total number of tasks labeled by worker
, and we let
denote the total number of tasksĀ correctly labeled by worker
. These values depend on
and
. Note that
only depends on the tasks
that worker
has labeled. This formulation allows us to reinterpret the problem as inference in a factor graph, where the task labels are the variable nodes and the factors are the functions
. Now, we can apply loopy belief propagation to compute the maximum-likelihood configuration
.
We chose to place a Beta prior on the reliabilities , but we didn’t have to do this. Depending on our choice of prior, the factors
may have closed form integrals or they have to be computed numerically. There is a lot of flexibility in this choice. In fact, it turns out that other algorithms, such as majority voting, are equivalent to using belief propagation on the same factor graph with different priors.