Dealing with Reliability when Crowdsourcing

Robert Nishihara · January 29, 2013

I recently read the paper "Variational Inference for Crowdsourcing," by Qiang Liu, Jian Peng, and Alexander Ihler. They present an approach using belief propagation to deal with reliability when using crowdsourcing to collect labeled data. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data. However, the labels are likely to be noisy.

To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are $N$ tasks and $M$ workers, and assume that the possible labels are $\{\pm 1\}$. Define the $N \times M$ matrix $L$ so that $L_{ij}$ is the label given to task $i$ by worker $j$ (or $0$ if no label was provided). Let $z = (z_1, \ldots, z_N)$ be the true labels of the tasks. Given $L$, we wish to come up with an estimator $\hat{z}$ of $z$ so as to minimize the average error

\[\begin{align} \frac{1}{N} \sum_{i=1}^N \text{prob}[\hat{z}_i \ne z_i]. \end{align}\]

The simplest approach is to go with the majority. More precisely, we can choose

\[\begin{align} \hat{z}_i = \text{sign}\left(\sum_{j=1}^M L_{ij}\right). \end{align}\]

A more sophisticated approach is to try to model the reliability of the workers. We suppose that worker $j$ has underlying probability $q_j \in [0,1]$ of accurately labeling a task. So, a worker who is always correct will have $q_j = 1$ and a worker who guesses randomly will have $q_j = \tfrac{1}{2}$. Placing a Beta prior on the reliabilities $q_j$, we can come up with a maximum-likelihood estimator $\hat{q}$ of $q = (q_1, \ldots, q_M)$ given by

\[\begin{align} \hat{q} = \arg\max_q \log p(q \,|\, L) = \arg\max_q \log \sum_z p(q,z \,|\, L). \end{align}\]

Treating $z$ as a hidden variable, we can estimate $\hat{q}$ using expectation maximization (for instance, see Raykar et al., 2010).

Instead of maximizing over $q$, we might hope to integrate it out. Attempting to do so, we can write

\[\begin{align} p(z \,|\, L) \propto \int p(z, q \,|\, L) \, dq = \prod_{j=1}^M \psi_j(z, q, L) \, dq_j, \end{align}\]

where we define

\[\begin{align} \psi_j(z, q, L) = \int p(q_j) \, q_j^{c_j} \, (1-q_j)^{\gamma_j-c_j} \, dq_j. \end{align}\]

For simplicity of notation, we let $\gamma_j$ denote the total number of tasks labeled by worker $j$, and we let $c_j$ denote the total number of tasks correctly labeled by worker $j$. These values depend on $z$ and $L$. Note that $\psi_j$ only depends on the tasks $z_i$ that worker $j$ has labeled. This formulation allows us to reinterpret the problem as inference in a factor graph, where the task labels are the variable nodes and the factors are the functions $\psi_j$. Now, we can apply loopy belief propagation to compute the maximum-likelihood configuration $\hat{z}$.

We chose to place a Beta prior on the reliabilities $q_j$, but we didn't have to do this. Depending on our choice of prior, the factors $\psi_j$ may have closed form integrals or they have to be computed numerically. There is a lot of flexibility in this choice. In fact, it turns out that other algorithms, such as majority voting, are equivalent to using belief propagation on the same factor graph with different priors.

Twitter, Facebook