Dealing with Reliability when Crowdsourcing

Robert Nishihara · Machine Learning, Statistics

I recently read the paper “Variational Inference for Crowdsourcing,” by Qiang Liu, Jian Peng, and Alexander Ihler. They present a belief-propagation approach to handling worker reliability when collecting labeled data through crowdsourcing. This post is based on their exposition. Crowdsourcing (via services such as Amazon Mechanical Turk) has been used as a cheap way to amass large quantities of labeled data. However, the labels are likely to be noisy.

To deal with this, a common strategy is to employ redundancy: each task is labeled by multiple workers. For simplicity, suppose there are N tasks and M workers, and assume that the possible labels are \{\pm 1\}. Define the N \times M matrix L so that L_{ij} is the label given to task i by worker j (or 0 if no label was provided). Let z = (z_1, \ldots, z_N) be the true labels of the tasks. Given L, we wish to come up with an estimator \hat{z} of z so as to minimize the average error

(1)   \begin{align*} \frac{1}{N} \sum_{i=1}^N \text{prob}[\hat{z}_i \ne z_i] . \end{align*}
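
To make the setup concrete, here is a small NumPy simulation of this model together with an empirical version of the error in (1). All of the specifics (the sizes, the labeling rate, the reliabilities) are made-up illustration values, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 10                     # tasks, workers
z = rng.choice([-1, 1], size=N)    # true labels
q = rng.uniform(0.5, 1.0, size=M)  # per-worker reliabilities (hypothetical)

# Each worker labels each task with probability 0.3; L_ij = 0 means "no label".
L = np.zeros((N, M), dtype=int)
for i in range(N):
    for j in range(M):
        if rng.random() < 0.3:
            correct = rng.random() < q[j]
            L[i, j] = z[i] if correct else -z[i]

def average_error(z_hat, z):
    """Empirical version of the average error in (1)."""
    return np.mean(z_hat != z)
```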

The simplest approach is to go with the majority. More precisely, we can choose

(2)   \begin{align*} \hat{z}_i = \text{sign}\left(\sum_{j=1}^M L_{ij}\right) . \end{align*}
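
A minimal implementation of estimator (2), reusing the matrix L from the simulation above. Since (2) leaves the sign of zero undefined, breaking ties in favor of +1 is simply a convention of this sketch.

```python
import numpy as np

def majority_vote(L):
    """Estimator (2): the sign of each row sum of the label matrix."""
    s = L.sum(axis=1)
    return np.where(s >= 0, 1, -1)  # ties broken as +1, an arbitrary convention
```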

A more sophisticated approach is to model the reliability of the workers. We suppose that worker j has an underlying probability q_j \in [0,1] of labeling any given task correctly. So, a worker who is always correct will have q_j = 1, and a worker who guesses uniformly at random will have q_j = \tfrac{1}{2}. Placing a Beta prior on the reliabilities q_j, we can form a maximum a posteriori estimator \hat{q} of q = (q_1, \ldots, q_M) given by

(3)   \begin{align*} \hat{q} = \arg\max_q \log p(q \,|\, L) = \arg\max_q \log \sum_z p(q,z \,|\, L) . \end{align*}

Treating z as a hidden variable, we can estimate \hat{q} using expectation maximization (for instance, see Raykar et al., 2010).
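
Here is a rough EM sketch for this one-coin worker model, in the spirit of (though not copied from) Raykar et al., 2010. The Beta hyperparameters, the iteration count, and the initialization are arbitrary choices of this illustration; initializing the reliabilities above 1/2 breaks the symmetry between (q, z) and (1 - q, -z).

```python
import numpy as np

def em_crowdsourcing(L, n_iters=50, alpha=2.0, beta=2.0):
    """EM for the one-coin worker model; returns (z_hat, q_hat)."""
    N, M = L.shape
    q = np.full(M, 0.7)   # start above 1/2 to break the (q, z) sign symmetry
    mu = np.full(N, 0.5)  # mu[i] = posterior probability that z_i = +1
    for _ in range(n_iters):
        # E-step: posterior over each z_i given the current reliabilities.
        for i in range(N):
            js = np.nonzero(L[i])[0]  # workers who labeled task i
            log_pos = np.log(np.where(L[i, js] == 1, q[js], 1 - q[js])).sum()
            log_neg = np.log(np.where(L[i, js] == -1, q[js], 1 - q[js])).sum()
            mu[i] = 1.0 / (1.0 + np.exp(log_neg - log_pos))
        # M-step: MAP update of each q_j under the Beta(alpha, beta) prior.
        for j in range(M):
            i_s = np.nonzero(L[:, j])[0]  # tasks labeled by worker j
            # expected number of worker j's labels that agree with z, under mu
            expected_correct = np.where(L[i_s, j] == 1, mu[i_s], 1 - mu[i_s]).sum()
            q[j] = (expected_correct + alpha - 1) / (len(i_s) + alpha + beta - 2)
        q = np.clip(q, 1e-6, 1 - 1e-6)  # keep the logarithms finite
    return np.where(mu >= 0.5, 1, -1), q
```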

Instead of maximizing over q, we might hope to integrate it out. Attempting to do so, we can write

(4)   \begin{align*} p(z \,|\, L) \propto \int p(z, q \,|\, L) \, dq = \prod_{j=1}^M \psi_j(z, L) , \end{align*}

where we define

(5)   \begin{align*} \psi_j(z, L) = \int p(q_j) \, q_j^{c_j} \, (1-q_j)^{\gamma_j - c_j} \, dq_j . \end{align*}

For simplicity of notation, we let \gamma_j denote the total number of tasks labeled by worker j, and we let c_j denote the total number of tasks correctly labeled by worker j. Both quantities depend on z and L. Note that \psi_j depends only on the labels z_i of the tasks that worker j actually labeled. This formulation lets us reinterpret the problem as inference in a factor graph, in which the task labels z_i are the variable nodes and the functions \psi_j are the factors. Now we can apply loopy belief propagation to approximate the posterior marginals p(z_i \,|\, L) and set each \hat{z}_i to the label that maximizes its marginal, which is the natural decoding rule for the average error in (1).
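
With a Beta(\alpha, \beta) prior, the integral in (5) has a closed form: \psi_j = B(c_j + \alpha, \gamma_j - c_j + \beta) / B(\alpha, \beta), where B is the Beta function. A minimal sketch, in log space for numerical stability:

```python
from scipy.special import betaln  # log of the Beta function

def log_psi(c_j, gamma_j, alpha=2.0, beta=2.0):
    """log of psi_j in (5) under a Beta(alpha, beta) prior on q_j."""
    return betaln(c_j + alpha, gamma_j - c_j + beta) - betaln(alpha, beta)
```

Building on this, here is a rough sum-product sketch on the factor graph. It uses the observation that \psi_j depends on z only through the match count c_j, so factor-to-variable messages can be assembled by convolving the match probabilities of the other tasks. Everything here (the message schedule, the helpers log_psi and L from the earlier snippets) is an illustrative reconstruction, not the paper's implementation.

```python
import numpy as np

def belief_propagation(L, n_iters=20, alpha=2.0, beta=2.0):
    """Sum-product BP on the worker factor graph; returns decoded labels."""
    N, M = L.shape
    tasks_of = [np.nonzero(L[:, j])[0] for j in range(M)]  # tasks worker j labeled
    workers_of = [np.nonzero(L[i])[0] for i in range(N)]   # workers who labeled task i
    # Messages over z_i in {+1, -1}; index 0 means +1, index 1 means -1.
    msg_fv = {(j, i): np.array([0.5, 0.5]) for j in range(M) for i in tasks_of[j]}
    msg_vf = {(i, j): np.array([0.5, 0.5]) for j in range(M) for i in tasks_of[j]}

    def psi(c, gamma):
        return np.exp(log_psi(c, gamma, alpha, beta))  # from the snippet above

    for _ in range(n_iters):
        # Variable-to-factor: product of incoming factor messages, leaving one out.
        for i in range(N):
            for j in workers_of[i]:
                m = np.ones(2)
                for j2 in workers_of[i]:
                    if j2 != j:
                        m = m * msg_fv[(j2, i)]
                msg_vf[(i, j)] = m / m.sum()
        # Factor-to-variable: psi_j depends on z only through the match count,
        # so convolve the match probabilities of the other tasks.
        for j in range(M):
            gamma = len(tasks_of[j])
            for i in tasks_of[j]:
                dist = np.array([1.0])  # distribution of #matches among the others
                for i2 in tasks_of[j]:
                    if i2 != i:
                        # probability that z_{i2} agrees with the label L[i2, j]
                        p = msg_vf[(i2, j)][0 if L[i2, j] == 1 else 1]
                        dist = np.convolve(dist, [1.0 - p, p])
                m = np.empty(2)
                for s, z_val in enumerate([1, -1]):
                    match = 1 if z_val == L[i, j] else 0
                    m[s] = sum(dist[k] * psi(k + match, gamma)
                               for k in range(len(dist)))
                msg_fv[(j, i)] = m / m.sum()
    # Decode each z_i by maximizing its approximate marginal.
    z_hat = np.empty(N, dtype=int)
    for i in range(N):
        b = np.ones(2)
        for j in workers_of[i]:
            b = b * msg_fv[(j, i)]
        z_hat[i] = 1 if b[0] >= b[1] else -1
    return z_hat
```

On the simulated data from the first snippet, this can be compared against the baseline via average_error(belief_propagation(L), z) versus average_error(majority_vote(L), z).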

We chose to place a Beta prior on the reliabilities q_j, but other choices are possible. Depending on the prior, the factors \psi_j may have closed-form expressions, or they may need to be computed numerically. There is a lot of flexibility in this choice. In fact, it turns out that other algorithms, such as majority voting, are equivalent to belief propagation on this same factor graph with different choices of prior.
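
When the prior is not conjugate, the one-dimensional integral in (5) can simply be evaluated numerically. A minimal sketch using scipy.integrate.quad; the sine-shaped prior density here is an arbitrary example, not something from the paper.

```python
import numpy as np
from scipy.integrate import quad

def sine_prior(q):
    """An arbitrary non-Beta density on [0, 1] (it integrates to 1)."""
    return (np.pi / 2.0) * np.sin(np.pi * q)

def psi_numeric(c_j, gamma_j, prior=sine_prior):
    """Compute psi_j in (5) by numerical quadrature."""
    integrand = lambda q: prior(q) * q**c_j * (1.0 - q)**(gamma_j - c_j)
    value, _abserr = quad(integrand, 0.0, 1.0)
    return value
```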
