Significance: 5
Originality: 4
Relevance: 7
Technical quality: 3
Presentation quality: 7
Scholarship:
Confidence: 6
Overall Recommendation: 5

Comment to the authors:

Summary of the paper
====================
This paper studies the problem of aggregating distributions from workers. For example, an experimenter wants to use crowdsourcing to categorize a set of documents: each document's topic is represented as a distribution over J topics which the experimenter wishes to learn. The authors note that previous approaches:

* either do not take into account the inherent quality and/or bias of the different workers (for example LinOp, LogOp),
* or do not aggregate distributions over topics but only elicit a single category from the workers for the documents they are assigned to.

The model proposed by the authors aims to achieve the best of both worlds: the bias/quality of each worker is captured through a confusion matrix, a JxJ matrix whose column i is a distribution over the J categories expressing how the worker might miscategorize category i. The distribution elicited from the worker is then simply the vector-matrix product between the ground-truth distribution of the document and the confusion matrix of the worker. A standard Bayesian inference approach is then applied: the posterior predictive distribution of the joint distribution of (document topics, worker confusion matrices) is estimated via the EP algorithm.

Experimental results confirm the validity of the approach. Worker biases are simulated by introducing synthetic "spammers" into the dataset. Experiments show that this approach compares favorably to prior approaches.

Comments
========
The problem studied is very interesting and arguably one of the central problems in crowdsourcing: even though the paper is presented through the lens of document categorization, the abstract task and model used could capture many other applications. It is also clear that trying to model the bias of workers is crucial for most, if not all, crowdsourcing applications. The proposed model addresses the shortcomings of previous models that are clearly identified by the authors, and can be seen as a natural extension of them.

A possible concern is practicality: the success of crowdsourcing platforms relies on being able to offer simple tasks to workers. In that respect, asking a worker to report a single category for each document seems much simpler than asking them to report a distribution over topics. This problem is mentioned in the conclusion of the paper: the gain in expressivity could be outweighed by the additional noise introduced by asking the workers to perform a more complex task.

The paper is well and very clearly written overall. I found only one thing confusing: it is not clear from the formal description of the Bayesian model, and in particular Figure 1, that what is elicited from the workers is a distribution over topics: as written, c_{i,n} is a categorical variable. The distribution of c_{i,n} corresponds to the product \Lambda_i \Pi introduced on the previous page, but it is then not clear whether what is being reported by the workers is c_{i,n} itself or its distribution (the short sketch below spells out the two readings). I think this should be clarified in future versions of this paper.

The experiments are sound and establish a clear (favorable) comparison between the proposed approach and prior work.
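To spell out the ambiguity, here is a minimal sketch of the worker model as I understand it from the summary above. The variable names and numerical values are mine, not the paper's, and the orientation of the confusion matrix (row- vs. column-stochastic) is an assumption on my part; the two readings differ only in whether the worker reports the biased distribution or a single draw from it.

```python
import numpy as np

rng = np.random.default_rng(0)

J = 3                                  # number of topics/categories (hypothetical)
pi_n = np.array([0.7, 0.2, 0.1])       # ground-truth topic distribution of document n (made-up values)
Lambda_i = np.array([                  # worker i's J x J confusion matrix; here each row is a
    [0.8, 0.1, 0.1],                   # distribution over reported categories (an assumption,
    [0.2, 0.7, 0.1],                   # chosen so the vector-matrix product below is stochastic)
    [0.1, 0.2, 0.7],
])

# Reading 1: the worker reports the full biased distribution pi_n @ Lambda_i.
reported_dist = pi_n @ Lambda_i

# Reading 2: the worker reports a single label c_{i,n} drawn from that distribution,
# which is how Figure 1 reads if c_{i,n} is a categorical variable.
c_in = rng.choice(J, p=reported_dist)

print(reported_dist, c_in)
```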
One reason for concern here is the problem of over-fitting: the model proposed in this paper appears to have more parameters than any of the previously suggested models. Given the experimental setting, it is not clear to me how this effect could be quantified, but it would be interesting to see an experiment discussing the trade-off between model complexity and generalization error.
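To make the concern concrete, a rough back-of-the-envelope count (my own, not taken from the paper): with J categories and K workers, a per-worker confusion matrix introduces on the order of K * J * (J - 1) free worker parameters, compared with, say, a single reliability scalar per worker in simpler models, or none at all for aggregation rules like LinOp/LogOp.

```python
def worker_param_counts(J: int, K: int) -> dict:
    """Illustrative free-parameter counts for the worker part of each model family
    (my own rough accounting, not figures reported in the paper)."""
    return {
        "confusion-matrix model": K * J * (J - 1),  # K stochastic J x J matrices, J-1 free entries per row
        "single-reliability model": K,              # e.g. one accuracy scalar per worker
        "LinOp/LogOp (no worker model)": 0,         # aggregation only, no per-worker parameters
    }

print(worker_param_counts(J=10, K=50))  # e.g. 4500 vs 50 vs 0
```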