Significance: 5
Originality: 4
Relevance: 7
Technical quality: 3
Presentation quality: 7
Scholarship:
Confidence: 6
Overall Recommendation: 5

Comment to the authors:

Summary of the paper
====================
This paper studies the problem of aggregating distributions from workers. For example, an experimenter wants to use crowdsourcing to categorize a set of documents: each document's topic is represented as a distribution over J topics which the experimenter wishes to learn. The authors note that previous approaches:

* either do not take into account the inherent quality and/or bias of the different workers (for example LinOp, LogOp),
* or do not aggregate distributions over topics but only elicit a single category from the workers for the documents they are assigned to.

The model proposed by the authors aims to achieve the best of both worlds: the bias/quality of each worker is captured through a confusion matrix, a JxJ matrix whose column i is a distribution over the J categories expressing how the worker might miscategorize category i. The distribution elicited from the worker is then simply the vector-matrix product between the ground-truth distribution of the document and the confusion matrix of the worker. A standard Bayesian inference approach is then applied: the posterior predictive distribution of the joint distribution of (document topics, worker confusion matrices) is estimated via the EP algorithm.

Experimental results confirm the validity of the approach. Worker biases are simulated by introducing synthetic "spammers" into the dataset. Experiments show that this approach compares favorably to prior approaches.

Comments
========
The problem studied is very interesting and arguably one of the central problems in crowdsourcing: even though the paper is presented through the lens of document categorization, the abstract task and model used could capture many other applications. It is also clear that trying to model the bias of workers is crucial for most, if not all, crowdsourcing applications. The proposed model addresses the shortcomings of previous models that are clearly identified by the authors, and can be seen as a natural extension of them.

A possible concern is practicality: the success of crowdsourcing platforms relies on being able to offer simple tasks to workers. In that respect, asking a worker to report a single category for each document seems much simpler than asking them to report a distribution over topics. This problem is mentioned in the conclusion of the paper: the gain in expressivity could be outweighed by the additional noise introduced by asking the workers to perform a more complex task.

The paper is well and very clearly written overall. I found only one thing confusing: it is not clear from the formal description of the Bayesian model, and in particular Figure 1, that what is elicited from the workers is a distribution over topics: as written, c_{i,n} is a categorical variable. The distribution of c_{i,n} corresponds to the product \Lambda_i \Pi introduced on the previous page, but it is then not clear whether what is being reported by the workers is c_{i,n} itself or its distribution (the short sketch below spells out the two readings). I think this should be clarified in future versions of this paper.

The experiments are sound and establish a clear (favorable) comparison between the proposed approach and prior work.
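To spell out the ambiguity, here is a minimal sketch of the worker model as I understand it from the summary above. The variable names and numerical values are mine, not the paper's, and the orientation of the confusion matrix (row- vs. column-stochastic) is an assumption on my part; the two readings differ only in whether the worker reports the biased distribution or a single draw from it.

```python
import numpy as np

rng = np.random.default_rng(0)

J = 3                                  # number of topics/categories (hypothetical)
pi_n = np.array([0.7, 0.2, 0.1])       # ground-truth topic distribution of document n (made-up values)
Lambda_i = np.array([                  # worker i's J x J confusion matrix; here each row is a
    [0.8, 0.1, 0.1],                   # distribution over reported categories (an assumption,
    [0.2, 0.7, 0.1],                   # chosen so the vector-matrix product below is stochastic)
    [0.1, 0.2, 0.7],
])

# Reading 1: the worker reports the full biased distribution pi_n @ Lambda_i.
reported_dist = pi_n @ Lambda_i

# Reading 2: the worker reports a single label c_{i,n} drawn from that distribution,
# which is how Figure 1 reads if c_{i,n} is a categorical variable.
c_in = rng.choice(J, p=reported_dist)

print(reported_dist, c_in)
```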
One reason for concern here is the problem of over-fitting: the model proposed in this paper appears to have more parameters than any of the previously suggested models. Given the experimental setting, it is not clear to me how this effect could be quantified, but it would be interesting to see an experiment discussing the trade-off between model complexity and generalization error.
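To make the concern concrete, a rough back-of-the-envelope count (my own, not taken from the paper): with J categories and K workers, a per-worker confusion matrix introduces on the order of K * J * (J - 1) free worker parameters, compared with, say, a single reliability scalar per worker in simpler models, or none at all for aggregation rules like LinOp/LogOp.

```python
def worker_param_counts(J: int, K: int) -> dict:
    """Illustrative free-parameter counts for the worker part of each model family
    (my own rough accounting, not figures reported in the paper)."""
    return {
        "confusion-matrix model": K * J * (J - 1),  # K stochastic J x J matrices, J-1 free entries per row
        "single-reliability model": K,              # e.g. one accuracy scalar per worker
        "LinOp/LogOp (no worker model)": 0,         # aggregation only, no per-worker parameters
    }

print(worker_param_counts(J=10, K=50))  # e.g. 4500 vs 50 vs 0
```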