-rw-r--r--ecai-2016-0657.txt78
1 file changed, 78 insertions, 0 deletions
diff --git a/ecai-2016-0657.txt b/ecai-2016-0657.txt
new file mode 100644
index 0000000..55fd273
--- /dev/null
+++ b/ecai-2016-0657.txt
@@ -0,0 +1,78 @@
+Significance: 5
+Originality: 4
+Relevance: 7
+Technical quality: 3
+Presentation quality: 7
+Scholarship:
+Confidence: 6
+Overall Recommendation: 5
+
+
+Comment to the authors:
+
+Summary of the paper
+====================
+
+This paper studies the problem of aggregating distributions from workers. For
+example, an experimenter wants to use crowdsourcing to categorize a set of
+documents: each document's topic is represented as a distribution over
+J topics which the experimenter wishes to learn.
+
+The authors note that previous approaches either:
+
+* do not take into account the inherent quality and/or bias of the
+different workers (for example, LinOp and LogOp), or
+
+* do not aggregate distributions over topics, instead eliciting only a single
+category from the workers for each document they are assigned.
+
+The model proposed by the authors aims to achieve the best of both worlds: the
+bias/quality of each worker is captured by a JxJ confusion matrix whose column
+i is a distribution over the J categories expressing how the worker might
+miscategorise category i. The distribution elicited from the worker is then
+simply the vector-matrix product of the ground-truth distribution of the
+document and the worker's confusion matrix.
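To make the observation model concrete, here is a minimal sketch (not taken from the paper; the variable names pi, Lambda, and elicited, and the specific numbers, are illustrative assumptions): a document's ground-truth topic distribution is pushed through a worker's confusion matrix, whose column i gives the distribution of reported categories when the true category is i.

```python
import numpy as np

J = 3  # number of topics/categories (illustrative)

# Ground-truth topic distribution of one document over the J topics.
pi = np.array([0.7, 0.2, 0.1])

# Worker confusion matrix: column i is a distribution over the J reported
# categories when the true category is i (so each column sums to 1).
Lambda = np.array([
    [0.8, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.1, 0.1, 0.8],
])

# The distribution elicited from the worker: the matrix-vector product of the
# confusion matrix and the ground-truth distribution. Mixing columns of a
# column-stochastic matrix with the weights in pi again yields a distribution.
elicited = Lambda @ pi  # [0.58, 0.25, 0.17]
```

A perfectly reliable worker corresponds to Lambda being the identity matrix (elicited equals pi), while a uniform-spammer worker corresponds to constant columns, which erase all information about pi.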
+
+A standard Bayesian inference approach is then applied: the posterior
+distribution over the joint variables (document topics, worker confusion
+matrices) is approximated via the expectation propagation (EP) algorithm.
+
+Experimental results confirm the validity of the approach. Worker biases are
+simulated by introducing synthetic "spammers" into the dataset. Experiments
+show that this approach compares favorably to prior approaches.
+
+Comments
+========
+
+The problem studied is very interesting and arguably one of the central
+problems in crowdsourcing: even though the paper is presented through the lens
+of document categorization, the abstract task and model used could capture many
+other applications. It is also clear that trying to model the bias of workers
+is crucial for most, if not all, crowdsourcing applications.
+
+The proposed model addresses the shortcomings of previous models that are
+clearly identified by the authors, and can be seen as a natural extension of
+them. A possible concern is practicality: the success of crowdsourcing
+platforms relies on being able to offer simple tasks to workers. In that
+respect, asking a worker to report a single category for each document seems
+much simpler than asking them to report a distribution over topics. This
+problem is mentioned in the conclusion of the paper: it seems that the gain in
+expressivity could be outweighed by the additional noise introduced by asking
+the workers to perform a more complex task.
+
+The paper is well written and very clear overall. I found only one thing
+confusing: it is not clear from the formal description of the Bayesian model,
+and in particular Figure 1, that what is elicited from the workers is
+a distribution over topics: as written, c_{i,n} is a categorical variable. The
+distribution of c_{i,n} corresponds to the product \Lambda_i \Pi introduced on
+the previous page, but it is then not clear whether the workers report c_{i,n}
+itself or its distribution. I think this should be clarified in future
+versions of this paper.
+
+The experiments are sound and establish a clear (favorable) comparison between
+the proposed approach and prior work. One reason for concern here is the
+problem of over-fitting: the model proposed in this paper appears to have more
+parameters than any of the previously suggested models. Given the experimental
+setting, it is not clear to me how this effect should be quantified, but it
+would be interesting to see an experiment discussing the trade-off between
+model complexity and generalization error.