summaryrefslogtreecommitdiffstats
path: root/tsc-2016-0023.txt
diff options
context:
space:
mode:
authorThibaut Horel <thibaut.horel@gmail.com>2017-04-15 18:41:19 -0400
committerThibaut Horel <thibaut.horel@gmail.com>2017-04-15 18:41:19 -0400
commit3774021b9db9bbe9eb674edfdb19a74f9ac16021 (patch)
tree445c24bae4549159296a5869bcb4e91754d6302f /tsc-2016-0023.txt
parent71f7920d6918d7bf680f8c0c01b434d25ab1bbcb (diff)
downloadreviews-3774021b9db9bbe9eb674edfdb19a74f9ac16021.tar.gz
TSC review
Diffstat (limited to 'tsc-2016-0023.txt')
-rw-r--r--tsc-2016-0023.txt134
1 files changed, 134 insertions, 0 deletions
diff --git a/tsc-2016-0023.txt b/tsc-2016-0023.txt
new file mode 100644
index 0000000..e76f0ca
--- /dev/null
+++ b/tsc-2016-0023.txt
@@ -0,0 +1,134 @@
+Summary of the paper
+====================
+
+In the context of information diffusion in social networks, the authors
+consider the problem of estimating the causal effect of a given individual in
+propagating a given piece of information from a source node to a target node.
+The main motivation for this (and the one explored in the experiments) is to
+then rank the nodes in decreasing order of their causal effect and present this
+as a simple and interpretable summary of the diffusion history. After formally
+defining the causal effect (the Direct Causal Effect quantity), and observing
+that it is hard to compute in general, the authors define two auxiliary
+quantities, responsibility and causality which can be used to approximate the
+ranking of nodes based on the Direct Causal Effect. Since computing
+responsibility is itself NP-hard, the authors propose an approximation
+algorithm based on the approximation algorithm for set cover. Finally
+experiments validate the approach and evaluate how well the ranking based on
+DCE can be approximated using the proposed solutions.
+
+Comments on the problem/model
+=============================
+
+A related problem in the information diffusion literature is to estimate the
+"influence" of a node. Usually, a probabilistic model of diffusion is assumed,
+the parameters (typically transmission probabilities along the edges) are
+estimated using data, and then some notion of influence is derived from the
+estimated parameters.
+
+The advantage of causality based approaches is that they typically have much
+fewer (if any) assumptions on the underlying probabilistic model. However, the
+approach proposed in this paper seems to fall short of this ideal for the
+following two reasons:
+
+* the definition of Direct Causal Effect is still probabilistic and relies on
+ an implicit probabilistic model. As explained on page 6, line 12, this
+ probabilistic model assumes that edges will fail or succeed to propagate the
+ information with the same probability 1/2 for all edges. This assumption
+ seems unrealistic and overly simplified compared for example to the
+ Independent Cascade Model often used in the information diffusion literature.
+
+* it is also surprising to define the causal effect of a node based on a single
+ diffusion. If we think of a given diffusion as a single random realization of
+ a probabilistic process, it seems that a proper notion of "importance" or
+ "causal effect" of a node should be defined in an aggregate sense over many
+ realizations of diffusions. This discrepancy between a single observation and
+ an average over many realizations is made apparent by the chosen definition
+ of Direct Causal Effect, which effectively re-samples new diffusions from
+ a single observed diffusion by activating or deactivating edges randomly.
+ These samples are virtual (they have not been observed in the social network)
+ and basing the definition of the causal effect on them seems questionable.
+ This is in sharp contrast to the way it is done in the standard randomized
+ experiment literature (alluded to a few times by the authors) where a true
+ realization (from the real world) is observed after drawing a random
+ assignment of units to treatment/control.
+
+Comments on the results
+=======================
+
+I think more details should be given about why computing DCE is hard and why it
+is necessary to approximate it using responsibility and capability. In
+particular:
+
+* even though computing DCE is NP-hard in general (by reduction from some
+ #P-complete problem like counting paths in a graph. It seems that the
+ diffusion history here is a DAG (or maybe even a tree? cf. comment below)
+ where it seems that a polynomial time algorithm to compute DCE in closed form
+ should exist.
+
+* even though computing DCE is NP-hard, it can be approximated to an accuracy
+ epsilon using polynomially many samples, by standard Monte-Carlo estimation.
+ Furthermore, there have been a few works in the information diffusion
+ literature to improve the convergence rate of estimators for these types of
+ quantities. See for example: https://arxiv.org/pdf/1207.0913.pdf
+
+Hence, it is not totally clear that computing DCE directly (at least in an
+approximate way) is practically unfeasible.
+
+The proposed approximation algorithm for computing responsibility is given
+without any theoretical guarantee on the quality of the approximation. Given
+the close similarity with the greedy algorithm for Set Cover, one might expect
+that a similar log n approximation ratio holds here as well. It would be nice
+to see this mentioned and formally proved. Furthermore, under an extra
+assumption like bounded degree of the nodes, which seems to be justified in
+this application, much better approximation guarantees for set cover are known
+and should also hold here.
+
+Comments on the experiments
+===========================
+
+The experiments appear sound and well conducted. It is interesting to see
+a comparison of the different rankings and of how well the DCE ranking can be
+approximated.
+
+An extra experiment which could be nice to have is a comparison of the rankings
+in this paper to rankings obtained from standard graph theoretic properties,
+for example node degree, node page-rank, node centrality etc. Without such
+a comparison, it appears that the proposed approaches are only compared with
+each other but not to any other "external" independent approach.
+
+Comments on the writing/presentation
+====================================
+
+The paper is well written and organized overall. Having a toy example in the
+introduction which is then often referred to in later parts of the paper is
+very useful and makes the paper much easier to understand.
+
+A few things which could maybe be improved:
+
+* in section 2, it could be helpful to have a discussion of the model using
+ more of a graph-theoretic language. In particular, as mentioned above, is it
+ the case that a diffusion history always form a DAG or a tree? In other
+ words, do we attribute the exposure of a node to a piece of information to
+ only one neighbor or to multiple neighbors? Can their be cycles in the
+ transmission chain? It also seems that a "trace" is a path in the graph
+ theoretic sense (aka a line), this should be mentioned somewhere.
+
+* in definition 3.1, DCE is defined in terms of probabilities, but the source
+ of randomness has not been presented yet at this point. It is only presented
+ in the example in the next page and could be easily overlooked.
+
+* the choice of vocabulary "failed" vs "non-failed" for edges could maybe be
+ replaced by "failed" "successful", or simply "absent/present".
+
+* the way the probabilities defining DCE are computed is often referred to in
+ the paper as "randomized experiment". As mentioned above, this is
+ a surprising way of calling it since no real experiment is being performed.
+ Maybe this should simply be called Monte Carlo sampling/estimation?
+
+Overall comment
+===============
+
+While the general question of quantifying the ability of nodes to transmit
+information via a causality based approach is very interesting and original,
+the way it is being done in this paper suffers from some questionable modeling
+choices and holes in the analysis.