Summary of the paper ==================== In the context of information diffusion in social networks, the authors consider the problem of estimating the causal effect of a given individual in propagating a given piece of information from a source node to a target node. The main motivation for this (and the one explored in the experiments) is to then rank the nodes in decreasing order of their causal effect and present this as a simple and interpretable summary of the diffusion history. After formally defining the causal effect (the Direct Causal Effect quantity), and observing that it is hard to compute in general, the authors define two auxiliary quantities, responsibility and causality which can be used to approximate the ranking of nodes based on the Direct Causal Effect. Since computing responsibility is itself NP-hard, the authors propose an approximation algorithm based on the approximation algorithm for set cover. Finally experiments validate the approach and evaluate how well the ranking based on DCE can be approximated using the proposed solutions. Comments on the problem/model ============================= A related problem in the information diffusion literature is to estimate the "influence" of a node. Usually, a probabilistic model of diffusion is assumed, the parameters (typically transmission probabilities along the edges) are estimated using data, and then some notion of influence is derived from the estimated parameters. The advantage of causality based approaches is that they typically have much fewer (if any) assumptions on the underlying probabilistic model. However, the approach proposed in this paper seems to fall short of this ideal for the following two reasons: * the definition of Direct Causal Effect is still probabilistic and relies on an implicit probabilistic model. As explained on page 6, line 12, this probabilistic model assumes that edges will fail or succeed to propagate the information with the same probability 1/2 for all edges. This assumption seems unrealistic and overly simplified compared for example to the Independent Cascade Model often used in the information diffusion literature. * it is also surprising to define the causal effect of a node based on a single diffusion. If we think of a given diffusion as a single random realization of a probabilistic process, it seems that a proper notion of "importance" or "causal effect" of a node should be defined in an aggregate sense over many realizations of diffusions. This discrepancy between a single observation and an average over many realizations is made apparent by the chosen definition of Direct Causal Effect, which effectively re-samples new diffusions from a single observed diffusion by activating or deactivating edges randomly. These samples are virtual (they have not been observed in the social network) and basing the definition of the causal effect on them seems questionable. This is in sharp contrast to the way it is done in the standard randomized experiment literature (alluded to a few times by the authors) where a true realization (from the real world) is observed after drawing a random assignment of units to treatment/control. Comments on the results ======================= I think more details should be given about why computing DCE is hard and why it is necessary to approximate it using responsibility and capability. In particular: * even though computing DCE is NP-hard in general (by reduction from some #P-complete problem like counting paths in a graph. It seems that the diffusion history here is a DAG (or maybe even a tree? cf. comment below) where it seems that a polynomial time algorithm to compute DCE in closed form should exist. * even though computing DCE is NP-hard, it can be approximated to an accuracy epsilon using polynomially many samples, by standard Monte-Carlo estimation. Furthermore, there have been a few works in the information diffusion literature to improve the convergence rate of estimators for these types of quantities. See for example: https://arxiv.org/pdf/1207.0913.pdf Hence, it is not totally clear that computing DCE directly (at least in an approximate way) is practically unfeasible. The proposed approximation algorithm for computing responsibility is given without any theoretical guarantee on the quality of the approximation. Given the close similarity with the greedy algorithm for Set Cover, one might expect that a similar log n approximation ratio holds here as well. It would be nice to see this mentioned and formally proved. Furthermore, under an extra assumption like bounded degree of the nodes, which seems to be justified in this application, much better approximation guarantees for set cover are known and should also hold here. Comments on the experiments =========================== The experiments appear sound and well conducted. It is interesting to see a comparison of the different rankings and of how well the DCE ranking can be approximated. An extra experiment which could be nice to have is a comparison of the rankings in this paper to rankings obtained from standard graph theoretic properties, for example node degree, node page-rank, node centrality etc. Without such a comparison, it appears that the proposed approaches are only compared with each other but not to any other "external" independent approach. Comments on the writing/presentation ==================================== The paper is well written and organized overall. Having a toy example in the introduction which is then often referred to in later parts of the paper is very useful and makes the paper much easier to understand. A few things which could maybe be improved: * in section 2, it could be helpful to have a discussion of the model using more of a graph-theoretic language. In particular, as mentioned above, is it the case that a diffusion history always form a DAG or a tree? In other words, do we attribute the exposure of a node to a piece of information to only one neighbor or to multiple neighbors? Can their be cycles in the transmission chain? It also seems that a "trace" is a path in the graph theoretic sense (aka a line), this should be mentioned somewhere. * in definition 3.1, DCE is defined in terms of probabilities, but the source of randomness has not been presented yet at this point. It is only presented in the example in the next page and could be easily overlooked. * the choice of vocabulary "failed" vs "non-failed" for edges could maybe be replaced by "failed" "successful", or simply "absent/present". * the way the probabilities defining DCE are computed is often referred to in the paper as "randomized experiment". As mentioned above, this is a surprising way of calling it since no real experiment is being performed. Maybe this should simply be called Monte Carlo sampling/estimation? Overall comment =============== While the general question of quantifying the ability of nodes to transmit information via a causality based approach is very interesting and original, the way it is being done in this paper suffers from some questionable modeling choices and holes in the analysis.