Summary of the paper
====================

In the context of information diffusion in social networks, the authors
consider the problem of estimating the causal effect of a given individual in
propagating a given piece of information from a source node to a target node.
The main motivation for this (and the one explored in the experiments) is to
then rank the nodes in decreasing order of their causal effect and present this
as a simple and interpretable summary of the diffusion history. After formally
defining the causal effect (the Direct Causal Effect quantity), and observing
that it is hard to compute in general, the authors define two auxiliary
quantities, responsibility and causality which can be used to approximate the
ranking of nodes based on the Direct Causal Effect. Since computing
responsibility is itself NP-hard, the authors propose an approximation
algorithm based on the approximation algorithm for set cover. Finally
experiments validate the approach and evaluate how well the ranking based on
DCE can be approximated using the proposed solutions.

Comments on the problem/model
=============================

A related problem in the information diffusion literature is to estimate the
"influence" of a node. Usually, a probabilistic model of diffusion is assumed,
the parameters (typically transmission probabilities along the edges) are
estimated using data, and then some notion of influence is derived from the
estimated parameters.

The advantage of causality based approaches is that they typically have much
fewer (if any) assumptions on the underlying probabilistic model. However, the
approach proposed in this paper seems to fall short of this ideal for the
following two reasons:

* the definition of Direct Causal Effect is still probabilistic and relies on
  an implicit probabilistic model. As explained on page 6, line 12, this
  probabilistic model assumes that edges will fail or succeed to propagate the
  information with the same probability 1/2 for all edges. This assumption
  seems unrealistic and overly simplified compared for example to the
  Independent Cascade Model often used in the information diffusion literature.

* it is also surprising to define the causal effect of a node based on a single
  diffusion. If we think of a given diffusion as a single random realization of
  a probabilistic process, it seems that a proper notion of "importance" or
  "causal effect" of a node should be defined in an aggregate sense over many
  realizations of diffusions. This discrepancy between a single observation and
  an average over many realizations is made apparent by the chosen definition
  of Direct Causal Effect, which effectively re-samples new diffusions from
  a single observed diffusion by activating or deactivating edges randomly.
  These samples are virtual (they have not been observed in the social network)
  and basing the definition of the causal effect on them seems questionable.
  This is in sharp contrast to the way it is done in the standard randomized
  experiment literature (alluded to a few times by the authors) where a true
  realization (from the real world) is observed after drawing a random
  assignment of units to treatment/control.

Comments on the results
=======================

I think more details should be given about why computing DCE is hard and why it
is necessary to approximate it using responsibility and capability. In
particular:

* even though computing DCE is NP-hard in general (by reduction from some
  #P-complete problem like counting paths in a graph. It seems that the
  diffusion history here is a DAG (or maybe even a tree? cf. comment below)
  where it seems that a polynomial time algorithm to compute DCE in closed form
  should exist.

* even though computing DCE is NP-hard, it can be approximated to an accuracy
  epsilon using polynomially many samples, by standard Monte-Carlo estimation.
  Furthermore, there have been a few works in the information diffusion
  literature to improve the convergence rate of estimators for these types of
  quantities. See for example: https://arxiv.org/pdf/1207.0913.pdf 

Hence, it is not totally clear that computing DCE directly (at least in an
approximate way) is practically unfeasible.

The proposed approximation algorithm for computing responsibility is given
without any theoretical guarantee on the quality of the approximation. Given
the close similarity with the greedy algorithm for Set Cover, one might expect
that a similar log n approximation ratio holds here as well. It would be nice
to see this mentioned and formally proved. Furthermore, under an extra
assumption like bounded degree of the nodes, which seems to be justified in
this application, much better approximation guarantees for set cover are known
and should also hold here.

Comments on the experiments
===========================

The experiments appear sound and well conducted. It is interesting to see
a comparison of the different rankings and of how well the DCE ranking can be
approximated.

An extra experiment which could be nice to have is a comparison of the rankings
in this paper to rankings obtained from standard graph theoretic properties,
for example node degree, node page-rank, node centrality etc. Without such
a comparison, it appears that the proposed approaches are only compared with
each other but not to any other "external" independent approach.

Comments on the writing/presentation
====================================

The paper is well written and organized overall. Having a toy example in the
introduction which is then often referred to in later parts of the paper is
very useful and makes the paper much easier to understand.

A few things which could maybe be improved:

* in section 2, it could be helpful to have a discussion of the model using
  more of a graph-theoretic language. In particular, as mentioned above, is it
  the case that a diffusion history always form a DAG or a tree? In other
  words, do we attribute the exposure of a node to a piece of information to
  only one neighbor or to multiple neighbors? Can their be cycles in the
  transmission chain? It also seems that a "trace" is a path in the graph
  theoretic sense (aka a line), this should be mentioned somewhere.

* in definition 3.1, DCE is defined in terms of probabilities, but the source
  of randomness has not been presented yet at this point. It is only presented
  in the example in the next page and could be easily overlooked.

* the choice of vocabulary "failed" vs "non-failed" for edges  could maybe be
  replaced by "failed" "successful", or simply "absent/present".

* the way the probabilities defining DCE are computed is often referred to in
  the paper as "randomized experiment". As mentioned above, this is
  a surprising way of calling it since no real experiment is being performed.
  Maybe this should simply be called Monte Carlo sampling/estimation?

Overall comment
===============

While the general question of quantifying the ability of nodes to transmit
information via a causality based approach is very interesting and original,
the way it is being done in this paper suffers from some questionable modeling
choices and holes in the analysis.