R2:
"
The set of parameters \theta always lies in some constrained space. For example,
in the independent cascade model, \theta_{i,j} < 0; in the voter model,
\sum_{i,j} \theta = 1 and \theta_{i,i} \neq 0.[...] authors would have (norm_1 +
regularization induced by constraints), which is not really clear whether is
decomposable or not.
"

This is a great point and we should have been more explicit about this. Overall
our results still hold. We need to distinguish between two types of
constraints:

* the constraints of the type θ_{i,j} < 0, θ_{i,j} ≠ 0. These constraints are
  already implicitly present in our optimization program: indeed, the
  log-likelihood function is undefined (or equivalently can be extended to take
  the value -∞) when these constraints are violated.

* the constraint ∑_j θ_j = 1 for the voter model:

    - We first note that we don't have to enforce this constraint in the
      optimization program (2): if we solve it without the constraint, the
      guarantee on the l2 norm (Theorem 2) still applies. The only downside is
      that the learned parameters might not sum up to one, which is something
      we might need for applications (e.g. simulations).  This is
      application-dependent and somewhat out of the scope of our paper, but it
      is easy to prove that if we normalize the learned parameters to sum up to
      one after solving (2), the l2 guarantee of Theorem 2 looses
      a multiplicative factor at most √s.

    - If we know from the beginning that we will need the learned parameters to
      sum up to one, the constraint can be added to the optimization program.
      By Lagrangian duality, there exists an augmented objective function (with
      an additional linear term corresponding to the constraint) such that the
      maximum of both optimization problems is the same and the solution of the
      augmented program satisfies the constraint. Theorem 2 applies verbatim to
      the augmented program and we obtain the same l2 guarantee.

"
In the independent cascade model, nodes have one chance to infect their
neighbors. However, the definition in section 2.2.1. seems to allow for multiple
attempts
"
As the reviewer correctly points out, the standard ICC model does not allow
for multiple infection attempts over time. The definition of section 2.2.1 also
prohibits multiple attempts by considering that nodes stay active for only one
time step, defining X_t as the set of nodes active at the previous time step
only, and saying that only nodes which have not been infected before are
susceptible to be infected.

"
in a continuous-time model there are not necessarily steps and property 1 is not
necessarily satisfied
"
This is a good point and the distinction can be made.

R3:
"
multiple sources don't make much of a difference in their model, because [...]
if two cascades originate at sources that are more than a constant distance away
from each other, it's the same as two consecutive, independent cascades.
"
This is an interesting point. However, in the problem we study the graph is
unknown to us. Suppose that two cascades start at the same time at two very
different points in the graph. Despite the fact that the infected nodes from
each cascade will not overlap, we cannot in practice attribute an infected node
to either cascade because this information is hidden to us.

"
Running time is not discussed here.
"
This is a valid point. The MLE algorithm from Netrap.-Sangh. has similar running
time to the penalized MLE algorithm. Their greedy algorithm runs considerably
faster at the price of a slower convergence rate in practice. A precise
comparison of running times can be be included.

"
The inference in discrete time, one-time-susceptible contagion
processes is less interesting and easier than the continuos version.
"
This is an interesting point. We note that the generalized cascade model class
is sufficiently flexible to include multiple-time-susceptible contagion
processes (such as the linear voter model). Furthermore, it is not immediately
clear that discrete-time processes cannot approximate some continuous time
processes efficiently. For example, we can discretize the continuous time
process with exponential transmission likelihood by considering intervals of
time of length dt, binning infections to these intervals, and considering that
nodes remain infected until the final observation time. By exploiting the
memoryless property of the exponential distribution, we recover its
discrete-time analog, the geometric distribution. When dt<<1, the problem is
still decomposable and fits into the Generalized Linear Cascade model framework.

"
the unrealistic assumption of one time infection chance.
"
It is true that the standard discrete-time independent cascade model
studied by Netrapalli-Sanghavi assumes one time infection chance but this
restriction is not made at the Generalized Linear Cascade level and is specific
only to the example of section 2.2.1

R4:
"
what would be the guaranteed/expected performance given some number of
cascades?
"
This is an interesting point. For the experiment section, we could calculate the
theoretical guarantees for the synthetic graphs and observe whether or not the
theoretical bounds are pessimistic in practice.

"
Where is the explanation about Figure 1(f)? What is p_init?
"
This is a typo. It should read "n" the number of cascades.

"
The authors need to show at least one common metric for all types for graphs
"
This is an interesting point: A graph plotting the same metric for all
considered networks can replace one of the 6 figures.

Misc.
"
Citations/Related work remarks
"
The requested citations can be included on lines 42, 68, 75, 78, 93, 362. The
authors regret not to have cited Du et al. 2012 and their work should be
included in the related work section along with other work considering the
estimation of influence in networks. It can be mentioned that Daneshmand et al
adopt the same model as Gomez-R et al '10 and Abrahao et al. '13.  The phrasing
can be changed from "Graph Inference" to "Network Inference" with the requested
citations.