R2:

"The set of parameters \theta always lies in some constrained space. For example, in the independent cascade model, \theta_{i,j} < 0; in the voter model, \sum_{i,j} \theta = 1 and \theta_{i,i} \neq 0. [...] authors would have (norm_1 + regularization induced by constraints), which is not really clear whether is decomposable or not."

This is a great point and we should have been more explicit about it. Overall, our results still hold. We need to distinguish between two types of constraints:

* the constraints of the type θ_{i,j} < 0, θ_{i,j} ≠ 0. These constraints are already implicitly present in our optimization program: indeed, the log-likelihood function is undefined (or, equivalently, can be extended to take the value -∞) when these constraints are violated.
* the constraint ∑_j θ_j = 1 for the voter model:
  - We first note that we don't have to enforce this constraint in the optimization program (2): if we solve it without the constraint, the guarantee on the l2 norm (Theorem 2) still applies. The only downside is that the learned parameters might not sum to one, which some applications (e.g. simulations) might require. This is application-dependent and somewhat outside the scope of our paper, but it is easy to prove that if we normalize the learned parameters to sum to one after solving (2), the l2 guarantee of Theorem 2 loses a multiplicative factor of at most √s.
  - If we know from the beginning that we will need the learned parameters to sum to one, the constraint can be added to the optimization program. By Lagrangian duality, there exists an augmented objective function (with an additional linear term corresponding to the constraint) such that the maximum of both optimization problems is the same and the solution of the augmented program satisfies the constraint. Theorem 2 applies verbatim to the augmented program and we obtain the same l2 guarantee.

"In the independent cascade model, nodes have one chance to infect their neighbors.
However, the definition in section 2.2.1. seems to allow for multiple attempts"

As the reviewer correctly points out, the standard independent cascade model does not allow for multiple infection attempts over time. The definition of section 2.2.1 also prohibits multiple attempts: nodes stay active for only one time step, X_t is defined as the set of nodes active at the previous time step only, and only nodes which have not been infected before are susceptible to infection.

"in a continuous-time model there are not necessarily steps and property 1 is not necessarily satisfied"

This is a good point and the distinction can be made.

R3:

"multiple sources don't make much of a difference in their model, because [...] if two cascades originate at sources that are more than a constant distance away from each other, it's the same as two consecutive, independent cascades."

This is an interesting point. However, in the problem we study, the graph is unknown to us. Suppose that two cascades start at the same time at two very different points in the graph. Even though the infected nodes from each cascade will not overlap, we cannot in practice attribute an infected node to either cascade because this information is hidden from us.

"Running time is not discussed here."

This is a valid point. The MLE algorithm from Netrapalli-Sanghavi has a running time similar to that of the penalized MLE algorithm. Their greedy algorithm runs considerably faster at the price of a slower convergence rate in practice. A precise comparison of running times can be included.

"The inference in discrete time, one-time-susceptible contagion processes is less interesting and easier than the continuos version."

This is an interesting point. We note that the generalized cascade model class is sufficiently flexible to include multiple-time-susceptible contagion processes (such as the linear voter model).
Furthermore, it is not immediately clear that discrete-time processes cannot approximate some continuous-time processes efficiently. For example, we can discretize the continuous-time process with exponential transmission likelihood by considering time intervals of length dt, binning infections into these intervals, and considering that nodes remain infected until the final observation time. By exploiting the memoryless property of the exponential distribution, we recover its discrete-time analog, the geometric distribution. When dt<<1, the problem is still decomposable and fits into the Generalized Linear Cascade model framework.

"the unrealistic assumption of one time infection chance."

It is true that the standard discrete-time independent cascade model studied by Netrapalli-Sanghavi assumes a one-time infection chance, but this restriction is not made at the Generalized Linear Cascade level and is specific only to the example of section 2.2.1.

R4:

"what would be the guaranteed/expected performance given some number of cascades?"

This is an interesting point. For the experiment section, we could calculate the theoretical guarantees for the synthetic graphs and observe whether or not the theoretical bounds are pessimistic in practice.

"Where is the explanation about Figure 1(f)? What is p_init?"

This is a typo. It should read "n", the number of cascades.

"The authors need to show at least one common metric for all types for graphs"

This is an interesting point: a graph plotting the same metric for all considered networks can replace one of the 6 figures.

Misc.

"Citations/Related work remarks"

The requested citations can be included on lines 42, 68, 75, 78, 93, 362. The authors regret not having cited Du et al. 2012, and their work should be included in the related work section along with other work considering the estimation of influence in networks. It can be mentioned that Daneshmand et al. adopt the same model as Gomez-R et al. '10 and Abrahao et al. '13.
The phrasing can be changed from "Graph Inference" to "Network Inference" with the requested citations.
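
As a sanity check on the discretization argument in the reply to R3, the following minimal sketch compares the probability that an exponential transmission time is binned to the k-th interval of length dt with the geometric probability of k failures followed by one success. The function names and the values of the rate and dt are hypothetical and chosen only for illustration; the sketch is not part of the paper's code and does not implement the Generalized Linear Cascade estimator itself.

```python
import math


def interval_infection_prob(rate: float, dt: float) -> float:
    """Probability that an Exp(rate) transmission time falls in one interval of length dt."""
    return 1.0 - math.exp(-rate * dt)


def binned_exponential_pmf(rate: float, dt: float, k: int) -> float:
    """P(k*dt <= T < (k+1)*dt) for T ~ Exp(rate), i.e. the infection is binned to interval k."""
    return math.exp(-rate * k * dt) - math.exp(-rate * (k + 1) * dt)


def geometric_pmf(p: float, k: int) -> float:
    """Probability of k failures followed by a first success, each trial succeeding with probability p."""
    return (1.0 - p) ** k * p


# Hypothetical values, for illustration only.
rate, dt = 0.7, 0.05

# Per-interval infection probability induced by the exponential model.
p = interval_infection_prob(rate, dt)

# Largest discrepancy between the binned exponential and the geometric pmf over the first 200 intervals.
max_gap = max(
    abs(binned_exponential_pmf(rate, dt, k) - geometric_pmf(p, k))
    for k in range(200)
)
print(max_gap)
```

The two distributions agree up to floating-point error because 1 - p = exp(-rate * dt), which is the memoryless property used in the argument. For small dt, p is close to rate * dt.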