1 files changed, 18 insertions, 81 deletions
diff --git a/paper/rebuttal.txt b/paper/rebuttal.txt
index 2ab98e2..1e182b4 100644
--- a/paper/rebuttal.txt
+++ b/paper/rebuttal.txt
@@ -1,97 +1,34 @@
-We would like to thank the reviewers for carefully reading our paper and their
-insightful remarks. We have tried to address the main points of contention
-below:
+We would like to thank the reviewers for carefully reading our paper and their insightful remarks. We have tried to address the main points of contention below:
 
 R2:
-"
-The set of parameters \theta always lies in some constrained space. For
-example, in the independent cascade model, θ_{i,j} < 0; in the voter model,
-∑_{i,j} θ_{i,i} = 1 and θ_{i,i}≠0.[...] authors would have (norm_1 +
-regularization induced by constraints), which is not really clear whether is
-decomposable or not.
-"
+" The set of parameters \theta always lies in some constrained space. For example, in the independent cascade model, θ_{i,j} < 0; in the voter model, ∑_{i,j} θ_{i,i} = 1 and θ_{i,i}≠0.[...] authors would have (norm_1 + regularization induced by constraints), which is not really clear whether is decomposable or not. " 
 
-This is a great point and we should have been more explicit about this. Overall
-our results still hold. We need to distinguish between two types of
-constraints:
+This is a great point and we should have been more explicit about this. Overall our results still hold. We need to distinguish between two types of constraints:
 
-* the constraints of the type θ_{i,j} < 0, θ_{i,j} ≠ 0. These constraints are
-  already implicitly present in our optimization program: indeed, the
-  log-likelihood function is undefined (or equivalently can be extended to take
-  the value -∞) when these constraints are violated.
+* the constraints of the type θ_{i,j} < 0, θ_{i,j} ≠ 0. These constraints are already implicitly present in our optimization program: indeed, the log-likelihood function is undefined (or equivalently can be extended to take the value -∞) when these constraints are violated.
 
 * the constraint ∑_j θ_j = 1 for the voter model:
 
-    - We first note that we don't have to enforce this constraint in the
-      optimization program (2): if we solve it without the constraint, the
-      guarantee on the l2 norm (Theorem 2) still applies. The only downside is
-      that the learned parameters might not sum up to one, which is something
-      we might need for applications (e.g. simulations).  This is
-      application-dependent and somewhat out of the scope of our paper, but it
-      is easy to prove that if we normalize the learned parameters to sum up to
-      one after solving (2), the l2 guarantee of Theorem 2 loses
-      a multiplicative factor at most √s.
+    - We first note that we don't have to enforce this constraint in the optimization program (2): if we solve it without the constraint, the guarantee on the l2 norm (Theorem 2) still applies. The only downside is that the learned parameters might not sum up to one, which is something we might need for applications (e.g. simulations).  This is application-dependent and somewhat out of the scope of our paper, but it is easy to prove that if we normalize the learned parameters to sum up to one after solving (2), the l2 guarantee of Theorem 2 loses a multiplicative factor at most √s.
 
-    - If we know from the beginning that we will need the learned parameters to
-      sum up to one, the constraint can be added to the optimization program.
-      By Lagrangian duality, there exists an augmented objective function (with
-      an additional linear term corresponding to the constraint) such that the
-      maximum of both optimization problems is the same and the solution of the
-      augmented program satisfies the constraint. Theorem 2 applies verbatim to
-      the augmented program and we obtain the same l2 guarantee.
+    - If we know from the beginning that we will need the learned parameters to sum up to one, the constraint can be added to the optimization program.  By Lagrangian duality, there exists an augmented objective function (with an additional linear term corresponding to the constraint) such that the maximum of both optimization problems is the same and the solution of the augmented program satisfies the constraint. Theorem 2 applies verbatim to the augmented program and we obtain the same l2 guarantee.
 
-"
-In the independent cascade model, nodes have one chance to infect their
-neighbors. However, the definition in section 2.2.1. seems to allow for multiple
-attempts
-"
-As the reviewer correctly points out, the standard ICC model does not allow
-for multiple infection attempts over time. The definition of section 2.2.1 also
-prohibits multiple attempts by considering that nodes stay active for only one
-time step and saying that only nodes which have not been infected before are
-susceptible to be infected.
+" In the independent cascade model, nodes have one chance to infect their neighbors. However, the definition in section 2.2.1. seems to allow for multiple attempts "
+
+As the reviewer correctly points out, the standard ICC model does not allow for multiple infection attempts over time. The definition of section 2.2.1 also prohibits multiple attempts by considering that nodes stay active for only one time step and saying that only nodes which have not been infected before are susceptible to be infected.
 
 R3:
-"
-multiple sources don't make much of a difference in their model, because [...]
-it's the same as two consecutive, independent cascades.
-"
-This is an interesting point. However, in the problem we study the graph is
-unknown to us. Suppose that two cascades start at the same time at two very
-different points in the graph. Despite the fact that the infected nodes from
-each cascade will not overlap, we cannot in practice attribute an infected node
-to either cascade because we don't know which source is closer to it.
+" multiple sources don't make much of a difference in their model, because [...] it's the same as two consecutive, independent cascades. "
+
+This is an interesting point. However, in the problem we study the graph is unknown to us. Suppose that two cascades start at the same time at two very different points in the graph. Despite the fact that the infected nodes from each cascade will not overlap, we cannot in practice attribute an infected node to either cascade because we don't know which source is closer to it.
+
+" The inference in discrete time, one-time-susceptible contagion processes is less interesting and easier than the continuous version. "
 
-"
-The inference in discrete time, one-time-susceptible contagion
-processes is less interesting and easier than the continuous version.
-"
-This is an interesting point. We note that the generalized cascade model class
-is sufficiently flexible to include multiple-time-susceptible contagion
-processes (such as the linear voter model). Furthermore, it is not immediately
-clear that discrete-time processes cannot approximate some continuous time
-processes efficiently. For example, we can discretize the continuous time
-process of Gomez-Rodriguez et al. 2011 with exponential transmission likelihood
-by binning infections to regular intervals of size dt. By the exponential
-distribution's memorylessness, it can be shown that when dt<<1, the problem is
-still decomposable and fits into the Generalized Linear Cascade model
-framework.
+This is an interesting point. We note that the generalized cascade model class is sufficiently flexible to include multiple-time-susceptible contagion processes (such as the linear voter model). Furthermore, it is not immediately clear that discrete-time processes cannot approximate some continuous time processes efficiently. For example, we can discretize the continuous time process of Gomez-Rodriguez et al. 2011 with exponential transmission likelihood by binning infections to regular intervals of size dt. By the exponential distribution's memorylessness, it can be shown that when dt<<1, the problem is still decomposable and fits into the Generalized Linear Cascade model framework.
 
-We agree with the remark that the definition of cascade models should specify
-that this holds only for discrete-time cascades, and that running time
-comparison of different algorithms should be added.
+We agree with the remark that the definition of cascade models should specify that this holds only for discrete-time cascades, and that running time comparison of different algorithms should be added.
 
 R4:
-Reviewer 4 raised the point of showing expected performance given some
-number of cascades, which could be added to the existing simulations.
-Figure 1(f) does have a typo, it should read "n" the number of cascades.
-Finally, we agree that showing "at least one common metric for all
-types of graph" should be added to the experimental section.
+Reviewer 4 raised the point of showing expected performance given some number of cascades, which could be added to the existing simulations.  Figure 1(f) does have a typo, it should read "n" the number of cascades.  Finally, we agree that showing "at least one common metric for all types of graph" should be added to the experimental section.
 
-Misc.
-The requested citations can be included on lines 42, 68, 75, 78, 93, 362. The
-authors regret not to have cited Du et al. 2012 and their work should be
-included in the related work section. It can be mentioned that Daneshmand et al
-adopt the same model as Gomez-R et al '10 and Abrahao et al. '13.  The phrasing
-can be changed from "Graph Inference" to "Network Inference" with the requested
-citations.
+Misc. The requested citations can be included on lines 42, 68, 75, 78, 93, 362. The authors regret not to have cited Du et al. 2012 and their work should be included in the related work section. It can be mentioned that Daneshmand et al adopt the same model as Gomez-R et al '10 and Abrahao et al. '13.  The phrasing can be changed from "Graph Inference" to "Network Inference" with the requested citations.