paper/rebuttal.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97

We would like to thank the reviewers for carefully reading our paper and their
insightful remarks. We have tried to address the main points of contention
below:

R2:
"
The set of parameters \theta always lies in some constrained space. For
example, in the independent cascade model, θ_{i,j} < 0; in the voter model,
∑_{i,j} θ_{i,i} = 1 and θ_{i,i}≠0.[...] authors would have (norm_1 +
regularization induced by constraints), which is not really clear whether is
decomposable or not.
"

This is a great point and we should have been more explicit about this. Overall
our results still hold. We need to distinguish between two types of
constraints:

* the constraints of the type θ_{i,j} < 0, θ_{i,j} ≠ 0. These constraints are
  already implicitly present in our optimization program: indeed, the
  log-likelihood function is undefined (or equivalently can be extended to take
  the value -∞) when these constraints are violated.

* the constraint ∑_j θ_j = 1 for the voter model:

    - We first note that we don't have to enforce this constraint in the
      optimization program (2): if we solve it without the constraint, the
      guarantee on the l2 norm (Theorem 2) still applies. The only downside is
      that the learned parameters might not sum up to one, which is something
      we might need for applications (e.g. simulations).  This is
      application-dependent and somewhat out of the scope of our paper, but it
      is easy to prove that if we normalize the learned parameters to sum up to
      one after solving (2), the l2 guarantee of Theorem 2 loses
      a multiplicative factor at most √s.

    - If we know from the beginning that we will need the learned parameters to
      sum up to one, the constraint can be added to the optimization program.
      By Lagrangian duality, there exists an augmented objective function (with
      an additional linear term corresponding to the constraint) such that the
      maximum of both optimization problems is the same and the solution of the
      augmented program satisfies the constraint. Theorem 2 applies verbatim to
      the augmented program and we obtain the same l2 guarantee.

"
In the independent cascade model, nodes have one chance to infect their
neighbors. However, the definition in section 2.2.1. seems to allow for multiple
attempts
"
As the reviewer correctly points out, the standard ICC model does not allow
for multiple infection attempts over time. The definition of section 2.2.1 also
prohibits multiple attempts by considering that nodes stay active for only one
time step and saying that only nodes which have not been infected before are
susceptible to be infected.

R3:
"
multiple sources don't make much of a difference in their model, because [...]
it's the same as two consecutive, independent cascades.
"
This is an interesting point. However, in the problem we study the graph is
unknown to us. Suppose that two cascades start at the same time at two very
different points in the graph. Despite the fact that the infected nodes from
each cascade will not overlap, we cannot in practice attribute an infected node
to either cascade because we don't know which source is closer to it.

"
The inference in discrete time, one-time-susceptible contagion
processes is less interesting and easier than the continuous version.
"
This is an interesting point. We note that the generalized cascade model class
is sufficiently flexible to include multiple-time-susceptible contagion
processes (such as the linear voter model). Furthermore, it is not immediately
clear that discrete-time processes cannot approximate some continuous time
processes efficiently. For example, we can discretize the continuous time
process of Gomez-Rodriguez et al. 2011 with exponential transmission likelihood
by binning infections to regular intervals of size dt. By the exponential
distribution's memorylessness, it can be shown that when dt<<1, the problem is
still decomposable and fits into the Generalized Linear Cascade model
framework.

We agree with the remark that the definition of cascade models should specify
that this holds only for discrete-time cascades, and that running time
comparison of different algorithms should be added.

R4:
Reviewer 4 raised the point of showing expected performance given some
number of cascades, which could be added to the existing simulations.
Figure 1(f) does have a typo, it should read "n" the number of cascades.
Finally, we agree that showing "at least one common metric for all
types of graph" should be added to the experimental section.

Misc.
The requested citations can be included on lines 42, 68, 75, 78, 93, 362. The
authors regret not to have cited Du et al. 2012 and their work should be
included in the related work section. It can be mentioned that Daneshmand et al
adopt the same model as Gomez-R et al '10 and Abrahao et al. '13.  The phrasing
can be changed from "Graph Inference" to "Network Inference" with the requested
citations.