1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
|
In this paper, the authors consider a variant of the Influence Maximization
problem in which:
* the network's nodes can be in one of three states: "inactive", "aware" and
"adopt" (as opposed to two states in many previous information spread
models). Aware nodes can propagate the infection even if they do not end up
adopting the product. The transitions from inactive to aware, and from aware
to adopt are modeled similarly to the Linear Threshold Model.
* the network's nodes have attributes from which their propensity to adopt the
product being marketed can be computed (via a linear relationship). The
transition thresholds from the aware to the adopt state depend on this
propensity.
As in the original Influence Maximization Problem, the goal is to select a seed
set (of size at most k) such as to maximize the total number of nodes in the
"adopt" state when the epidemic stop propagating. The authors propose
a heuristic based on an "influence potential" score and discounting to account
for possible overlaps between the neighborhoods of the selected nodes. The
heuristic is evaluated experimentally.
Model
=====
The distinction between aware and adopt is interesting: it captures the "five
stages of adaption" [Beal and Bohlen 1957] more realistically than simpler
models and could also be applied to epidemiology where an individual might
carry a disease without being infected. However, the following inaccuracies and
mis-specifications undermine the strength of the model:
1. The transition from inactive to aware is controlled by a threshold drawn
uniformly at random in the interval [0, 1]. As a consequence, given a seed
set A, the number of "adopted" nodes when the cascade stops propagating is
a random variable. It is not clear whether or not \sigma(A) in the objective
function (equation (1)) is this number (the random variable) or its
expectation. Even though it is customary to consider the expectation (see
for example [Kempe et al. 2003]), other parts of the paper seem to consider
the random variable instead. See comment 8. below.
2. Dividing equation (4) both sides by d_v, we see that awareness is modeled
exactly as in the Linear Threshold Model with uniform probabilities. There
is no justification to why the edge weights (w(x, v)) are not used at this
step, as is done in equation (5).
3. The distinction between target nodes and non-target nodes seem unnecessary:
nodes with a cumulative profit equal to 0 will never transition to the
"adopt" state as can be seen in equation (5) and thus will never increase
the influence spread score. Thus, the same model describes the transitions
of both target and non-target nodes. Furthermore, the term "target node" is
misguiding since it leads the reader to believe that the campaign will
actively target them, whereas they passively follow the propagation dynamics
governed by equation (4) and (5).
4. It should be clearly specified that the model is discrete time: at each time
step nodes transition from one state to the other simultaneously according
to equation (4) and (5).
5. It is not clear why usefulness is a number in the interval [0, 1].
Normalization at this step seems unnecessary since the cumulative profits are
then normalized to the range [0, 1] (by dividing by the max value).
6. 3.2.2 (ii) heterogeneous influence probabilities: this paragraph seems out
of scope: it is important for the model to have a weighted graph. The fact
that (depending on the applications) the weights could be computed from
additional information does not appear to be relevant for the model.
Furthermore, the experiments at the end do not make use of this.
7. Similarly, the relevance of section 3.2.1 (labels and usefulness) is
questionable. The only thing that the model truly requires is threshold to
put on the right-hand side of equation (5): the "propensity" to adopt the
product. The fact that this propensity could be computed from information
available about the nodes is application-dependent and would be better
located in works focusing on learning the propensity score from the nodes'
attributes.
Overall, the model would benefit from being specified more rigorously and
reduced to its "core", leaving the questions of modeling and learning its
parameters from node attributes to future works.
Algorithm
=========
The suggested heuristic is intuitive and seems well-suited to the proposed
TAM model, as is confirmed by the experiments. However:
8. Algorithm 2, line 25: "v becomes newly adopted" is somewhat ill-defined: as
explained in comment 1. above, the set of "adopt" nodes when the cascade
stops propagating is random. This line seems to indicate that this is not
the case, and that the thresholds for awareness transitions are fixed once
and for all. This is inconsistent with the experiments, where it seems that
the authors revert back to considering the expectation (computed by sampling
and averaging).
9. Complexity analysis: the running time of computing the test on line 25 in
Algorithm 2 does not appear in the time complexity. A naive algorithm would
take linear time every time line 25 is executed; could it be done more
efficiently?
10. Complexity analysis: using a max-heap Fibonacci heap (which seems required
since we need to repeatedly call ExtractMax), DecreaseKey takes O(log N)
times (*IncreaseKey* would take O(1) time) and ExtractMax takes O(log N)
time, yielding k(log N + d^2 log N) for the second term.
11. Complexity analysis: the simplification when d << N only seems valid when
d^2 = o(log N). Stating this explicitly would be more accurate than d << N.
Experiments
===========
It is interesting to see that the suggested heuristic works better than other
generic methods, which justifies the effort of designing a method tailored to
the TAM model. However, I have the following concerns about the methodology
used for the experiments:
12. "optimal value" of alpha. It is not said which criterion is used to define
optimality. At the bottom of page 14, it is mentioned that some values of
alpha overestimate or underestimate the influence spread. This suggests the
existence of a ground truth to calibrate the alpha parameter, but no
mention of this ground truth can be found.
13. as mentioned in comments 1. and 8. above, computing the influence spreads
by averaging over 1000 iterations would be consistent with using the
expected number of "adopt" nodes in the objective function. However, this
appears to be inconsistent with Algorithm 2.
14. It would have been interesting to analyze how quickly the average of
multiple iterations converges to a stable value. I would expect it to
depend on the size of the network. However, a constant value of 1000 is
used throughout the enter experimental section.
15. Table 2: comparison of Linear Treshold and TAM models is confusing. Why
compare the same heuristic (designed for TAM) on two different models? Is
obtaining larger numbers better? If this is the case, choosing a model
where nodes become aware and adopt with probability 1 would produce even
larger numbers.
Style
=====
The paper is well-structured, making the high-level progression easy to follow.
However, the paper suffers locally from:
* many typos, which could be fixed by a serious proofreading pass.
* a few sentences whose construction render the intended meaning hard to grasp.
The most extreme examples are listed below:
- page 2, line 29 to 31
- page 5, line 49
- page 17, line 41 to 43
* the use of parenthesis in equation (4) and (5) to denote a multiplication,
which is unconventional and somewhat confusing.
The paper appears seems to have been put together very hastily. Even if all the
comments I have made above were to be addressed, I wouldn't be convinced that
the contributions of the paper (a variant of already existing models and
heuristic evaluated experimentally) are significant enough to justify an
acceptance in TEAC.
|