summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorThibaut Horel <thibaut.horel@gmail.com>2015-09-07 21:49:08 +0200
committerThibaut Horel <thibaut.horel@gmail.com>2015-09-07 21:49:08 +0200
commit51d36b52877aa07d65bdde02d931ec346ec6ec82 (patch)
treee17cb9bdd6b83ab4ebd39c19c0e4496f20c8a26b
parent3fbb69842a86af665d95bc3778c9b35f957d99ee (diff)
downloadreviews-51d36b52877aa07d65bdde02d931ec346ec6ec82.tar.gz
Add TEAC review
-rw-r--r--teac-2015-00033.txt159
1 files changed, 159 insertions, 0 deletions
diff --git a/teac-2015-00033.txt b/teac-2015-00033.txt
new file mode 100644
index 0000000..19ad056
--- /dev/null
+++ b/teac-2015-00033.txt
@@ -0,0 +1,159 @@
+In this paper, the authors consider a variant of the Influence Maximization
+problem in which:
+
+* the network's nodes can be in one of three states: "inactive", "aware" and
+ "adopt" (as opposed to two states in many previous information spread
+ models). Aware nodes can propagate the infection even if they do not end up
+ adopting the product. The transitions from inactive to aware, and from aware
+ to adopt are modeled similarly to the Linear Threshold Model.
+
+* the network's nodes have attributes from which their propensity to adopt the
+ product being marketed can be computed (via a linear relationship). The
+ transition thresholds from the aware to the adopt state depend on this
+ propensity.
+
+As in the original Influence Maximization Problem, the goal is to select a seed
+set (of size at most k) such as to maximize the total number of nodes in the
+"adopt" state when the epidemic stop propagating. The authors propose
+a heuristic based on an "influence potential" score and discounting to account
+for possible overlaps between the neighborhoods of the selected nodes. The
+heuristic is evaluated experimentally.
+
+Model
+=====
+
+The distinction between aware and adopt is interesting: it captures the "five
+stages of adaption" [Beal and Bohlen 1957] more realistically than simpler
+models and could also be applied to epidemiology where an individual might
+carry a disease without being infected. However, the following inaccuracies and
+mis-specifications undermine the strength of the model:
+
+1. The transition from inactive to aware is controlled by a threshold drawn
+ uniformly at random in the interval [0, 1]. As a consequence, given a seed
+ set A, the number of "adopted" nodes when the cascade stops propagating is
+ a random variable. It is not clear whether or not \sigma(A) in the objective
+ function (equation (1)) is this number (the random variable) or its
+ expectation. Even though it is customary to consider the expectation (see
+ for example [Kempe et al. 2003]), other parts of the paper seem to consider
+ the random variable instead. See comment 8. below.
+
+2. Dividing equation (4) both sides by d_v, we see that awareness is modeled
+ exactly as in the Linear Threshold Model with uniform probabilities. There
+ is no justification to why the edge weights (w(x, v)) are not used at this
+ step, as is done in equation (5).
+
+3. The distinction between target nodes and non-target nodes seem unnecessary:
+ nodes with a cumulative profit equal to 0 will never transition to the
+ "adopt" state as can be seen in equation (5) and thus will never increase
+ the influence spread score. Thus, the same model describes the transitions
+ of both target and non-target nodes. Furthermore, the term "target node" is
+ misguiding since it leads the reader to believe that the campaign will
+ actively target them, whereas they passively follow the propagation dynamics
+ governed by equation (4) and (5).
+
+4. It should be clearly specified that the model is discrete time: at each time
+ step nodes transition from one state to the other simultaneously according
+ to equation (4) and (5).
+
+5. It is not clear why usefulness is a number in the interval [0, 1].
+ Normalization at this step seems unnecessary since the cumulative profits are
+ then normalized to the range [0, 1] (by dividing by the max value).
+
+6. 3.2.2 (ii) heterogeneous influence probabilities: this paragraph seems out
+ of scope: it is important for the model to have a weighted graph. The fact
+ that (depending on the applications) the weights could be computed from
+ additional information does not appear to be relevant for the model.
+ Furthermore, the experiments at the end do not make use of this.
+
+7. Similarly, the relevance of section 3.2.1 (labels and usefulness) is
+ questionable. The only thing that the model truly requires is threshold to
+ put on the right-hand side of equation (5): the "propensity" to adopt the
+ product. The fact that this propensity could be computed from information
+ available about the nodes is application-dependent and would be better
+ located in works focusing on learning the propensity score from the nodes'
+ attributes.
+
+Overall, the model would benefit from being specified more rigorously and
+reduced to its "core", leaving the questions of modeling and learning its
+parameters from node attributes to future works.
+
+Algorithm
+=========
+
+The suggested heuristic is intuitive and seems well-suited to the proposed
+TAM model, as is confirmed by the experiments. However:
+
+8. Algorithm 2, line 25: "v becomes newly adopted" is somewhat ill-defined: as
+ explained in comment 1. above, the set of "adopt" nodes when the cascade
+ stops propagating is random. This line seems to indicate that this is not
+ the case, and that the thresholds for awareness transitions are fixed once
+ and for all. This is inconsistent with the experiments, where it seems that
+ the authors revert back to considering the expectation (computed by sampling
+ and averaging).
+
+9. Complexity analysis: the running time of computing the test on line 25 in
+ Algorithm 2 does not appear in the time complexity. A naive algorithm would
+ take linear time every time line 25 is executed; could it be done more
+ efficiently?
+
+10. Complexity analysis: using a max-heap Fibonacci heap (which seems required
+ since we need to repeatedly call ExtractMax), DecreaseKey takes O(log N)
+ times (*IncreaseKey* would take O(1) time) and ExtractMax takes O(log N)
+ time, yielding k(log N + d^2 log N) for the second term.
+
+11. Complexity analysis: the simplification when d << N only seems valid when
+ d^2 = o(log N). Stating this explicitly would be more accurate than d << N.
+
+Experiments
+===========
+
+It is interesting to see that the suggested heuristic works better than other
+generic methods, which justifies the effort of designing a method tailored to
+the TAM model. However, I have the following concerns about the methodology
+used for the experiments:
+
+12. "optimal value" of alpha. It is not said which criterion is used to define
+ optimality. At the bottom of page 14, it is mentioned that some values of
+ alpha overestimate or underestimate the influence spread. This suggests the
+ existence of a ground truth to calibrate the alpha parameter, but no
+ mention of this ground truth can be found.
+
+13. as mentioned in comments 1. and 8. above, computing the influence spreads
+ by averaging over 1000 iterations would be consistent with using the
+ expected number of "adopt" nodes in the objective function. However, this
+ appears to be inconsistent with Algorithm 2.
+
+14. It would have been interesting to analyze how quickly the average of
+ multiple iterations converges to a stable value. I would expect it to
+ depend on the size of the network. However, a constant value of 1000 is
+ used throughout the enter experimental section.
+
+15. Table 2: comparison of Linear Treshold and TAM models is confusing. Why
+ compare the same heuristic (designed for TAM) on two different models? Is
+ obtaining larger numbers better? If this is the case, choosing a model
+ where nodes become aware and adopt with probability 1 would produce even
+ larger numbers.
+
+Style
+=====
+
+The paper is well-structured, making the high-level progression easy to follow.
+However, the paper suffers locally from:
+
+* many typos, which could be fixed by a serious proofreading pass.
+
+* a few sentences whose construction render the intended meaning hard to grasp.
+The most extreme examples are listed below:
+
+ - page 2, line 29 to 31
+ - page 5, line 49
+ - page 17, line 41 to 43
+
+* the use of parenthesis in equation (4) and (5) to denote a multiplication,
+which is unconventional and somewhat confusing.
+
+The paper appears seems to have been put together very hastily. Even if all the
+comments I have made above were to be addressed, I wouldn't be convinced that
+the contributions of the paper (a variant of already existing models and
+heuristic evaluated experimentally) are significant enough to justify an
+acceptance in TEAC.