From 51d36b52877aa07d65bdde02d931ec346ec6ec82 Mon Sep 17 00:00:00 2001
From: Thibaut Horel <thibaut.horel@gmail.com>
Date: Mon, 7 Sep 2015 21:49:08 +0200
Subject: Add TEAC review

---
 teac-2015-00033.txt | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 159 insertions(+)
 create mode 100644 teac-2015-00033.txt

diff --git a/teac-2015-00033.txt b/teac-2015-00033.txt
new file mode 100644
index 0000000..19ad056
--- /dev/null
+++ b/teac-2015-00033.txt
@@ -0,0 +1,159 @@
+In this paper, the authors consider a variant of the Influence Maximization
+problem in which:
+
+* the network's nodes can be in one of three states: "inactive", "aware" and
+  "adopt" (as opposed to two states in many previous information spread
+  models). Aware nodes can propagate the infection even if they do not end up
+  adopting the product.  The transitions from inactive to aware, and from aware
+  to adopt are modeled similarly to the Linear Threshold Model.
+
+* the network's nodes have attributes from which their propensity to adopt the
+  product being marketed can be computed (via a linear relationship). The
+  transition thresholds from the aware to the adopt state depend on this
+  propensity.
+
+As in the original Influence Maximization Problem, the goal is to select a seed
+set (of size at most k) such as to maximize the total number of nodes in the
+"adopt" state when the epidemic stop propagating. The authors propose
+a heuristic based on an "influence potential" score and discounting to account
+for possible overlaps between the neighborhoods of the selected nodes. The
+heuristic is evaluated experimentally.
+
+Model
+=====
+
+The distinction between aware and adopt is interesting: it captures the "five
+stages of adaption" [Beal and Bohlen 1957] more realistically than simpler
+models and could also be applied to epidemiology where an individual might
+carry a disease without being infected. However, the following inaccuracies and
+mis-specifications undermine the strength of the model:
+
+1. The transition from inactive to aware is controlled by a threshold drawn
+   uniformly at random in the interval [0, 1]. As a consequence, given a seed
+   set A, the number of "adopted" nodes when the cascade stops propagating is
+   a random variable. It is not clear whether or not \sigma(A) in the objective
+   function (equation (1)) is this number (the random variable) or its
+   expectation.  Even though it is customary to consider the expectation (see
+   for example [Kempe et al. 2003]), other parts of the paper seem to consider
+   the random variable instead. See comment 8. below.
+
+2. Dividing equation (4) both sides by d_v, we see that awareness is modeled
+   exactly as in the Linear Threshold Model with uniform probabilities.  There
+   is no justification to why the edge weights (w(x, v)) are not used at this
+   step, as is done in equation (5).
+
+3. The distinction between target nodes and non-target nodes seem unnecessary:
+   nodes with a cumulative profit equal to 0 will never transition to the
+   "adopt" state as can be seen in equation (5) and thus will never increase
+   the influence spread score. Thus, the same model describes the transitions
+   of both target and non-target nodes. Furthermore, the term "target node" is
+   misguiding since it leads the reader to believe that the campaign will
+   actively target them, whereas they passively follow the propagation dynamics
+   governed by equation (4) and (5).
+
+4. It should be clearly specified that the model is discrete time: at each time
+   step nodes transition from one state to the other simultaneously according
+   to equation (4) and (5).
+
+5. It is not clear why usefulness is a number in the interval [0, 1].
+   Normalization at this step seems unnecessary since the cumulative profits are
+   then normalized to the range [0, 1] (by dividing by the max value).
+
+6. 3.2.2 (ii) heterogeneous influence probabilities: this paragraph seems out
+   of scope: it is important for the model to have a weighted graph. The fact
+   that (depending on the applications) the weights could be computed from
+   additional information does not appear to be relevant for the model.
+   Furthermore, the experiments at the end do not make use of this.
+
+7. Similarly, the relevance of section 3.2.1 (labels and usefulness) is
+   questionable. The only thing that the model truly requires is threshold to
+   put on the right-hand side of equation (5): the "propensity" to adopt the
+   product. The fact that this propensity could be computed from information
+   available about the nodes is application-dependent and would be better
+   located in works focusing on learning the propensity score from the nodes'
+   attributes.
+
+Overall, the model would benefit from being specified more rigorously and
+reduced to its "core", leaving the questions of modeling and learning its
+parameters from node attributes to future works.
+
+Algorithm
+=========
+
+The suggested heuristic is intuitive and seems well-suited to the proposed
+TAM model, as is confirmed by the experiments. However:
+
+8. Algorithm 2, line 25: "v becomes newly adopted" is somewhat ill-defined: as
+   explained in comment 1. above, the set of "adopt" nodes when the cascade
+   stops propagating is random. This line seems to indicate that this is not
+   the case, and that the thresholds for awareness transitions are fixed once
+   and for all. This is inconsistent with the experiments, where it seems that
+   the authors revert back to considering the expectation (computed by sampling
+   and averaging).
+
+9. Complexity analysis: the running time of computing the test on line 25 in
+   Algorithm 2 does not appear in the time complexity. A naive algorithm would
+   take linear time every time line 25 is executed; could it be done more
+   efficiently?
+
+10. Complexity analysis: using a max-heap Fibonacci heap (which seems required
+    since we need to repeatedly call ExtractMax), DecreaseKey takes O(log N)
+    times (*IncreaseKey* would take O(1) time) and ExtractMax takes O(log N)
+    time, yielding k(log N + d^2 log N) for the second term.
+
+11. Complexity analysis: the simplification when d << N only seems valid when
+    d^2 = o(log N). Stating this explicitly would be more accurate than d << N.
+
+Experiments
+===========
+
+It is interesting to see that the suggested heuristic works better than other
+generic methods, which justifies the effort of designing a method tailored to
+the TAM model. However, I have the following concerns about the methodology
+used for the experiments:
+
+12. "optimal value" of alpha. It is not said which criterion is used to define
+    optimality. At the bottom of page 14, it is mentioned that some values of
+    alpha overestimate or underestimate the influence spread. This suggests the
+    existence of a ground truth to calibrate the alpha parameter, but no
+    mention of this ground truth can be found.
+
+13. as mentioned in comments 1. and 8. above, computing the influence spreads
+    by averaging over 1000 iterations would be consistent with using the
+    expected number of "adopt" nodes in the objective function.  However, this
+    appears to be inconsistent with Algorithm 2.
+
+14. It would have been interesting to analyze how quickly the average of
+    multiple iterations converges to a stable value. I would expect it to
+    depend on the size of the network. However, a constant value of 1000 is
+    used throughout the enter experimental section.
+
+15. Table 2: comparison of Linear Treshold and TAM models is confusing. Why
+    compare the same heuristic (designed for TAM) on two different models? Is
+    obtaining larger numbers better? If this is the case, choosing a model
+    where nodes become aware and adopt with probability 1 would produce even
+    larger numbers.
+
+Style
+=====
+
+The paper is well-structured, making the high-level progression easy to follow.
+However, the paper suffers locally from:
+
+* many typos, which could be fixed by a serious proofreading pass.
+
+* a few sentences whose construction render the intended meaning hard to grasp.
+The most extreme examples are listed below:
+
+    - page 2, line 29 to 31
+    - page 5, line 49
+    - page 17, line 41 to 43
+
+* the use of parenthesis in equation (4) and (5) to denote a multiplication,
+which is unconventional and somewhat confusing.
+
+The paper appears seems to have been put together very hastily. Even if all the
+comments I have made above were to be addressed, I wouldn't be convinced that
+the contributions of the paper (a variant of already existing models and
+heuristic evaluated experimentally) are significant enough to justify an
+acceptance in TEAC.
-- 
cgit v1.2.3-70-g09d2