From 51d36b52877aa07d65bdde02d931ec346ec6ec82 Mon Sep 17 00:00:00 2001 From: Thibaut Horel Date: Mon, 7 Sep 2015 21:49:08 +0200 Subject: Add TEAC review --- teac-2015-00033.txt | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100644 teac-2015-00033.txt diff --git a/teac-2015-00033.txt b/teac-2015-00033.txt new file mode 100644 index 0000000..19ad056 --- /dev/null +++ b/teac-2015-00033.txt @@ -0,0 +1,159 @@ +In this paper, the authors consider a variant of the Influence Maximization +problem in which: + +* the network's nodes can be in one of three states: "inactive", "aware" and + "adopt" (as opposed to two states in many previous information spread + models). Aware nodes can propagate the infection even if they do not end up + adopting the product. The transitions from inactive to aware, and from aware + to adopt are modeled similarly to the Linear Threshold Model. + +* the network's nodes have attributes from which their propensity to adopt the + product being marketed can be computed (via a linear relationship). The + transition thresholds from the aware to the adopt state depend on this + propensity. + +As in the original Influence Maximization Problem, the goal is to select a seed +set (of size at most k) such as to maximize the total number of nodes in the +"adopt" state when the epidemic stop propagating. The authors propose +a heuristic based on an "influence potential" score and discounting to account +for possible overlaps between the neighborhoods of the selected nodes. The +heuristic is evaluated experimentally. + +Model +===== + +The distinction between aware and adopt is interesting: it captures the "five +stages of adaption" [Beal and Bohlen 1957] more realistically than simpler +models and could also be applied to epidemiology where an individual might +carry a disease without being infected. However, the following inaccuracies and +mis-specifications undermine the strength of the model: + +1. The transition from inactive to aware is controlled by a threshold drawn + uniformly at random in the interval [0, 1]. As a consequence, given a seed + set A, the number of "adopted" nodes when the cascade stops propagating is + a random variable. It is not clear whether or not \sigma(A) in the objective + function (equation (1)) is this number (the random variable) or its + expectation. Even though it is customary to consider the expectation (see + for example [Kempe et al. 2003]), other parts of the paper seem to consider + the random variable instead. See comment 8. below. + +2. Dividing equation (4) both sides by d_v, we see that awareness is modeled + exactly as in the Linear Threshold Model with uniform probabilities. There + is no justification to why the edge weights (w(x, v)) are not used at this + step, as is done in equation (5). + +3. The distinction between target nodes and non-target nodes seem unnecessary: + nodes with a cumulative profit equal to 0 will never transition to the + "adopt" state as can be seen in equation (5) and thus will never increase + the influence spread score. Thus, the same model describes the transitions + of both target and non-target nodes. Furthermore, the term "target node" is + misguiding since it leads the reader to believe that the campaign will + actively target them, whereas they passively follow the propagation dynamics + governed by equation (4) and (5). + +4. It should be clearly specified that the model is discrete time: at each time + step nodes transition from one state to the other simultaneously according + to equation (4) and (5). + +5. It is not clear why usefulness is a number in the interval [0, 1]. + Normalization at this step seems unnecessary since the cumulative profits are + then normalized to the range [0, 1] (by dividing by the max value). + +6. 3.2.2 (ii) heterogeneous influence probabilities: this paragraph seems out + of scope: it is important for the model to have a weighted graph. The fact + that (depending on the applications) the weights could be computed from + additional information does not appear to be relevant for the model. + Furthermore, the experiments at the end do not make use of this. + +7. Similarly, the relevance of section 3.2.1 (labels and usefulness) is + questionable. The only thing that the model truly requires is threshold to + put on the right-hand side of equation (5): the "propensity" to adopt the + product. The fact that this propensity could be computed from information + available about the nodes is application-dependent and would be better + located in works focusing on learning the propensity score from the nodes' + attributes. + +Overall, the model would benefit from being specified more rigorously and +reduced to its "core", leaving the questions of modeling and learning its +parameters from node attributes to future works. + +Algorithm +========= + +The suggested heuristic is intuitive and seems well-suited to the proposed +TAM model, as is confirmed by the experiments. However: + +8. Algorithm 2, line 25: "v becomes newly adopted" is somewhat ill-defined: as + explained in comment 1. above, the set of "adopt" nodes when the cascade + stops propagating is random. This line seems to indicate that this is not + the case, and that the thresholds for awareness transitions are fixed once + and for all. This is inconsistent with the experiments, where it seems that + the authors revert back to considering the expectation (computed by sampling + and averaging). + +9. Complexity analysis: the running time of computing the test on line 25 in + Algorithm 2 does not appear in the time complexity. A naive algorithm would + take linear time every time line 25 is executed; could it be done more + efficiently? + +10. Complexity analysis: using a max-heap Fibonacci heap (which seems required + since we need to repeatedly call ExtractMax), DecreaseKey takes O(log N) + times (*IncreaseKey* would take O(1) time) and ExtractMax takes O(log N) + time, yielding k(log N + d^2 log N) for the second term. + +11. Complexity analysis: the simplification when d << N only seems valid when + d^2 = o(log N). Stating this explicitly would be more accurate than d << N. + +Experiments +=========== + +It is interesting to see that the suggested heuristic works better than other +generic methods, which justifies the effort of designing a method tailored to +the TAM model. However, I have the following concerns about the methodology +used for the experiments: + +12. "optimal value" of alpha. It is not said which criterion is used to define + optimality. At the bottom of page 14, it is mentioned that some values of + alpha overestimate or underestimate the influence spread. This suggests the + existence of a ground truth to calibrate the alpha parameter, but no + mention of this ground truth can be found. + +13. as mentioned in comments 1. and 8. above, computing the influence spreads + by averaging over 1000 iterations would be consistent with using the + expected number of "adopt" nodes in the objective function. However, this + appears to be inconsistent with Algorithm 2. + +14. It would have been interesting to analyze how quickly the average of + multiple iterations converges to a stable value. I would expect it to + depend on the size of the network. However, a constant value of 1000 is + used throughout the enter experimental section. + +15. Table 2: comparison of Linear Treshold and TAM models is confusing. Why + compare the same heuristic (designed for TAM) on two different models? Is + obtaining larger numbers better? If this is the case, choosing a model + where nodes become aware and adopt with probability 1 would produce even + larger numbers. + +Style +===== + +The paper is well-structured, making the high-level progression easy to follow. +However, the paper suffers locally from: + +* many typos, which could be fixed by a serious proofreading pass. + +* a few sentences whose construction render the intended meaning hard to grasp. +The most extreme examples are listed below: + + - page 2, line 29 to 31 + - page 5, line 49 + - page 17, line 41 to 43 + +* the use of parenthesis in equation (4) and (5) to denote a multiplication, +which is unconventional and somewhat confusing. + +The paper appears seems to have been put together very hastily. Even if all the +comments I have made above were to be addressed, I wouldn't be convinced that +the contributions of the paper (a variant of already existing models and +heuristic evaluated experimentally) are significant enough to justify an +acceptance in TEAC. -- cgit v1.2.3-70-g09d2