author     Thibaut Horel <thibaut.horel@gmail.com>    2015-05-16 17:32:37 +0200
committer  Thibaut Horel <thibaut.horel@gmail.com>    2015-05-16 17:32:37 +0200
commit     124e5e7e441956dad1ceb9dc8987808b8f522d62 (patch)
tree       5201560cdd1d5d73897705052cfb9bc35c4c1a4e
download   reviews-124e5e7e441956dad1ceb9dc8987808b8f522d62.tar.gz
First set of reviews
-rw-r--r--   kdd-2015-706.txt   69
-rw-r--r--   kdd-2015-77.txt    60
-rw-r--r--   stoc-2015.txt      151
3 files changed, 280 insertions, 0 deletions
diff --git a/kdd-2015-706.txt b/kdd-2015-706.txt
new file mode 100644
index 0000000..4d115b1
--- /dev/null
+++ b/kdd-2015-706.txt
@@ -0,0 +1,69 @@

1. What is your confidence in rating this paper?

Medium - knowledgeable on this topic

2. How would you rate the novelty of the problem solved in this paper?

A minor variation of some well studied problems

3. How would you rate the technical ideas and development in this paper?

The technical development is incremental without fundamental contributions

4. How would you rate the empirical study conducted in this paper?

Acceptable, but there is room for improvement

5. Repeatability: are the data sets used publicly available (and thus the experiments may be repeated by a third party)? (Required, Visible To Authors Only After Decision Notification)

Yes

6. How would you rate the quality of presentation?

Well-written but has a significant number of typos and/or grammatical errors. Would need significant editing before publication.

7. What is your overall recommendation?

Weak reject. I vote for rejecting it, although I would not be upset if it were accepted.

8. Should the Best Paper Committee consider this paper for an award?

No

9. List up to 3 particular strengths of the paper. If none, just say "none".

* the distinction between edge level feedback and node level feedback is an important one, and it is good that it was pointed out by the authors.
* the application of Zinkevich's result on online optimization, showing that learning the influence probabilities can be done in an online manner.
* the experiment showing that strategic exploration leads to faster learning of the influence probabilities.

10. List up to 3 particular weaknesses of this paper. If none, just say "none".

* the bound in Theorem 2 is stated in terms of the likelihood. In contrast to Theorem 1, it is not clear how one would combine this result with existing guarantees on the regret to obtain guarantees in the specific case where the maximum likelihood approach is used.
* no formal guarantee is given for the strategic exploration heuristic.
* no guarantees on the regret are given, and it is not obvious how to adapt the results from Chen et al. to obtain them under the node level feedback assumption.

11. Detailed comments for the authors; justification of your overall rating.

This paper analyzes the problem of influence maximization in the Independent Cascade model when the influence probabilities are unknown. The authors frame the problem using the CMAB paradigm: at each time step, the experimenter chooses a set of nodes to seed, observes the spread of influence for this seed choice, and updates her knowledge about the unknown parameters. Two goals are studied: minimizing the regret (the difference from the approximately optimal influence achievable if the parameters were known) and simply learning the unknown parameters as quickly as possible.

The application of the CMAB paradigm to influence maximization was already noted in [8, 9]. The key difference from [8, 9] is that the authors make the more realistic assumption that the experimenter does not observe which edges were active or dead during a cascade. Only the times at which nodes became active are known, which is both more realistic and more akin to the standard assumptions in the graph inference literature [10, 16, 18, 27].

In order to apply the CMAB paradigm under this assumption, the authors suggest two ways to update the estimates of the edge weights using the observed times at which nodes became active. In the first one, an edge is chosen at random among the incoming edges of a node which just turned active and is considered the "correct" active edge. The authors give a bound on how the estimates derived from this method differ from the estimates that would have been obtained had the experimenter observed the active edges directly. In the second approach, the authors use the maximum likelihood approach of [27] to learn the weights from the observations. They apply the famous online optimization result from Zinkevich [33] to give a bound on the learning rate when the updates to the estimates are done in an online manner.
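To make the first approach concrete, here is a minimal sketch of the random-parent update as I understand it; the data structures and the bookkeeping for the non-credited edges are my own guesses, not the authors' algorithm:

    import random
    from collections import defaultdict

    # successes[e] / trials[e] is the running estimate of edge e's
    # influence probability, mirroring the edge level feedback counters.
    successes = defaultdict(int)
    trials = defaultdict(int)

    def node_level_update(new_active, active_parents):
        """new_active: node that just turned active; active_parents: its
        in-neighbors active at the previous step (assumed non-empty).
        One incoming edge is credited at random as the "correct" one."""
        credited = random.choice(active_parents)
        for parent in active_parents:
            trials[(parent, new_active)] += 1
        successes[(credited, new_active)] += 1

    def estimate(u, v):
        # current estimate of p_{u,v}; 0.0 if the edge was never observed
        return successes[(u, v)] / trials[(u, v)] if trials[(u, v)] else 0.0

Theorem 1, as I read it, bounds how far these estimates can drift from the ones that edge level feedback would have produced.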
The "node level feedback" assumption seems to be the right one, but I have some concerns about the results in Section 3 in relation to Section 5. Since (close to optimal) bounds on the regret are known under the "edge level feedback" assumption, one goal of this paper should be to obtain clear bounds on the regret under the "node level feedback" assumption. One way to obtain those bounds would be to show how the estimates obtained from node level feedback differ from the estimates obtained from edge level feedback. In this respect, Theorem 1 is satisfying, even though it is not shown how to use it to obtain a bound on the regret. Theorem 2 is a direct application of Zinkevich's result, and while it is interesting in itself, it is really not obvious how to use it to obtain bounds on the regret: the quality of the estimate is compared to the true value and not to the estimate obtained from edge level feedback. Furthermore, the objective function is the log likelihood, which does not seem to play well with the kind of computations needed when computing the regret. Also, it would have been nice to give a closed-form formula for the gradient of L to get a better sense of the computational implications of the MLE approach. Finally, I think the bound G on the gradient of L used in Theorem 2 could be expressed in terms of other parameters (p_min, p_max, etc.), which would make the theorem more interesting than directly applying the result from Zinkevich's paper.
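For reference, the guarantee being invoked is the standard online gradient descent bound (written here in the usual online convex optimization notation, not the paper's): for convex losses \ell_t over a feasible set of diameter D, with gradients bounded in norm by G and a suitably tuned step size \eta = D / (G \sqrt{T}),

    \sum_{t=1}^{T} \ell_t(\theta_t) \;-\; \min_{\theta} \sum_{t=1}^{T} \ell_t(\theta) \;\le\; D G \sqrt{T}

Since G multiplies the whole bound, expressing it in terms of p_min and p_max would translate directly into an explicit learning rate.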
The section on network exploration is slightly confusing: in this section, the goal is simply to learn the parameters as efficiently as possible. The way this section is written (randomized vs strategic exploration) is simply a rephrasing of two classical learning problems: online learning when iid samples are given to the observer, and online active learning when the observer can choose which data points to observe. As such, the result on the "random exploration strategy" is simply a rephrasing of a result on the learning rate with iid samples. In some sense, this result is weaker than results in the graph inference literature since in this paper the graph is assumed to be known, which is usually not the case in network inference. The strategic exploration heuristic is interesting, but unfortunately, no formal guarantees are provided.

The experiment section seemed good to me overall. My only concern is the choice of the influence probabilities for the ground truth: only the weighted cascade model was tested. It would be interesting to know whether or not the results are robust to other models. Also, the fact that, on average, nodes which become active had 1.175 active parents at the previous time step undermines the importance of the node level feedback assumption (if nodes only have one active parent, then edge level feedback is equivalent to node level feedback): maybe some experiments with higher probabilities (to force more active parents) would be interesting?

12. List of typos, grammatical errors and/or concrete suggestions to improve presentation.

Section 3, Online MLE Learning: the phrase "at each round we pick a seed set at random, ..." is actually not required for the result to hold (as explained below Theorem 2). I think it would be clearer to keep the bit about choosing a random seed set for Section 4.1.

Section 4:

* as stated above, I think it would be clearer to write it in terms of iid samples versus active learning rather than random vs strategic exploration.

* Algorithm 1 uses UPDATE, which is only defined on the next page. At this point it is really not clear what M is, which was confusing to me.

diff --git a/kdd-2015-77.txt b/kdd-2015-77.txt
new file mode 100644
index 0000000..945c623
--- /dev/null
+++ b/kdd-2015-77.txt
@@ -0,0 +1,60 @@

1. What is your confidence in rating this paper?

Medium - knowledgeable on this topic

2. How would you rate the novelty of the problem solved in this paper?

A minor variation of some well studied problems

3. How would you rate the technical ideas and development in this paper?

The technical development is incremental without fundamental contributions

4. How would you rate the empirical study conducted in this paper?

Acceptable, but there is room for improvement

5. Repeatability: are the data sets used publicly available (and thus the experiments may be repeated by a third party)? (Required, Visible To Authors Only After Decision Notification)

Yes

6. How would you rate the quality of presentation?

A very well-written paper, a pleasure to read. No obvious flaws.

7. What is your overall recommendation?

Weak reject. I vote for rejecting it, although I would not be upset if it were accepted.

8. Should the Best Paper Committee consider this paper for an award?

No

9. List up to 3 particular strengths of the paper. If none, just say "none".

* nice Bayesian approach to learn the weights by updating a Beta distribution after each observation
* interesting approach to improve the running time of online influence maximization by reusing samples across time steps

10. List up to 3 particular weaknesses of this paper. If none, just say "none".

* it is assumed that it is known which edges were active or dead during a cascade. I think this is unrealistic.
* two other approaches could have been considered: Thompson sampling and UCB. I find it surprising that the authors did not compare to those.
* it would have been nice to obtain theoretical bounds on the performance (number of influenced nodes) of the algorithms.

11. Detailed comments for the authors; justification of your overall rating.

This paper studies an online influence maximization problem when the influence probabilities of the independent cascade model are not known. In this problem, an experimenter repeatedly selects a set of nodes to seed and observes the resulting influence cascade. The goal is to maximize the overall number of nodes influenced (the union over all time steps). As more cascades are observed, the experimenter updates her knowledge of the unknown graph parameters, thus facing a standard exploration/exploitation trade-off.

To the best of my knowledge, this is the first time that the Online Influence Maximization problem is stated exactly in those terms, but the authors fail to mention the Combinatorial Multi-Armed Bandit framework of Chen et al., which has been applied to influence maximization before. I think something is missing in the statement of the OIM problem: it is not clear what the attained value is compared to. Looking at the experiments, it seems that it is compared (in absolute distance) to the best value attainable using an approximation algorithm with the true value of the parameters. Hence, this is the same criterion as the regret defined by Chen et al. and should be defined explicitly. Other criteria could be considered, such as a multiplicative approximation ratio.

The authors adopt a Bayesian approach to deal with the uncertainty on the graph parameters: the edge weights are modeled using a Beta distribution which is updated as new observations are made. I think this is an interesting approach; however, it relies on a very strong (and, in my opinion, unrealistic for practical purposes) assumption: it is assumed that the observer knows which edges were alive or dead during a cascade. This is much stronger than the assumption frequently made in network inference papers, where only the times at which nodes became active are observed.
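Concretely, the conjugate update that this assumption buys is very simple, which is part of the approach's appeal. A minimal sketch (class and parameter names are mine, not the paper's):

    class EdgeBelief:
        """Beta(alpha, beta) belief over one edge's influence probability.
        With edge level feedback, each observed edge trial is a Bernoulli
        sample, so the posterior update just increments a counter."""

        def __init__(self, alpha=1.0, beta=1.0):  # Beta(1, 1): uniform prior
            self.alpha = alpha
            self.beta = beta

        def update(self, edge_was_alive):
            if edge_was_alive:
                self.alpha += 1.0  # one more observed success
            else:
                self.beta += 1.0   # one more observed failure

        def mean(self):
            return self.alpha / (self.alpha + self.beta)

The update is trivial precisely because the alive/dead status of every edge is assumed to be observed, which is the assumption I find unrealistic.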
I found it surprising that the authors did not mention Thompson sampling at all. This is a very common approach when adopting a Bayesian bandit perspective. In particular, it seems to me that Thompson sampling addresses precisely the issue mentioned in Section 5.2.1: "When P_i,j is a highly uncertain Beta distribution, any value in [0,1] can be the real influence probability". This will be the case if a sample from the current distribution is used, instead of simply its mean. Similarly, the UCB algorithm is very similar to the Confidence-Bound algorithm suggested by the authors. I think both Thompson sampling and UCB should have been used as baselines in the experiments.
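The baseline I have in mind would only change a few lines: draw every edge weight from its posterior and hand the sampled graph to the offline influence maximization routine. A sketch reusing the EdgeBelief class above; maximize_influence stands for any offline oracle taking an {edge: weight} map and a budget k (a hypothetical name, not the paper's):

    import random

    def thompson_round(beliefs, maximize_influence, k):
        """One round of Thompson sampling for online influence maximization.
        beliefs: dict mapping edge -> EdgeBelief."""
        # Sample a full weight configuration from the current posterior...
        sampled = {e: random.betavariate(b.alpha, b.beta)
                   for e, b in beliefs.items()}
        # ...and optimize against the sample: highly uncertain edges can be
        # drawn anywhere in [0, 1], so uncertainty itself drives exploration.
        return maximize_influence(sampled, k)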
I appreciated the exposition of Section 6, where it is clearly explained how to update the parameters of the Beta distribution when more observations are made. However, I did not understand why alpha could be set to one when trying to find the optimal value of alpha and beta in the maximum likelihood estimation. I also disagree with the statement in 6.3 that the exponential gradient algorithm can be proved to be competitive against the optimal sequence of theta values. To the best of my knowledge, it is only competitive against the best choice of a *constant* theta. This concerns me because the optimal sequence of theta will intuitively not be constant: theta should decrease as more observations are made.
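To spell out the objection: the standard exponentiated gradient guarantee (in the style of Kivinen and Warmuth) is a bound against a fixed comparator, of the form

    \sum_{t=1}^{T} \ell_t(\theta_t) \;-\; \min_{\theta} \sum_{t=1}^{T} \ell_t(\theta) \;=\; O\big(\sqrt{T \log n}\big)

where the minimum is over a single theta held constant across all rounds. Competing against the best *sequence* (theta_1, ..., theta_T) is a tracking/shifting regret statement and needs additional arguments.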
Except for the lack of comparison to UCB and Thompson sampling, I found the experiments interesting. In particular, it is nice to see how reusing samples across rounds significantly improves the running time while preserving most of the attained value.

12. List of typos, grammatical errors and/or concrete suggestions to improve presentation.

I would have liked to see a formal description of Explore and Exploit as used in the epsilon-greedy algorithm.

diff --git a/stoc-2015.txt b/stoc-2015.txt
new file mode 100644
index 0000000..771c3a9
--- /dev/null
+++ b/stoc-2015.txt
@@ -0,0 +1,151 @@

STOC 2015 Review for "Amortized Analysis on Asynchronous Gradient Descent"

* COI Statement: None

Review of paper (visible to author):

* Summary of the problem and paper contribution:

This paper proposes a new model to analyze asynchronous gradient descent for a convex, twice-differentiable function. In this model, each core updates a single coordinate (or a subset of coordinates) of the current solution.

In most previous models, time was discretized: at each time step, each core updates its coordinate and sends its update to all the other cores. The next step can only start when all the cores have completed the current step, possibly leading to waiting periods where some cores have to wait for the other cores.

In the proposed model, however, time is continuous: each core updates its coordinate at its own pace and sends the updates to the other cores as they are computed; there is no waiting time. The problem then is that a core might compute its update rule on outdated data from the other cores (if it hasn't received their updates yet). To account for this, an assumption of "bounded asynchrony" is made: the time to communicate an update is always smaller than the time to compute an update. As a consequence, when computing an update, a core has the guarantee that the data it is using is never out of date by more than one round.

Assuming some conditions on the step sizes of the update rule in relation to the Hessian of the objective function, the authors are able to show that in this model, asynchronous gradient descent converges to the global optimum of the objective function. In the general case, the convergence rate is sublinear. If the function is strongly convex, however, the convergence is linear. This is the first main theorem of the paper.

Two applications of this result are given:

- inverting linear systems, or more generally, minimizing the least squares error plus a sum of univariate convex functions. The authors show that in this case it is possible to choose a step size satisfying the assumptions of the main theorem and obtain convergence of the asynchronous gradient descent.

- tatonnement in Fisher markets. A previous result by Cheung, Cole and Devanur showed that for buyers with complementary-CES or Leontief utility functions, tatonnement is equivalent to gradient descent on a convex function. This was used to show that synchronous tatonnement converges to the market equilibrium. Similarly, in this paper, the first main theorem can be used to prove that under the same assumptions, asynchronous tatonnement converges to the market equilibrium. This is the second main theorem of the paper.

* What is best about the paper:

The proposed model for asynchrony is nice. The "bounded asynchrony" assumption seems to be the "right" assumption to make, and one under which it is still possible to obtain results. In particular, it allows one to see this model as a direct generalization of the synchronous model.

The proof of convergence requires addressing the following difficulty: since the update rule is computed on possibly outdated data, there is no guarantee that an update will always result in a decrease of the objective function. The authors tackled this through amortized analysis: they designed a potential function which is at least half of the underlying objective function and which is always decreasing.
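Schematically (my notation, extrapolating from the property just described): the analysis exhibits a potential \Phi combining the objective with a correction term \Delta that accounts for in-flight updates,

    \Phi(t) \;=\; f(x(t)) + \Delta(t), \qquad \Phi(t) \;\ge\; \tfrac{1}{2} f(x(t)), \qquad \frac{d\Phi}{dt} \;\le\; 0

so that the decrease of \Phi transfers, up to a factor of two, to the convergence of f itself.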
The application to Fisher markets is significant: it generalizes previous results on synchronous tatonnement and makes weaker assumptions on the timing of the buyers' price updates. In particular, it does not assume a fixed schedule.

* What are the paper's weaknesses:

The exposition of the first main theorem lacks clarity. In particular, the technical details shine a bit too much through the definition of "controlledness" (Definition 2). It could be interesting to find a simpler set of assumptions implying the technical ones, which would make applying the main theorem simpler. For example, the two applications given in the paper seem to require weaker assumptions than the ones given in Definition 2.

The amortized analysis of the convergence is quite similar to the one found in Cheung, Cole and Rastogi, "Tatonnement in ongoing markets of complementary goods", and as such is not a technical novelty (even though it is definitely technically challenging).

It is a bit disappointing that many results on Fisher markets (for Leontief utility functions and ongoing markets) can only be found in the appendix and thus cannot be considered part of the main body of the paper. Could it make sense to have a separate paper focusing entirely on asynchronous tatonnement processes and giving them a better treatment? Currently, it feels like they are treated as second-class citizens.

* Target audience:

Two kinds of people could directly benefit from this paper:
- people interested in parallelized convex optimization
- people interested in tatonnement processes in Fisher markets

In particular, I hope that this paper will spark a discussion on what the right assumptions for asynchronous agents in Fisher markets are. More generally, I hope this paper will encourage researchers to analyze dynamic systems under weaker assumptions on their synchrony. This paper is a good example that results can be obtained without assuming a fixed schedule, under fairly general assumptions.

* Summary of recommendation:

Weak accept. Even though I like the asynchronous model and the results obtained in this model, I think the paper could benefit from a better exposition:

- on the technical side, putting some effort into abstracting the assumptions and simplifying the analysis could make the results easier to apply to other settings.

- the results on tatonnement deserve better treatment and in particular should not be hidden in the appendix.

* Detailed technical comments:

I only found minor typos and inconsistencies in the notation while reading the paper:

- the potential phi sometimes takes three arguments (p, t, tau) and sometimes takes a single argument (t). I think having phi depend only on the time t would be sufficient (since p and tau are fully determined given the time).

- Lemma 2, page 6, should read: let \tilde{g} ... the maximum and minimum values of \nabla_j \phi(p') (the \phi is missing).

- the notation keeps switching between max (1 / xi) and 1 / (min xi).

- the proof of Lemma 2, p. 14, expresses the path in terms of price updates, but at this point we are still in the general analysis of the AGD. This proof should only mention updates to the current solution (there is no notion of prices there).

B. Confidence (hidden from authors)

2. Fairly confident: I am fairly familiar with the area and understand the work reasonably well.

A. Overall merit (hidden from authors)

6. Weak accept: I tend towards acceptance, but leaving it out of the program would be no great loss.
