diff options
Diffstat (limited to 'stoc-2015.txt')
| -rw-r--r-- | stoc-2015.txt | 151 |
1 files changed, 151 insertions, 0 deletions
diff --git a/stoc-2015.txt b/stoc-2015.txt new file mode 100644 index 0000000..771c3a9 --- /dev/null +++ b/stoc-2015.txt @@ -0,0 +1,151 @@ +Stoc2015 Review for "Amortized Analysis on Asynchronous Gradient Descent" + +* COI Statement: None + +Review of paper (visible to author): + +* Summary of the problem and paper contribution: + +This paper proposes a new model to analyze asynchronous gradient descent for +a convex twice-differentiable function. In this model, each core updates +a single coordinate (or a subset of coordinates) of the current solution. + +In most previous models, time was discretized: at each time step, each core +updates its coordinate and sends its update to all the other cores. The next +step can only start when all the cores have completed the current step, +possibly leading to waiting periods where some cores have to wait for the +other cores. + +In the proposed model however, time is continuous: each core updates its +coordinate at its own pace and send the updates to the other cores as they are +computed; there is no waiting time. The problem then is that a core might +compute its update rule on outdated data from the other cores (if it hasn't +received their updates yet). To account for this, an assumption of "bounded +asynchrony" is made: the time to communicate an update is always smaller than +the time to compute an update. As a consequence, when computing an update, +a core has the guarantee that the data it is using is never out of date by more +than one round. + +Assuming some conditions on the step sizes of the update rule in relation with +the Hessian of the objective function, the authors are able to show that in +this model, the asynchronous gradient descent converges to the global optimum +of the objective function. In the general case, the convergence rate is +sublinear. If the function is strongly convex however, the convergence is +linear. This is the first main theorem of the paper. + +Two applications of this result are given: + +- inversing linear systems, or more generally, minimizing the least square + error plus a sum of univariate convex functions. The authors are able to show + that in this case it is able to choose a step size satisfying the assumptions + of the main theorem and obtain convergence of the asynchronous gradient + descent. + +- tatonnement in Fisher Markets. A previous result by Cheung, Cole and Devanur + showed that for buyers with complementary-CES or Leontief utility functions, + tatonnement is equivalent to gradient descent on a convex function. This was + used to show that synchronous tatonnement converges to the market + equilibrium. Similarly, in this paper, the first main theorem can be used to + prove that under the same assumptions, asynchronous tatonnement converges to + the market equilibrium. This is the second main theorem of the paper. + + +* What is best about the paper: + +The proposed model for asynchrony is nice. The "bounded asynchrony" assumption +seems to be the "right" assumption to make and for which it is still possible +to obtain results. In particular, it allows to see this model as a direct +generalization of the synchronous model. + +The proof of convergence requires to address the following difficulty: since +the update rule is computed on possibly outdated data, there is no guarantee +that an update will always result in a decrease in the objective function. The +authors tackled this through amortized analysis: they designed a potential +function which is at least a half of the underlying objective function +and which is always decreasing. + +The application to Fisher Markets is significant: it generalizes previous +results on synchronous tatonnement and makes weaker assumptions on the timing +of the buyers' price updates. In particular it does not assume a fixed +schedule. + + +* What are the paper weaknesses: + +The exposition of the first main theorem lacks clarity. In particular, the +technical details shine a bit too much through the definition of +"controlledness" (Definition 2). It could be interesting to find a simpler set +of assumptions implying the technical ones and which would make applying the +main theorem simpler. For example, the two applications given in the paper seem +to require weaker assumptions than the ones given in Definition 2. + +The amortized analysis of the convergence is quite similar to the one found in +Cheung, Cole and Rastogi: Tatonnement in ongoing markets of complementary +goods, and as such is not a technical novelty (even though it is definitely +technically challenging). + +It is a bit disappointing that many results on Fisher Markets (for Leontief +utility functions and Ongoing markets) can only be found in the appendix and +thus cannot be considered part of the main body of the paper. Could it make +sense to have a separate paper focusing entirely on asynchronous tatonnement +processes and give them a better treatment? Currently, it feels like they are +treated as second-class citizens. + + +* Target audience: + +Two kinds of people could directly benefit from this paper: +- people interested in parallelized convex optimization +- people interested in tatonnement processes in Fisher Markets + +In particular I hope that this paper will spark a discussion on what the right +assumptions for asynchronous agents in Fisher Markets are. More generally, +I hope this paper will encourage researchers to analyze dynamic systems under +weaker assumptions on their synchrony. This paper is a good example that +results can be obtained without assuming a fixed schedule under fairly general +assumptions. + + +* Summary of recommendation: + +Weak accept. Even though I like the asynchronous model and the results +obtained in this model, I think the paper could benefit from a better +exposition: + +- on the technical side, putting some effort on abstracting the assumptions and +simplifying the analysis could make the results easier to apply to other +settings. + +- the results on taonnement deserve better treatment and in particular should +not be hidden in the appendix. + +* Detailed technical comments: + +I only found minor typos and inconsistencies in the notations while reading the +paper: + +- the potential phi sometimes take three arguments (p, t, tau) and sometimes + takes a single argument (t). I think having phi depend only the time t would + be sufficient (since p and tau are fully determined given the time). + +- Lemma 2 page 6 should read: let \tilde{g} ... the maximum and minimum values + of \nabla_j \phi(p') (the \phi is missing) + +- the notation keeps switching between max (1 / xsi) and 1 / (min xsi) + +- the proof of Lemma 2 p 14 expresses the path in terms of price updates, but + at this point we are still in the general analysis of the AGD. This proof + should only mention updates to the current solution (there is no notion of + prices there). + + +B. Confidence (hidden from authors) + +2. Fairly confident: I am fairly familiar with the area and understand the work + reasonably well. + + +A. Overall merit (hidden from authors). + +6. Weak accept: I tend towards acceptance, but leaving it out of the program + would be no great loss. |
