Stoc2015 Review for "Amortized Analysis on Asynchronous Gradient Descent" * COI Statement: None Review of paper (visible to author): * Summary of the problem and paper contribution: This paper proposes a new model to analyze asynchronous gradient descent for a convex twice-differentiable function. In this model, each core updates a single coordinate (or a subset of coordinates) of the current solution. In most previous models, time was discretized: at each time step, each core updates its coordinate and sends its update to all the other cores. The next step can only start when all the cores have completed the current step, possibly leading to waiting periods where some cores have to wait for the other cores. In the proposed model however, time is continuous: each core updates its coordinate at its own pace and send the updates to the other cores as they are computed; there is no waiting time. The problem then is that a core might compute its update rule on outdated data from the other cores (if it hasn't received their updates yet). To account for this, an assumption of "bounded asynchrony" is made: the time to communicate an update is always smaller than the time to compute an update. As a consequence, when computing an update, a core has the guarantee that the data it is using is never out of date by more than one round. Assuming some conditions on the step sizes of the update rule in relation with the Hessian of the objective function, the authors are able to show that in this model, the asynchronous gradient descent converges to the global optimum of the objective function. In the general case, the convergence rate is sublinear. If the function is strongly convex however, the convergence is linear. This is the first main theorem of the paper. Two applications of this result are given: - inversing linear systems, or more generally, minimizing the least square error plus a sum of univariate convex functions. The authors are able to show that in this case it is able to choose a step size satisfying the assumptions of the main theorem and obtain convergence of the asynchronous gradient descent. - tatonnement in Fisher Markets. A previous result by Cheung, Cole and Devanur showed that for buyers with complementary-CES or Leontief utility functions, tatonnement is equivalent to gradient descent on a convex function. This was used to show that synchronous tatonnement converges to the market equilibrium. Similarly, in this paper, the first main theorem can be used to prove that under the same assumptions, asynchronous tatonnement converges to the market equilibrium. This is the second main theorem of the paper. * What is best about the paper: The proposed model for asynchrony is nice. The "bounded asynchrony" assumption seems to be the "right" assumption to make and for which it is still possible to obtain results. In particular, it allows to see this model as a direct generalization of the synchronous model. The proof of convergence requires to address the following difficulty: since the update rule is computed on possibly outdated data, there is no guarantee that an update will always result in a decrease in the objective function. The authors tackled this through amortized analysis: they designed a potential function which is at least a half of the underlying objective function and which is always decreasing. The application to Fisher Markets is significant: it generalizes previous results on synchronous tatonnement and makes weaker assumptions on the timing of the buyers' price updates. In particular it does not assume a fixed schedule. * What are the paper weaknesses: The exposition of the first main theorem lacks clarity. In particular, the technical details shine a bit too much through the definition of "controlledness" (Definition 2). It could be interesting to find a simpler set of assumptions implying the technical ones and which would make applying the main theorem simpler. For example, the two applications given in the paper seem to require weaker assumptions than the ones given in Definition 2. The amortized analysis of the convergence is quite similar to the one found in Cheung, Cole and Rastogi: Tatonnement in ongoing markets of complementary goods, and as such is not a technical novelty (even though it is definitely technically challenging). It is a bit disappointing that many results on Fisher Markets (for Leontief utility functions and Ongoing markets) can only be found in the appendix and thus cannot be considered part of the main body of the paper. Could it make sense to have a separate paper focusing entirely on asynchronous tatonnement processes and give them a better treatment? Currently, it feels like they are treated as second-class citizens. * Target audience: Two kinds of people could directly benefit from this paper: - people interested in parallelized convex optimization - people interested in tatonnement processes in Fisher Markets In particular I hope that this paper will spark a discussion on what the right assumptions for asynchronous agents in Fisher Markets are. More generally, I hope this paper will encourage researchers to analyze dynamic systems under weaker assumptions on their synchrony. This paper is a good example that results can be obtained without assuming a fixed schedule under fairly general assumptions. * Summary of recommendation: Weak accept. Even though I like the asynchronous model and the results obtained in this model, I think the paper could benefit from a better exposition: - on the technical side, putting some effort on abstracting the assumptions and simplifying the analysis could make the results easier to apply to other settings. - the results on taonnement deserve better treatment and in particular should not be hidden in the appendix. * Detailed technical comments: I only found minor typos and inconsistencies in the notations while reading the paper: - the potential phi sometimes take three arguments (p, t, tau) and sometimes takes a single argument (t). I think having phi depend only the time t would be sufficient (since p and tau are fully determined given the time). - Lemma 2 page 6 should read: let \tilde{g} ... the maximum and minimum values of \nabla_j \phi(p') (the \phi is missing) - the notation keeps switching between max (1 / xsi) and 1 / (min xsi) - the proof of Lemma 2 p 14 expresses the path in terms of price updates, but at this point we are still in the general analysis of the AGD. This proof should only mention updates to the current solution (there is no notion of prices there). B. Confidence (hidden from authors) 2. Fairly confident: I am fairly familiar with the area and understand the work reasonably well. A. Overall merit (hidden from authors). 6. Weak accept: I tend towards acceptance, but leaving it out of the program would be no great loss.