1 files changed, 151 insertions, 0 deletions
diff --git a/stoc-2015.txt b/stoc-2015.txt
new file mode 100644
index 0000000..771c3a9
--- /dev/null
+++ b/stoc-2015.txt
@@ -0,0 +1,151 @@
+Stoc2015 Review for "Amortized Analysis on Asynchronous Gradient Descent"
+
+* COI Statement: None
+
+Review of paper (visible to author):
+
+* Summary of the problem and paper contribution:
+
+This paper proposes a new model to analyze asynchronous gradient descent for
+a convex twice-differentiable function. In this model, each core updates
+a single coordinate (or a subset of coordinates) of the current solution.
+
+In most previous models, time was discretized: at each time step, each core
+updates its coordinate and sends its update to all the other cores. The next
+step can only start when all the cores have completed the current step,
+possibly leading to waiting periods where some cores have to wait for the
+other cores.
+
+In the proposed model however, time is continuous: each core updates its
+coordinate at its own pace and send the updates to the other cores as they are
+computed; there is no waiting time. The problem then is that a core might
+compute its update rule on outdated data from the other cores (if it hasn't
+received their updates yet). To account for this, an assumption of "bounded
+asynchrony" is made: the time to communicate an update is always smaller than
+the time to compute an update. As a consequence, when computing an update,
+a core has the guarantee that the data it is using is never out of date by more
+than one round.
+
+Assuming some conditions on the step sizes of the update rule in relation with
+the Hessian of the objective function, the authors are able to show that in
+this model, the asynchronous gradient descent converges to the global optimum
+of the objective function. In the general case, the convergence rate is
+sublinear. If the function is strongly convex however, the convergence is
+linear. This is the first main theorem of the paper.
+
+Two applications of this result are given:
+
+- inversing linear systems, or more generally, minimizing the least square
+  error plus a sum of univariate convex functions. The authors are able to show
+  that in this case it is able to choose a step size satisfying the assumptions
+  of the main theorem and obtain convergence of the asynchronous gradient
+  descent.
+
+- tatonnement in Fisher Markets. A previous result by Cheung, Cole and Devanur
+  showed that for buyers with complementary-CES or Leontief utility functions,
+  tatonnement is equivalent to gradient descent on a convex function. This was
+  used to show that synchronous tatonnement converges to the market
+  equilibrium.  Similarly, in this paper, the first main theorem can be used to
+  prove that under the same assumptions, asynchronous tatonnement converges to
+  the market equilibrium. This is the second main theorem of the paper.
+
+
+* What is best about the paper: 
+
+The proposed model for asynchrony is nice. The "bounded asynchrony" assumption
+seems to be the "right" assumption to make and for which it is still possible
+to obtain results. In particular, it allows to see this model as a direct
+generalization of the synchronous model.
+
+The proof of convergence requires to address the following difficulty: since
+the update rule is computed on possibly outdated data, there is no guarantee
+that an update will always result in a decrease in the objective function. The
+authors tackled this through amortized analysis: they designed a potential
+function which is at least a half of the underlying objective function
+and which is always decreasing.
+
+The application to Fisher Markets is significant: it generalizes previous
+results on synchronous tatonnement and makes weaker assumptions on the timing
+of the buyers' price updates. In particular it does not assume a fixed
+schedule.
+
+
+* What are the paper weaknesses: 
+
+The exposition of the first main theorem lacks clarity. In particular, the
+technical details shine a bit too much through the definition of
+"controlledness" (Definition 2). It could be interesting to find a simpler set
+of assumptions implying the technical ones and which would make applying the
+main theorem simpler. For example, the two applications given in the paper seem
+to require weaker assumptions than the ones given in Definition 2.
+
+The amortized analysis of the convergence is quite similar to the one found in
+Cheung, Cole and Rastogi: Tatonnement in ongoing markets of complementary
+goods, and as such is not a technical novelty (even though it is definitely
+technically challenging).
+
+It is a bit disappointing that many results on Fisher Markets (for Leontief
+utility functions and Ongoing markets) can only be found in the appendix and
+thus cannot be considered part of the main body of the paper. Could it make
+sense to have a separate paper focusing entirely on asynchronous tatonnement
+processes and give them a better treatment? Currently, it feels like they are
+treated as second-class citizens.
+
+
+* Target audience: 
+
+Two kinds of people could directly benefit from this paper:
+- people interested in parallelized convex optimization
+- people interested in tatonnement processes in Fisher Markets
+
+In particular I hope that this paper will spark a discussion on what the right
+assumptions for asynchronous agents in Fisher Markets are. More generally,
+I hope this paper will encourage researchers to analyze dynamic systems under
+weaker assumptions on their synchrony. This paper is a good example that
+results can be obtained without assuming a fixed schedule under fairly general
+assumptions.
+
+
+* Summary of recommendation:
+
+Weak accept. Even though I like the asynchronous model and the results
+obtained in this model, I think the paper could benefit from a better
+exposition:
+
+- on the technical side, putting some effort on abstracting the assumptions and
+simplifying the analysis could make the results easier to apply to other
+settings.
+
+- the results on taonnement deserve better treatment and in particular should
+not be hidden in the appendix.
+
+* Detailed technical comments:
+
+I only found minor typos and inconsistencies in the notations while reading the
+paper:
+
+- the potential phi sometimes take three arguments (p, t, tau) and sometimes
+  takes a single argument (t). I think having phi depend only the time t would
+  be sufficient (since p and tau are fully determined given the time).
+
+- Lemma 2 page 6 should read: let  \tilde{g} ... the maximum and minimum values
+  of \nabla_j \phi(p') (the \phi is missing)
+
+- the notation keeps switching between max (1 / xsi) and 1 / (min xsi)
+
+- the proof of Lemma 2 p 14 expresses the path in terms of price updates, but
+  at this point we are still in the general analysis of the AGD. This proof
+  should only mention updates to the current solution (there is no notion of
+  prices there).
+
+
+B. Confidence (hidden from authors)
+
+2. Fairly confident: I am fairly familiar with the area and understand the work
+   reasonably well.
+
+
+A. Overall merit (hidden from authors).  
+
+6. Weak accept: I tend towards acceptance, but leaving it out of the program
+   would be no great loss.