Stoc2015 Review for "Amortized Analysis on Asynchronous Gradient Descent"

* COI Statement: None

Review of paper (visible to author):

* Summary of the problem and paper contribution:

This paper proposes a new model to analyze asynchronous gradient descent for
a convex twice-differentiable function. In this model, each core updates
a single coordinate (or a subset of coordinates) of the current solution.

In most previous models, time was discretized: at each time step, each core
updates its coordinate and sends its update to all the other cores. The next
step can only start when all the cores have completed the current step,
possibly leading to waiting periods where some cores have to wait for the
other cores.

In the proposed model however, time is continuous: each core updates its
coordinate at its own pace and send the updates to the other cores as they are
computed; there is no waiting time. The problem then is that a core might
compute its update rule on outdated data from the other cores (if it hasn't
received their updates yet). To account for this, an assumption of "bounded
asynchrony" is made: the time to communicate an update is always smaller than
the time to compute an update. As a consequence, when computing an update,
a core has the guarantee that the data it is using is never out of date by more
than one round.

Assuming some conditions on the step sizes of the update rule in relation with
the Hessian of the objective function, the authors are able to show that in
this model, the asynchronous gradient descent converges to the global optimum
of the objective function. In the general case, the convergence rate is
sublinear. If the function is strongly convex however, the convergence is
linear. This is the first main theorem of the paper.

Two applications of this result are given:

- inversing linear systems, or more generally, minimizing the least square
  error plus a sum of univariate convex functions. The authors are able to show
  that in this case it is able to choose a step size satisfying the assumptions
  of the main theorem and obtain convergence of the asynchronous gradient
  descent.

- tatonnement in Fisher Markets. A previous result by Cheung, Cole and Devanur
  showed that for buyers with complementary-CES or Leontief utility functions,
  tatonnement is equivalent to gradient descent on a convex function. This was
  used to show that synchronous tatonnement converges to the market
  equilibrium.  Similarly, in this paper, the first main theorem can be used to
  prove that under the same assumptions, asynchronous tatonnement converges to
  the market equilibrium. This is the second main theorem of the paper.


* What is best about the paper: 

The proposed model for asynchrony is nice. The "bounded asynchrony" assumption
seems to be the "right" assumption to make and for which it is still possible
to obtain results. In particular, it allows to see this model as a direct
generalization of the synchronous model.

The proof of convergence requires to address the following difficulty: since
the update rule is computed on possibly outdated data, there is no guarantee
that an update will always result in a decrease in the objective function. The
authors tackled this through amortized analysis: they designed a potential
function which is at least a half of the underlying objective function
and which is always decreasing.

The application to Fisher Markets is significant: it generalizes previous
results on synchronous tatonnement and makes weaker assumptions on the timing
of the buyers' price updates. In particular it does not assume a fixed
schedule.


* What are the paper weaknesses: 

The exposition of the first main theorem lacks clarity. In particular, the
technical details shine a bit too much through the definition of
"controlledness" (Definition 2). It could be interesting to find a simpler set
of assumptions implying the technical ones and which would make applying the
main theorem simpler. For example, the two applications given in the paper seem
to require weaker assumptions than the ones given in Definition 2.

The amortized analysis of the convergence is quite similar to the one found in
Cheung, Cole and Rastogi: Tatonnement in ongoing markets of complementary
goods, and as such is not a technical novelty (even though it is definitely
technically challenging).

It is a bit disappointing that many results on Fisher Markets (for Leontief
utility functions and Ongoing markets) can only be found in the appendix and
thus cannot be considered part of the main body of the paper. Could it make
sense to have a separate paper focusing entirely on asynchronous tatonnement
processes and give them a better treatment? Currently, it feels like they are
treated as second-class citizens.


* Target audience: 

Two kinds of people could directly benefit from this paper:
- people interested in parallelized convex optimization
- people interested in tatonnement processes in Fisher Markets

In particular I hope that this paper will spark a discussion on what the right
assumptions for asynchronous agents in Fisher Markets are. More generally,
I hope this paper will encourage researchers to analyze dynamic systems under
weaker assumptions on their synchrony. This paper is a good example that
results can be obtained without assuming a fixed schedule under fairly general
assumptions.


* Summary of recommendation:

Weak accept. Even though I like the asynchronous model and the results
obtained in this model, I think the paper could benefit from a better
exposition:

- on the technical side, putting some effort on abstracting the assumptions and
simplifying the analysis could make the results easier to apply to other
settings.

- the results on taonnement deserve better treatment and in particular should
not be hidden in the appendix.

* Detailed technical comments:

I only found minor typos and inconsistencies in the notations while reading the
paper:

- the potential phi sometimes take three arguments (p, t, tau) and sometimes
  takes a single argument (t). I think having phi depend only the time t would
  be sufficient (since p and tau are fully determined given the time).

- Lemma 2 page 6 should read: let  \tilde{g} ... the maximum and minimum values
  of \nabla_j \phi(p') (the \phi is missing)

- the notation keeps switching between max (1 / xsi) and 1 / (min xsi)

- the proof of Lemma 2 p 14 expresses the path in terms of price updates, but
  at this point we are still in the general analysis of the AGD. This proof
  should only mention updates to the current solution (there is no notion of
  prices there).


B. Confidence (hidden from authors)

2. Fairly confident: I am fairly familiar with the area and understand the work
   reasonably well.


A. Overall merit (hidden from authors).  

6. Weak accept: I tend towards acceptance, but leaving it out of the program
   would be no great loss.