From 98fb1b781dd89357fa36f7835d96749dcae4da4f Mon Sep 17 00:00:00 2001
From: Thibaut Horel
Date: Tue, 15 Mar 2016 17:05:57 -0400
Subject: Add ICML 2016 review

---
 icml-2016-1111.xml | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)
 create mode 100644 icml-2016-1111.xml

diff --git a/icml-2016-1111.xml b/icml-2016-1111.xml
new file mode 100644
index 0000000..879d86b
--- /dev/null
+++ b/icml-2016-1111.xml
@@ -0,0 +1,130 @@

Summary:

This paper studies the problem of optimizing an approximately convex
function, that is, one which is within an additive delta of some convex
function. For a given accuracy epsilon, the goal is to obtain a solution
whose value is within an additive epsilon of the optimal value, in time
polynomial in the dimension d and in 1/epsilon. The paper considers
zero-order optimization, in which the function is only accessed through
value queries (for example, it is not assumed that the gradient can be
computed; it might not even exist, since the approximately convex function
could be discontinuous).

It is intuitively clear that the problem becomes harder as delta grows
larger compared to epsilon. More precisely, the goal is to find a threshold
function T such that when delta = O(T(epsilon)) the problem is solvable in
polynomial time, and when delta = Omega(T(epsilon)) it is not (the problem
is always solvable in exponential time by considering a square grid of width
1/epsilon).

The authors show that this threshold is T(epsilon) = max(epsilon^2/sqrt{d},
epsilon/d); note that the two terms cross at epsilon = 1/sqrt{d}, the first
dominating for larger epsilon and the second for smaller. More precisely:

* They provide an information-theoretic lower bound showing that when delta
= Omega(T(epsilon)) (up to logarithmic factors), no algorithm making
polynomially many evaluations of the function can optimize it to precision
epsilon. The lower bound relies on the careful construction of a family of
functions defined on the unit ball which behave like ||x||^{1+alpha} unless
x lies in a small angle around a randomly chosen direction. In this small
angle the function can take significantly smaller values, but with very high
probability an algorithm never evaluates the function inside it (a toy
numerical sketch of such a function is given right after these bullets).

* They give an algorithm which provably finds an epsilon-approximate
solution in the regime where delta = Omega(epsilon/d) and delta =
O(epsilon^2/sqrt{d}). Together with a previous algorithm from Belloni et al.
covering the regime delta = O(epsilon/d), this completes the algorithmic
upper bound. Their algorithm uses a natural idea from Flaxman et al., where
the gradient of the underlying convex function at some point x is estimated
by sampling points in a ball around x; the algorithm is then a gradient
descent using this estimated gradient (a sketch of this estimator also
follows below). The analysis relies on showing that even with
a delta-approximately convex function, this way of estimating the gradient
still provides a sufficiently good descent direction.
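To make the lower-bound construction concrete, here is a toy numerical
sketch (my own, not the authors') of a function of this type; the exponent
alpha, the cone half-width, and the size of the dip are illustrative
placeholders rather than the paper's actual parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 50                # ambient dimension
    alpha = 0.5           # exponent in ||x||^{1+alpha} (illustrative)
    half_width = 0.05     # angular half-width of the hidden cone (illustrative)

    # Hidden direction: the cone around it has measure exponentially small
    # in d, so polynomially many queries essentially never land inside it.
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)

    def hard_function(x):
        # ||x||^{1+alpha} everywhere, except inside the narrow cone around
        # w, where the function dips to smaller values (a hidden minimum).
        # The dip used here is a placeholder, not the paper's formula.
        r = np.linalg.norm(x)
        if r == 0.0:
            return 0.0
        if np.dot(x, w) / r > np.cos(half_width):  # inside the hidden cone
            return r ** (1 + alpha) - 0.5 * r
        return r ** (1 + alpha)

Since the inner product of a random unit query with w concentrates around 0
with fluctuations of order 1/sqrt{d}, any algorithm making polynomially many
evaluations misses the cone (and hence the hidden minimum) with overwhelming
probability; this is what drives the lower bound.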
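The gradient-estimation step borrowed from Flaxman et al. can be sketched in
the same spirit. The single-point estimator below rests on the identity
that, for u uniform on the unit sphere, E[(d/r) f(x + r u) u] is the
gradient of a ball-smoothed version of f; the radius, step size, and sample
count are illustrative, and the paper's actual sampling scheme and analysis
differ in the details:

    import numpy as np

    def estimate_gradient(f, x, radius, num_samples, rng):
        # Average (d/radius) * f(x + radius * u) * u over random unit
        # vectors u: in expectation this is the gradient of f smoothed over
        # a ball of the given radius, hence a usable descent direction even
        # when f is only delta-approximately convex.
        d = x.shape[0]
        g = np.zeros(d)
        for _ in range(num_samples):
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            g += (d / radius) * f(x + radius * u) * u
        return g / num_samples

    def zeroth_order_descent(f, x0, radius, step, num_iters, num_samples, rng):
        # Plain gradient descent driven by the estimated gradients.
        x = x0.copy()
        for _ in range(num_iters):
            x = x - step * estimate_gradient(f, x, radius, num_samples, rng)
        return x

A two-point variant, averaging (f(x + radius * u) - f(x)) * u instead,
would reduce the variance of the estimate; the single-point form above is
the original scheme of Flaxman et al.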
Clarity: Above Average
(scale: Excellent (Easy to follow) / Above Average / Below Average / Poor
(Hard to follow))

Comments on clarity:

The paper is very clearly written and the overall structure is easy to
follow. A lot of care was taken in precisely stating the propositions and
theorems, so that the dependence of the bounds on all the parameters can
easily be tracked.

There are two places where the paper could have benefited from more
explanation:

* In Construction 4.1, it is not clear why \tilde{h} should be replaced by
the lower convex envelope of \tilde{h}, especially since \tilde{h} itself is
already convex.

* At the beginning of the proof of Lemma 5.1, the argument for why the
curvature of the boundary of K can be assumed finite w.l.o.g. is not
immediate to me.

Impact: Excellent (substantial, novel contribution)
(scale: Excellent (substantial, novel contribution) / Above Average / Below
Average / Poor (minimal or no contribution))

Comments on impact:

This paper makes a significant contribution to the question of zero-order
optimization of approximately convex functions by essentially closing the
gap between information-theoretic lower bounds and algorithmic upper bounds.

What was previously known:

* a tight epsilon/sqrt{d} threshold in the case where the underlying convex
function is smooth (the upper bound coming from Dyer et al. and the lower
bound coming from Singer et al.);

* an epsilon/d algorithmic upper bound for general (non-smooth) functions
from Belloni et al.

The construction of the information-theoretic lower bound is novel and
non-trivial. While the algorithm is inspired by Flaxman et al., its analysis
for approximately convex functions is novel.

Detailed comments:

As already stated, the paper makes a substantial and novel contribution to
the field of zero-order optimization of approximately convex functions
(which, as the authors point out, had very few theoretical guarantees until
recently).

As far as I was able to verify, the results are correct. I think my comments
about clarity should be addressed for the camera-ready version, but I do not
believe they affect the validity of the results. Overall, I strongly support
this paper.

Typo on line 297: I think it should read \tilde{f}(x) = f(x) otherwise
(instead of \tilde{f}(x) = x).

Overall rating: Strong accept
(scale: Strong accept / Weak accept / Weak reject / Strong reject)

Reviewer expertise: Reviewer is knowledgeable
(scale: Reviewer is an expert / Reviewer is knowledgeable / Reviewer's
evaluation is an educated guess)

--
cgit v1.2.3-70-g09d2