This paper studies the problem of optimizing an approximately convex function, that is, one which is within an additive delta of a convex function. For a given accuracy epsilon, the goal is to obtain a solution whose value is within an additive epsilon of the optimal value, in time polynomial in the dimension d and 1/epsilon. The paper considers zero-order optimization, in which the function is only accessed through value queries (for example, it is not assumed that the gradient can be computed; it might not even exist, since the approximately convex function could be discontinuous). It is intuitively clear that the problem becomes harder as delta grows larger compared to epsilon. More precisely, the goal is to find a threshold function T such that when delta = O(T(epsilon)) the problem is solvable in polynomial time, and when delta = Omega(T(epsilon)) it is not (the problem is always solvable in exponential time by evaluating the function on a grid with 1/epsilon points per dimension). The authors show that this threshold is T(epsilon) = max(epsilon^2/sqrt{d}, epsilon/d). More precisely:

* They provide an information-theoretic lower bound showing that when delta = Omega(T(epsilon)) (up to logarithmic factors), no algorithm making polynomially many evaluations of the function can optimize it to precision epsilon. The lower bound relies on the careful construction of a family of functions defined on the unit ball that behave like ||x||^{1+alpha} unless x lies in a small angle around a randomly chosen direction. In this small angle the function can take significantly smaller values, but with very high probability an algorithm never evaluates the function there.

* They give an algorithm that provably finds an epsilon-approximate solution in the regime where delta = Omega(epsilon/d) and delta = O(epsilon^2/sqrt{d}). Together with a previous algorithm from Belloni et al. for the regime delta = O(epsilon/d), this completes the algorithmic upper bound. Their algorithm uses a natural idea from Flaxman et al., where the gradient of the underlying convex function at a point x is estimated by sampling points in a ball around x; the algorithm is then gradient descent using this estimated gradient. The analysis relies on showing that, even for a delta-approximately convex function, this way of estimating the gradient still provides a sufficiently good descent direction.
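To make the summary concrete for readers less familiar with zeroth-order methods, here is a minimal sketch of the kind of Flaxman-style estimate-then-descend loop described above. This is my own illustration, not the algorithm of the paper: the particular estimator variant (finite differences along random unit directions), the function and parameter names, and all numerical constants are assumptions chosen for readability, whereas the paper's algorithm and analysis rely on carefully chosen radii, step sizes, and a constrained domain.

```python
import numpy as np

def estimate_gradient(f, x, r, n_samples, rng):
    """Zeroth-order gradient estimate at x from value queries only.

    Averages (d / r) * (f(x + r*u) - f(x)) * u over random unit
    directions u; in expectation this approximates the gradient of a
    smoothed version of f.
    """
    d = x.shape[0]
    fx = f(x)
    g = np.zeros(d)
    for _ in range(n_samples):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)        # uniform direction on the unit sphere
        # Subtracting f(x) does not change the expectation (E[u] = 0)
        # but greatly reduces the variance of the estimate.
        g += (d / r) * (f(x + r * u) - fx) * u
    return g / n_samples

def zeroth_order_descent(f, x0, step=0.05, r=0.1, n_samples=200,
                         n_iters=300, seed=0):
    """Plain gradient descent driven only by value queries to f."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step * estimate_gradient(f, x, r, n_samples, rng)
    return x

if __name__ == "__main__":
    # Toy check: a quadratic plus a small bounded (non-convex) perturbation.
    delta = 1e-3
    f = lambda x: float(np.dot(x, x) + delta * np.sin(50.0 * x[0]))
    x_hat = zeroth_order_descent(f, x0=np.ones(5))
    print(x_hat, f(x_hat))
```

On such a toy instance the loop drives the iterate close to the minimizer; the algorithmic contribution of the paper is to show that a suitably parameterized version of this idea provably returns an epsilon-approximate minimizer when the queried function is only delta-approximately convex, for delta up to the stated threshold.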
Clarity: Above Average

The paper is very clearly written and the overall structure is easy to follow. A lot of care was given to precisely stating the propositions and theorems, so that the dependence of the bounds on all the parameters can easily be tracked. There are two places where the paper could have benefited from more explanation:

* In Construction 4.1, it is not clear why \tilde{h} should be replaced by the lower convex envelope of \tilde{h}, especially since \tilde{h} itself is already convex.

* At the beginning of the proof of Lemma 5.1, the argument for why the curvature of the boundary of K can be assumed finite without loss of generality is not immediate to me.

Contribution: Excellent (substantial, novel contribution)

This paper makes a significant contribution to the question of zero-order optimization of approximately convex functions by essentially closing the gap between information-theoretic lower bounds and algorithmic upper bounds. What was previously known:

* a tight epsilon/sqrt{d} threshold in the case where the underlying convex function is smooth (the upper bound coming from Dyer et al. and the lower bound from Singer et al.);

* an epsilon/d algorithmic upper bound for general (non-smooth) functions from Belloni et al.

The construction of the information-theoretic lower bound is novel and non-trivial. While the algorithm is inspired by Flaxman et al., its analysis for approximately convex functions is novel. As already stated, the paper makes a substantial and novel contribution to the field of zero-order optimization of approximately convex functions (which, as the authors point out, had very few theoretical guarantees until recently). As far as I was able to verify, the results are correct. I think my comments about clarity should be addressed for the camera-ready version, but I do not believe they affect the validity of the results. Overall, I strongly support this paper.

Typo on line 297: I think it should read \tilde{f}(x) = f(x) otherwise (instead of \tilde{f}(x) = x).

Recommendation: Strong accept

Reviewer confidence: Reviewer is knowledgeable