<?xml version="1.0" encoding="UTF-8"?>
<conference ShortName="ICML2016" ReviewDeadline="Please see Conference Website">
  <paper PaperId="454" Title="Scaling Submodular Maximization on Pruned Submodularity Graph">
    <question Number="1" Text="Summary of the paper (Summarize the main claims/contributions of the paper.)">
        <YourAnswer>
This paper studies the problem of optimizing an approximately convex function,
that is, one which is within an additive approximation delta of a convex
function. For a given accuracy epsilon, the goal is to obtain a solution whose
value is within an additive epsilon of the optimal value, in time polynomial in
the dimension d and 1/epsilon. The paper considers zero-order optimization, in
which the function is only accessed through value queries (for example, it is
not assumed that the gradient can be computed; it might not even exist, since
the approximately convex function could be discontinuous).

It is intuitively clear that as delta grows larger compared to epsilon, the
problem becomes harder. More precisely, the goal is to find a threshold
function T such that when delta = O(T(epsilon)) the problem is solvable in
polynomial time, and when delta = Omega(T(epsilon)) it is not (the problem is
always solvable in exponential time by evaluating the function on a grid with
spacing epsilon, i.e., about (1/epsilon)^d value queries).
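
As a point of reference, here is a minimal sketch of that exponential-time
baseline. The domain [0, 1]^d and the function names are mine, purely for
illustration; the paper works over a convex body such as the unit ball.

    import itertools
    import numpy as np

    def grid_minimize(f, d, eps):
        # Brute-force baseline: query f at every point of an eps-spaced
        # grid on [0, 1]^d. This uses roughly (1/eps)^d value queries,
        # i.e., time exponential in the dimension d.
        ticks = np.arange(0.0, 1.0 + eps / 2, eps)
        best_x, best_val = None, np.inf
        for point in itertools.product(ticks, repeat=d):
            x = np.array(point)
            val = f(x)
            if best_val > val:
                best_x, best_val = x, val
        return best_x, best_val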

The authors show that this function is T(epsilon) = max(epsilon^2/sqrt{d},
epsilon/d). More precisely:

* they provide an information-theoretic lower bound, showing that when delta
= Omega(T(epsilon)) (up to logarithmic factors), no algorithm making
polynomially many evaluations of the function can optimize it to precision
epsilon. The lower bound relies on the careful construction of a family of
functions defined on the unit ball which behave like ||x||^{1+alpha} unless
x lies in a small angle around a randomly chosen direction. In this small
angle, the function can take significantly smaller values, but with very high
probability, an algorithm never evaluates the function inside it (a toy
illustration in code follows this list).

* they give an algorithm which provably finds an epsilon-approximate solution
in the regime where delta = Omega(epsilon/d) and delta = O(epsilon^2/sqrt{d}).
Together with a previous algorithm from Belloni et al. covering the regime
delta = O(epsilon/d), this completes the algorithmic upper bound. Their
algorithm uses a natural idea from Flaxman et al., in which the gradient of
the underlying convex function at a point x is estimated by sampling points
in a ball around x; the algorithm is then gradient descent using this
estimated gradient (see the second sketch below). The analysis consists in
showing that even with a delta-approximately convex function, this way of
estimating the gradient still provides a sufficiently good descent direction.
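
To make the lower-bound intuition concrete, here is a toy version of such a
function family. This is my own illustration under stated assumptions (the
exponent, cone angle, and dip size are placeholders), not the paper's actual
construction:

    import numpy as np

    def toy_hard_instance(u, alpha=0.5, theta=0.05, dip=0.01):
        # Toy illustration only -- NOT the paper's construction. The
        # function behaves like ||x||^{1+alpha} everywhere except inside
        # a cone of half-angle theta around the hidden direction u, where
        # its value is lowered by dip. A minimizer must find the cone,
        # which random queries hit with vanishing probability in high d.
        u = u / np.linalg.norm(u)
        def f(x):
            nrm = np.linalg.norm(x)
            if nrm == 0.0:
                return 0.0
            cos_to_u = float(np.dot(x, u)) / nrm
            inside_cone = cos_to_u >= np.cos(theta)
            return nrm ** (1.0 + alpha) - (dip if inside_cone else 0.0)
        return f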
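
And a minimal sketch of the Flaxman-style single-point gradient estimator
plugged into projected gradient descent over the unit ball. All parameter
choices (radius r, step size, sample count) are placeholders of mine, not
the paper's settings:

    import numpy as np

    def estimate_gradient(f, x, r, n_samples, rng):
        # (d / r) * average of f(x + r*u) * u over random unit vectors u
        # estimates the gradient of a smoothed version of f (Flaxman et al.).
        d = x.shape[0]
        g = np.zeros(d)
        for _ in range(n_samples):
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)      # uniform direction on the sphere
            g += f(x + r * u) * u
        return (d / (r * n_samples)) * g

    def zeroth_order_descent(f, x0, r=0.05, step=0.1, n_iters=200,
                             n_samples=50, seed=0):
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        for _ in range(n_iters):
            g = estimate_gradient(f, x, r, n_samples, rng)
            x = x - step * g
            nrm = np.linalg.norm(x)
            if nrm > 1.0:               # project back onto the unit ball
                x /= nrm
        return x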
        </YourAnswer>
    </question>
    <question Number="2" Text="Clarity (Assess the clarity of the presentation and reproducibility of the results.)">
      <PossibleAnswer>Excellent (Easy to follow)</PossibleAnswer>
      <PossibleAnswer>Above Average</PossibleAnswer>
      <PossibleAnswer>Below Average</PossibleAnswer>
      <PossibleAnswer>Poor (Hard to follow)</PossibleAnswer>
      <YourAnswer>Above Average</YourAnswer>
    </question>
    <question Number="3" Text="Clarity - Justification">
        <YourAnswer>
The paper is very clearly written and the overall structure is easy to follow.
A lot of care was taken to state the propositions and theorems precisely, so
that the dependence of the bounds on all the parameters can easily be tracked.

There are two places where the paper could have benefited from more
explanation:

* In Construction 4.1, it is not clear why \tilde{h} should be replaced by the
lower convex envelope of \tilde{h}, especially since \tilde{h} itself is
already convex.

* At the beginning of the proof of Lemma 5.1, the argument for why the
curvature of the boundary of K can be assumed finite w.l.o.g. is not
immediately clear to me.
        </YourAnswer>
    </question>
    <question Number="4" Text="Significance (Does the paper contribute a major breakthrough or an incremental advance?)">
      <PossibleAnswer>Excellent (substantial, novel contribution)</PossibleAnswer>
      <PossibleAnswer>Above Average</PossibleAnswer>
      <PossibleAnswer>Below Average</PossibleAnswer>
      <PossibleAnswer>Poor (minimal or no contribution)</PossibleAnswer>
      <YourAnswer>Excellent (substantial, novel contribution)</YourAnswer>
    </question>
    <question Number="5" Text="Significance - Justification">
        <YourAnswer>
This paper makes a significant contribution to the question of zero-order
optimization of approximately convex functions by essentially closing the gap
between information-theoretic lower bounds and algorithmic upper bounds.

What was previously known:

* a tight epsilon/sqrt{d} threshold in the case where the underlying
convex function is smooth (the upper bound coming from Dyer et al. and the
lower bound coming from Singer et al.)

* an epsilon/d algorithmic upper bound for general (non-smooth) functions from
Belloni et al.

The construction of the information-theoretic lower bound is novel and
non-trivial. While the algorithm is inspired by Flaxman et al., its analysis
for approximately convex functions is novel.
        </YourAnswer>
    </question>
    <question Number="6" Text="Detailed comments. (Explain the basis for your ratings while providing constructive feedback.)">
        <YourAnswer>
As already stated, the paper makes a substantial and novel contribution to the
field of zero-order optimization of approximately convex functions (which, as
the authors point out, had very few theoretical guarantees until recently).

As far as I was able to verify, the results are correct. I think my comments
about clarity should be addressed for the camera-ready version, but I do not
believe they affect the validity of the results. Overall, I strongly support
this paper.

Typo on line 297: I think it should read \tilde{f}(x) = f(x) otherwise
(instead of \tilde{f}(x) = x).
        </YourAnswer>
    </question>
    <question Number="7" Text="Overall Rating">
      <PossibleAnswer>Strong accept</PossibleAnswer>
      <PossibleAnswer>Weak accept</PossibleAnswer>
      <PossibleAnswer>Weak reject</PossibleAnswer>
      <PossibleAnswer>Strong reject</PossibleAnswer>
      <YourAnswer>Strong accept</YourAnswer>
    </question>
    <question Number="8" Text="Reviewer confidence">
      <PossibleAnswer>Reviewer is an expert</PossibleAnswer>
      <PossibleAnswer>Reviewer is knowledgeable</PossibleAnswer>
      <PossibleAnswer>Reviewer's evaluation is an educated guess</PossibleAnswer>
      <YourAnswer>Reviewer is knowledgeable</YourAnswer>
    </question>
    <question Number="9" Text="Confidential Comments (not visible to authors)">
      <YourAnswer></YourAnswer>
    </question>
  </paper>
</conference>