\documentclass[12pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1.2in, vmargin=1.2in]{geometry}

\title{Review of \emph{Approximate Submodular Functions and Performance
Guarantees}}
\date{}
\begin{document}

\maketitle

\vspace{-3em}

\paragraph{Summary.} This paper studies $\delta$-approximate submodular
functions: these are functions $f$ for which there exists a submodular function
whose ``marginal gains'' approximate the marginal gains of $f$ within
a multiplicative factor of $(1\pm\delta)$. The contributions are:
\begin{itemize}
	\item this notion of approximate submodularity is related to the existing
		notion of ``submodularity ratio'': there is a necessary condition on
		$\delta$ (involving the submodularity ratio) for a function to be
		$\delta$-approximate submodular.
	\item an approximation bound on the value attained by the greedy algorithm
		when maximizing a $\delta$-approximate submodular function is derived.
	\item two functions are shown to be $\delta$-approximately submodular:
		the trace of the inverse Gramian matrix of the observations (an
		important quantity in A-optimal experiment design) and the minimum
		eigenvalue of the Gramian.
	\item an experiment shows that the approximation bound derived for
		$\delta$-submodular functions compares favorably to the bound of
		(Bian et al., 2017), which uses the notions of submodularity ratio
		and curvature.
	\item a second experiment shows how to use the notion of
		$\delta$-submodularity to solve a sensor selection problem.
\end{itemize}
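
Throughout this review, I use the following reading of the definition (this is
my paraphrase of the paper's $d(f, g)\leq \delta$ condition, reconstructed from
the proofs): $f$ is $\delta$-approximate submodular if there exists
a submodular function $g$ such that, for all disjoint sets $S$ and $T$,
\begin{displaymath}
	(1-\delta)\, f_S(T) \;\leq\; g_S(T) \;\leq\; (1+\delta)\, f_S(T),
\end{displaymath}
where $f_S(T) = f(S\cup T) - f(S)$ denotes the marginal gain of adding $T$
to $S$.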


\paragraph{General comments.} The general topic of approximate submodularity is
becoming increasingly relevant to explain why the greedy heuristic often works
well when maximizing non-submodular functions. An important line of work has
focused on the submodularity ratio as a measure of the divergence from
submodularity. The notion of $\delta$-approximate submodularity introduced in
this paper offers an interesting new perspective on the same question. The
definition is justified by the fact that the authors were able to identify two
functions satisfying it. Furthermore, this notion has the following advantage
over the submodularity ratio: once a submodular function $g$ has been
identified that approximates a given non-submodular function $f$, one can
obtain an approximation guarantee for the greedy algorithm simply by looking
at properties of $g$, which is typically easier than computing the
submodularity ratio (which takes exponential time). However, both the
theoretical and experimental results, as well as the exposition, suffer from
inaccuracies and gaps that I now describe.
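
For reference, the submodularity ratio mentioned above is, in this notation,
\begin{displaymath}
	\gamma_f = \min_{S,T}\frac{\sum_{t\in T\setminus S} f_S(t)}{f_S(T)};
\end{displaymath}
computing it exactly requires minimizing over exponentially many pairs of
sets, hence the appeal of a certificate that can be checked on $g$ alone.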

\paragraph{ROS is not a characterization.} The authors claim that the Region of
Submodularity (ROS) characterizes when a function is $\delta$-submodular.
However, Lemma 1 only gives a \emph{necessary} condition on $\delta$ for
a function to be $\delta$-submodular. A complete characterization would also
require showing that for any $f$ and $\delta$ satisfying the bound of Lemma 1,
there indeed exists a submodular function which $\delta$-approximates $f$; but
this fact seems false to me, or at least hard to prove. For this reason,
I believe that the notion of region of submodularity is not very relevant: it
is a complicated way of expressing that if a function is $\delta$-submodular,
then its submodularity ratio is bounded from below. This fact is important and
interesting, but I do not believe that the ROS is the best way to express it
(more on this below).

\paragraph{The bound of Theorem 3 is not tighter than the bound of Theorem 2.}
The authors claim at several points in the paper that the performance
guarantee in Theorem 3 is tighter than the one in Theorem 2; illustrating this
is in fact the purpose of the experiment in Section 5.1. However, I believe
that the bound in Theorem 3 is always worse than the one in Theorem 2. This is
made apparent by looking at the proofs of both theorems: they are nearly
identical, but Theorem 3 uses $\delta$-submodularity to obtain (1) a lower
bound on the submodularity ratio and (2) an upper bound on the curvature.
Since the performance guarantee in Theorem 2 is increasing in the
submodularity ratio and decreasing in the curvature, the guarantee in Theorem
3 is implied by the one in Theorem 2 (see also my comments below on how to
improve the exposition). As a consequence, I am puzzled by the plot in Section
5.1, since it seems to contradict the theory. Very few details are given on
how the submodularity ratio and curvature were computed; perhaps they were not
computed properly?
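
To make the monotonicity claim explicit: assuming Theorem 2 has the same shape
as the guarantee of (Bian et al., 2017), namely
\begin{displaymath}
	f(S_{\mathrm{greedy}}) \;\geq\; \frac{1}{\alpha}
	\left(1-e^{-\alpha\gamma}\right) f(S^{\ast})
\end{displaymath}
(possibly with the improved greedy curvature in place of $\alpha$), this
expression is increasing in $\gamma$ and decreasing in $\alpha$, so replacing
$\gamma$ by a lower bound and $\alpha$ by an upper bound can only loosen the
guarantee, which is exactly what the proof of Theorem 3 does.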

\paragraph{Computing the performance guarantee does not seem relevant.} The
authors make the following good observation: it is much easier to compute the
performance guarantee given by $\delta$-submodularity than to compute the
parameters $\alpha$ and $\gamma$. So even if the guarantee is not as tight,
once a $\delta$-approximation has been found, it provides a convenient way to
compute a lower bound on the approximation guarantee. However, being able to
compute the approximation bound does not seem relevant in practice: what truly
matters is how well the greedy algorithm performs in practice, not the
(worst-case) lower bound provided by the theory.
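
To illustrate with purely hypothetical numbers (they are not taken from the
paper): if one finds $g$ with $d(f, g)\leq 0.1$, Lemma 1 immediately gives
\begin{displaymath}
	\gamma_f \;\geq\; \frac{1-0.1}{1+0.1} \approx 0.82,
\end{displaymath}
and a curvature-free guarantee of the form $1-e^{-\gamma}$ then evaluates to
$1-e^{-0.82}\approx 0.56$, all without any exponential-time computation. But
this number says little about the quality of the greedy solution on a given
instance.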

\paragraph{Experiment 5.2.} It is not clear how $\delta$-approximate
submodularity was used in Experiment 5.2: reading the text, it seems that the
authors simply performed the task using a greedy heuristic to maximize the
minimum eigenvalue of the Gramian, an objective function which is not
submodular. As such, the experiment does not seem to illustrate a point
relevant to this paper; it simply shows that greedy selection on this
non-submodular objective performs better than random selection.
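
For concreteness, here is a minimal sketch (in Python with NumPy; the function
and variable names are mine, and I am reconstructing the procedure from the
text rather than reproducing the authors' code) of the greedy selection that
the experiment appears to perform:
\begin{verbatim}
import numpy as np

def greedy_min_eigenvalue(C, k):
    # Greedily select k rows of the measurement matrix C so as to
    # maximize the minimum eigenvalue of the Gramian of the selection.
    n = C.shape[0]
    selected = []
    for _ in range(k):
        best_i, best_val = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            rows = C[selected + [i]]
            gram = rows.T @ rows               # Gramian of the selection
            val = np.linalg.eigvalsh(gram)[0]  # smallest eigenvalue
            if val > best_val:
                best_i, best_val = i, val
        selected.append(best_i)
    return selected
\end{verbatim}
Nothing in this procedure involves the approximating submodular function $g$,
which is why I do not see how the experiment illustrates
$\delta$-approximate submodularity.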

\paragraph{Suggestions to rewrite section 3.} Following some of the above
comments, I believe that Section 3 could benefit a lot from being rewritten in
the following way:
\begin{enumerate}
	\item restate Lemma 1 as follows: ``If $f$ is $\delta$-submodular, then
		$\gamma_f\geq \frac{1-\delta}{1+\delta}$''. This is equivalent to, but
		much clearer than, the current statement. Also, as a more minor
		comment, the proof of Lemma 1 does not need to be by contradiction,
		and a direct proof would be clearer:
		\begin{displaymath}
			1 \leq \min_{S,T}\frac{\sum_{t\in T\setminus S} g_S(t)}{g_S(T)}
			\leq \frac{1+\delta}{1-\delta}\min_{S,T}\frac{\sum_{t\in T\setminus
			S} f_S(t)}{f_S(T)}
			= \gamma_f\cdot\frac{1+\delta}{1-\delta}
		\end{displaymath}
		where the first inequality is by submodularity and the rest is by
		definition.
	\item extract the second half of the proof of Theorem 3 (in the appendix)
		into the following Lemma 2: ``let $f$ be monotone and $g$ be monotone
		submodular such that $d(f, g)\leq \delta$, then $\alpha_G(f)\leq
		1 - \frac{1-\delta}{1+\delta}\big(1-\alpha_T(g)\big)$''.
	\item present Theorem 2 with the simplified proof using the improved notion
		of greedy curvature.
	\item present Theorem 3 as a corollary of Theorem 2, by combining Lemma
		1 and Lemma 2 from the first two points: if $d(f, g)\leq \delta$, then
		we have a lower bound on the submodularity ratio and an upper bound
		on the curvature that we can plug into Theorem 2.
	\item remove all mentions of the ROS: instead, the whole benefit of this
		notion can be summarized by the following remark, which can go below
		Theorem 3: ``The bound of Theorem 3 shows that if the same function $f$
		can be $\delta$-approximated by multiple submodular functions, then one
		should favor the one with the smallest curvature''.
\end{enumerate}

\paragraph{Minor comments on the writing.} There are many statements of the
form ``$f$ can be represented as/by a $\delta$-approximate submodular
function'', but the definition of a $\delta$-approximate submodular function
already contains the fact that $f$ is represented by another function, so
I think it would be cleaner to say ``$f$ is $\delta$-approximate submodular''
(or make the representation explicit when needed and say ``let $g$ be
a submodular function such that $d(f, g) \leq \delta$''). Furthermore,
I noticed a non-trivial number of grammatical mistakes as I was reading, and
I would suggest doing more proofreading passes.

\paragraph{Conclusion.} While I believe that the notion of
$\delta$-submodularity is an interesting complement to the existing notion of
submodularity ratio, I think the paper would benefit a lot from a significant
rework: both a rewrite of Section 3 as suggested above, and a revision of the
experimental section to fix the aforementioned problems. I also believe that
the contributions of this paper, while valuable, are a bit too incremental (in
particular, compared to Bian et al., 2017) to justify publication in JMLR: the
suggested rewrite of Section 3 makes it apparent that Theorem 3 is a corollary
of Theorem 2 and does not provide a tighter bound.
\end{document}