\documentclass[12pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1.2in, vmargin=1.2in]{geometry}

\title{Review of \emph{Approximate Submodular Functions and Performance
Guarantees}}
\date{}
\begin{document}

\maketitle

\vspace{-3em}
\paragraph{Summary.} This paper studies $\delta$-approximate submodular
functions: functions $f$ for which there exists a submodular function
whose ``marginal gains'' approximate the marginal gains of $f$ within
a multiplicative factor of $(1\pm\delta)$. The contributions are:
\begin{itemize}
    \item this notion of approximate submodularity is related to the
    existing notion of ``submodularity ratio'': a necessary condition on
    $\delta$ (involving the submodularity ratio) is given for a function to be
    $\delta$-approximate submodular.
    \item an approximation bound is derived on the value attained by the
    greedy algorithm when maximizing a $\delta$-approximate submodular
    function.
    \item two functions are shown to be $\delta$-approximately submodular:
    the trace of the inverse Gramian matrix of the observations (an
    important quantity in A-optimal experiment design) and the minimum
    eigenvalue of the Gramian.
    \item an experiment shows that the approximation bound derived for
    $\delta$-submodular functions compares favorably to the one of (Bian et
    al., 2017), which uses the notions of submodularity ratio and curvature.
    \item a second experiment shows how to use the notion of
    $\delta$-submodularity to solve a sensor-selection problem.
\end{itemize}
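To make the definition concrete, here is a minimal brute-force check (a Python sketch; the toy functions and all names are mine, not the paper's) that a candidate submodular function $g$ $\delta$-approximates $f$, i.e.\ that every marginal gain of $f$ lies within a factor $(1\pm\delta)$ of the corresponding marginal gain of $g$:

```python
from itertools import chain, combinations

def marginal(h, S, t):
    """Marginal gain h(S + t) - h(S)."""
    return h(S | {t}) - h(S)

def all_subsets(ground):
    """Enumerate every subset of the ground set (2^n of them)."""
    return chain.from_iterable(
        combinations(ground, r) for r in range(len(ground) + 1)
    )

def is_delta_approx(f, g, ground, delta):
    """Check (1-d)*g_S(t) <= f_S(t) <= (1+d)*g_S(t) for all S and t not in S."""
    for S in map(frozenset, all_subsets(ground)):
        for t in ground - S:
            gs = marginal(g, S, t)
            if not (1 - delta) * gs <= marginal(f, S, t) <= (1 + delta) * gs:
                return False
    return True

# Toy example: f is a 5%-inflated copy of a submodular g, so f is
# 0.1-approximately submodular but not 0.01-approximately submodular.
ground = frozenset(range(4))
g = lambda S: len(S) ** 0.5          # monotone submodular
f = lambda S: 1.05 * len(S) ** 0.5
```

With these toy functions, \texttt{is\_delta\_approx(f, g, ground, 0.1)} holds while \texttt{is\_delta\_approx(f, g, ground, 0.01)} fails, matching the $(1\pm\delta)$ definition.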


\paragraph{General comments.} The general topic of approximate submodularity is
becoming increasingly relevant to explain why the greedy heuristic often works
well for maximizing non-submodular functions. An important line of work has
focused on the notion of submodularity ratio to measure the divergence from
submodularity. The notion of $\delta$-approximate submodularity introduced
in this paper offers an interesting new perspective on the same question.
The definition is justified by the fact that the authors were able to identify
two functions satisfying it. Furthermore, this notion has the following
advantage over the submodularity ratio: once a submodular function $g$
has been identified that approximates a given non-submodular function
$f$, one can obtain an approximation guarantee for the greedy algorithm
simply by looking at properties of $g$, which is typically easier than
computing the submodularity ratio (which takes exponential time). However,
the theoretical results, the experimental results, and the exposition
in this paper all suffer from inaccuracies and gaps that I now describe.
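To make the exponential cost concrete: computing the submodularity ratio directly means minimizing over all pairs of sets, i.e.\ $O(4^n)$ pairs for a ground set of size $n$. A minimal sketch (function names and the toy objectives are mine, for illustration only):

```python
from itertools import chain, combinations

def all_subsets(ground):
    return chain.from_iterable(
        combinations(ground, r) for r in range(len(ground) + 1)
    )

def submodularity_ratio(f, ground):
    r"""Brute-force gamma_f = min_{S,T} [sum_{t in T\S} f_S(t)] / f_S(T),
    over all pairs with f_S(T) > 0 -- exponentially many pairs."""
    gamma = float("inf")
    subsets = [frozenset(s) for s in all_subsets(ground)]
    for S in subsets:
        for T in subsets:
            if not T - S:
                continue
            denom = f(S | T) - f(S)          # f_S(T)
            if denom <= 0:
                continue
            num = sum(f(S | {t}) - f(S) for t in T - S)
            gamma = min(gamma, num / denom)
    return gamma
```

For a modular function such as $f(S)=|S|$ the ratio is exactly $1$, and for any submodular $f$ it is at least $1$; already at $n=4$ the double loop visits $256$ set pairs.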

\paragraph{ROS is not a characterization.} The authors claim that the Region of
Submodularity (ROS) characterizes when a function is $\delta$-submodular.
However, Lemma 1 only gives a \emph{necessary} condition on $\delta$ for
a function to be $\delta$-submodular. A complete characterization would also
require showing that for any $f$ and $\delta$ satisfying the bound of Lemma 1,
there indeed exists a submodular function which $\delta$-approximates $f$; but
this fact seems false to me, or at least hard to prove. For this reason,
I believe that the notion of region of submodularity is not very
relevant: it is a complicated way of expressing that if a function is
$\delta$-submodular, then its submodularity ratio is lower bounded.
This fact is important and interesting, but I do not believe that the ROS
is the best way to express it (more on this below).

\paragraph{The bound of Theorem 3 is not tighter than the bound of Theorem 2.}
The authors claim at several points in the paper that the performance
guarantee of Theorem 3 is tighter than that of Theorem 2; illustrating this
is in fact the purpose of the experiment in Section 5.1. However,
I believe that the bound in Theorem 3 is always worse than the one in Theorem
2. This becomes apparent by looking at the proofs of both theorems: they
are nearly identical, except that Theorem 3 uses $\delta$-submodularity to
obtain (1) a lower bound on the submodularity ratio and (2) an upper bound on
the curvature. Since the performance guarantee in Theorem 2 is increasing in
the submodularity ratio and decreasing in the curvature, the guarantee in
Theorem 3 is implied by the one in Theorem 2 (see also my comments below on
how to improve the exposition). As a consequence, I am puzzled by the plot
obtained in Section 5.1, since it seems to contradict the theory. Very few
details are given on how the submodularity ratio and curvature were computed;
perhaps they were not computed properly?
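A quick numerical illustration of the monotonicity argument (assuming the guarantee has the $\frac{1}{\alpha}(1-e^{-\alpha\gamma})$ form of Bian et al., 2017; every parameter value below is hypothetical):

```python
import math

def greedy_guarantee(gamma, alpha):
    """(1/a)(1 - exp(-a*g)): increasing in gamma, decreasing in alpha."""
    return (1.0 - math.exp(-alpha * gamma)) / alpha

delta = 0.1
r = (1 - delta) / (1 + delta)        # the ratio (1-d)/(1+d)

# Hypothetical true parameters of some f, consistent with d-submodularity:
gamma_true, alpha_true = 0.95, 0.30
alpha_g = 0.30                       # curvature of the approximating g

# Bounds implied by d-submodularity (Lemma 1 / Lemma 2 in the suggested form):
gamma_lb = r                         # gamma_f >= (1-d)/(1+d)
alpha_ub = 1 - r * (1 - alpha_g)     # alpha <= 1 - (1-d)/(1+d) * (1-alpha_g)

# Theorem-2-style bound with the true parameters vs. the d-derived ones:
bound_thm2 = greedy_guarantee(gamma_true, alpha_true)
bound_thm3 = greedy_guarantee(gamma_lb, alpha_ub)
```

Since the true $\gamma$ is at least the $\delta$-derived lower bound and the true $\alpha$ is at most the $\delta$-derived upper bound, monotonicity forces \texttt{bound\_thm2 >= bound\_thm3}: plugging the $\delta$-derived bounds into Theorem 2 can only loosen the guarantee.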

\paragraph{Computing the performance guarantee does not seem relevant.} The
authors make a good observation: it is much easier to compute the
performance guarantee given by $\delta$-submodularity than to compute the
parameters $\alpha$ and $\gamma$. So even if the performance guarantee is not
as tight, once a $\delta$-approximation has been found, it provides
a convenient way to compute a lower bound on the approximation guarantee.
However, being able to compute the approximation bound does not seem relevant
in practice: what truly matters is how well the greedy algorithm performs in
practice, not the (worst-case) lower bound provided by the theory.

\paragraph{Experiment 5.2.} It is not clear how $\delta$-approximate
submodularity was used in Experiment 5.2: reading the text, it seems that the
authors simply performed the task by using a greedy heuristic to maximize the
minimum eigenvalue of the Gramian; this objective function is not submodular.
As such, the experiment does not seem to illustrate a point relevant to this
paper (it simply shows that greedy selection on this non-submodular
objective performs better than random selection).
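For reference, the heuristic the authors appear to use can be sketched in a few lines. This is a minimal pure-Python version restricted to 2-D measurement vectors (the paper's setting is more general, and all names here are hypothetical):

```python
import math

def min_eig_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]]."""
    return (a + c) / 2.0 - math.hypot((a - c) / 2.0, b)

def greedy_sensor_selection(vectors, k):
    """Greedily pick k vectors maximizing the minimum eigenvalue of the
    Gramian sum of v_i v_i^T -- the non-submodular objective of
    Experiment 5.2."""
    S = []
    a = b = c = 0.0                       # running Gramian [[a, b], [b, c]]
    for _ in range(k):
        best_i, best_val = None, -float("inf")
        for i, (x, y) in enumerate(vectors):
            if i in S:
                continue
            val = min_eig_2x2(a + x * x, b + x * y, c + y * y)
            if val > best_val:
                best_i, best_val = i, val
        S.append(best_i)
        x, y = vectors[best_i]
        a, b, c = a + x * x, b + x * y, c + y * y
    return S, min_eig_2x2(a, b, c)
```

On the toy input \texttt{[(1, 0), (0, 1), (1, 0)]} with $k=2$, greedy picks one vector per direction and reaches $\lambda_{\min}=1$, whereas picking the duplicate would leave $\lambda_{\min}=0$.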

\paragraph{Suggestions to rewrite Section 3.} Following some of the above
comments, I believe that Section 3 would benefit a lot from being rewritten
as follows:
\begin{enumerate}
    \item restate Lemma 1 as follows: ``If $f$ is $\delta$-submodular, then
    $\gamma_f\geq \frac{1-\delta}{1+\delta}$''. This is equivalent to, but
    much clearer than, the current statement. Also, as a more minor comment,
    the proof of Lemma 1 does not need to be by contradiction; a direct
    proof would be clearer:
    \begin{displaymath}
        1 \leq \min_{S,T}\frac{\sum_{t\in T\setminus S} g_S(t)}{g_S(T)}
        \leq \frac{1+\delta}{1-\delta}\min_{S,T}\frac{\sum_{t\in T\setminus
        S} f_S(t)}{f_S(T)}
        = \gamma_f\cdot\frac{1+\delta}{1-\delta},
    \end{displaymath}
    where the first inequality is by submodularity and the rest is by
    definition.
    \item extract the second half of the proof of Theorem 3 in the appendix as
    the following Lemma 2: ``let $f$ be monotone and $g$ be monotone
    submodular such that $d(f, g)\leq \delta$; then $\alpha_G(f)\leq
    1 - \frac{1-\delta}{1+\delta}\big(1-\alpha_T(g)\big)$''.
    \item present Theorem 2 with the simplified proof using the improved notion
    of greedy curvature.
    \item present Theorem 3 as a corollary of Theorem 2, by combining Lemma
    1 and Lemma 2 from the first two points: if $d(f, g)\leq \delta$, then
    we have a lower bound on the submodularity ratio and an upper bound
    on the curvature that we can plug into Theorem 2.
    \item remove all mentions of the ROS: the whole benefit of this
    notion can be summarized by the following remark, which can go below
    Theorem 3: ``The bound of Theorem 3 shows that if the same function $f$
    can be $\delta$-approximated by multiple submodular functions, then one
    should favor the one with the smallest curvature''.
\end{enumerate}

\paragraph{Minor comments on the writing.} There are many statements of the
form ``$f$ can be represented as/by a $\delta$-approximate submodular function'',
but the definition of a $\delta$-approximate submodular function already
contains the fact that $f$ is represented by a function, so I think it would be
cleaner to say ``$f$ is $\delta$-approximate submodular'' (or to make the
representation explicit when needed and say ``let $g$ be a submodular function
such that $d(f, g) \leq \delta$''). Furthermore, I noticed a non-trivial number
of grammatical mistakes as I was reading and would suggest doing more
proofreading passes.

\paragraph{Conclusion.} While I believe that the notion of
$\delta$-submodularity is an interesting complement to the existing
notion of submodularity ratio, I think the paper would benefit a lot from
a significant rework: both a rewrite of Section 3 as suggested above and
a revision of the experiments section to fix the aforementioned problems.
I also believe that the contributions of this paper, while valuable, are a bit
too incremental (in particular compared to Bian et al., 2017) to justify
publication in JMLR: the suggested rewrite of Section 3 makes it apparent
that Theorem 3 is a corollary of Theorem 2 and does not provide a tighter
bound.
\end{document}