\documentclass[10pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1in, vmargin=1in]{geometry}
\usepackage{amsmath,amsfonts}

\title{\large Review of \emph{The stochastic Auxiliary Problem Principle in Banach Spaces}}
\date{}

\begin{document}

\maketitle

\paragraph{Summary.} This paper considers the Stochastic Auxiliary Problem
Principle (stochastic APP) algorithm, a general stochastic approximation scheme
introduced by Culioli and Cohen in 1990 which generalizes the Stochastic
Proximal Gradient method as well as the Stochastic Mirror Descent method. While
this method was originally introduced for infinite-dimensional Hilbert spaces,
the main goal of this paper is to analyze its properties when the iterates lie
in a Banach space. The paper focuses on the case where the objective function
decomposes as the sum of a subdifferentiable lower semicontinuous (l.s.c.)
function and a (possibly non-differentiable) l.s.c.\ function (typically,
a regularizer in learning applications). Importantly, random noise in the
evaluation of the subdifferential of the first of these two functions is
considered. The main contributions are:
\begin{itemize}
    \item a proof of the measurability of the iterates of the Stochastic APP
    method in the case of a reflexive separable Banach space.
    \item a proof of convergence of the stochastic APP algorithm, both in
    function value and for the iterates themselves.
    \item an analysis of the rate of convergence of the APP algorithm (in
    function value). Both averaged iterates as well as the final iterate are
    considered.
\end{itemize}
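To fix ideas, the scheme under study can be sketched as follows; the notation
here is mine and chosen purely for illustration, and the paper's own notation
and assumptions differ in their details. Given a constraint set
$U^{\mathrm{ad}}$, a step-size sequence $(\varepsilon_k)$ and a differentiable,
strongly convex auxiliary function $K$, one iteration informally reads
\begin{equation*}
    u^{k+1} \in \mathop{\mathrm{argmin}}_{u \in U^{\mathrm{ad}}}
    \; K(u) - \langle \nabla K(u^k), u \rangle
    + \varepsilon_k \bigl( \langle g^k, u \rangle + J^C(u) \bigr),
\end{equation*}
where $g^k$ is a noisy evaluation of an element of the subdifferential of the
first (subdifferentiable) part of the objective at $u^k$, and $J^C$ is the
possibly non-differentiable part. Taking $K = \frac{1}{2}\|\cdot\|^2$ in
a Hilbert space recovers the Stochastic Proximal Gradient method, while taking
$J^C \equiv 0$ and $K$ a mirror map recovers Stochastic Mirror Descent.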
\paragraph{General evaluation.} This paper considers an important and general
stochastic approximation method, which encompasses many well-known stochastic
approximation algorithms. As such, analyzing the properties of this procedure
in as general a setting as possible is very relevant to the optimization
community at large. Thus, this paper makes a valuable contribution by
presenting a general analysis which not only extends previous results from
Hilbert spaces to Banach spaces, but also considers the case where random noise
is added to subdifferential estimates (previously, either no noise was
considered for the case of Hilbert spaces, or noise was considered but only for
the specific case of projected stochastic gradient descent). Furthermore, the
discussion of the measurability of the iterates is important and, as noted by
the authors, crucially missing in most if not all prior work. Finally, the
paper is very well-written and a pleasure to read.

\paragraph{Major comments.}
\begin{enumerate}
    \item My main concern with this paper is with respect to the generality of
    the setting. While the abstract and introduction talk about a general
    Banach space, the technical parts of the paper only consider a separable
    and \emph{reflexive} Banach space, which is conceptually much closer to
    a Hilbert space. In fact, it becomes clear when reading the proofs of
    convergence that most of the ideas from the Hilbert case can be directly
    applied here as well. Because of this, the more important contributions of
    the paper are (i) the discussion of the measurability of the iterates and
    (ii) allowing random noise in subdifferential evaluations, and not so much
    the generalization from Hilbert spaces to Banach spaces. I still believe
    these are important contributions, but the reflexivity assumption should
    be mentioned earlier (possibly already in the abstract) in order not to
    mislead the reader into a false sense of generality. If, at the technical
    level, non-trivial adaptations were required to accommodate the case of
    Banach spaces, those should be emphasized as well.
    \item Related to the previous point, it might be helpful in Section 2.2 to
    give examples where previous work has considered stochastic approximation
    methods for reflexive Banach spaces (but which were not Hilbert spaces).
    The examples I have in mind quite often consider optimization problems
    over $L^1$ spaces, which are not reflexive, and for which the results of
    this paper would not apply.
    \item I believe the presentation of Section 3.1 could be streamlined,
    since a significant part of it amounts to presenting previously known
    results from the theory of normal integrands and measurable selections.
    These could be moved to the appendix, keeping in the main part of the
    paper only the results specific to this paper and not directly taken from
    previous work. I believe these would be Propositions 3.17 and 3.18 as well
    as Corollary 3.19. This is a matter of taste though, and I will leave it
    to the editor to decide what is preferable and best matches the style of
    the journal.
    \item Related to the previous point, since one goal of this paper is
    generality, it might be worth mentioning that all the results from
    Section 3.1.1 are also valid in Polish spaces (separable, completely
    metrizable spaces), that is, normability is not necessary.
    \item Was the measurability of the iterates ever discussed in prior work,
    or known in some specific cases? Perhaps in the finite-dimensional case,
    where it is simpler to establish? If so, this should be mentioned
    somewhere in Section 3.
    \item A discussion of the assumptions introduced at the beginning of
    Section 4.1 could be valuable. Which ones seem strictly necessary, and
    which ones could perhaps be relaxed with more work? Are they commonly
    assumed in similar work? This seems particularly relevant for
    Assumption A12, which, if I am not mistaken, is not always present in
    similar work (the remaining assumptions seem more standard to me).
\end{enumerate}

\paragraph{Minor comments.}
\begin{enumerate}
    \item Mention early on in the introduction that $j^C$ and $j^\Sigma$ are
    functions from $\mathbb{U}\times\mathbb{W}$ to $\mathbb{R}$. This becomes
    clear later but might be slightly confusing at the beginning.
    \item When first introducing the Gateaux differentiability assumption,
    explain which variant of the definition is used (depending on the authors,
    linearity and boundedness of the directional derivative are not required,
    but they are here); see the sketch after this list.
    \item Second paragraph of Section 3.1: when introducing the Borel
    $\sigma$-field of $\mathbb{U}^\star$, explain which topology on
    $\mathbb{U}^\star$ is considered (it becomes clear later that it is the
    topology of the dual norm induced by the primal norm).
    \item It might be worth mentioning that Proposition 3.5 is also commonly
    known as the Kuratowski–Ryll-Nardzewski selection theorem \cite{KR65}.
    I also believe that it should be explained below Proposition 3.5, or even
    stated as a corollary, that an important consequence of this proposition
    is that if $\Gamma$ is Effros-measurable then it admits a measurable
    selection. This might be missed at first reading and is the way this
    proposition is truly used in the paper.
    \item In the proof of Lemma 4.1, a better explanation is needed for
    inequality (4.3). I am convinced it is true, but I was not immediately
    able to find a clean justification, since the objective function is a sum
    of a differentiable function and a non-differentiable function.
\end{enumerate}
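To make the variant concrete: my reading (which the authors should confirm) is
that Gateaux differentiability of $J$ at $u$ is taken here to mean that the
directional derivative
\begin{equation*}
    J'(u; h) = \lim_{t \downarrow 0} \frac{J(u + t h) - J(u)}{t}
\end{equation*}
exists for every direction $h \in \mathbb{U}$ and that the map $h \mapsto
J'(u; h)$ is linear and continuous, so that $J'(u; \cdot) = \langle \nabla
J(u), \cdot \rangle$ for some $\nabla J(u) \in \mathbb{U}^\star$. Weaker
variants in the literature only require the existence of the limit, without
linearity or boundedness in $h$.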
\paragraph{Conclusion.} This paper elegantly presents a unified and general
analysis of an important stochastic approximation procedure, while filling
theoretical gaps left open by previous work. I believe it can be a valuable
contribution to SIOPT once the above comments are taken into account.

\bibliographystyle{plain}
\bibliography{main}

\end{document}
