 Makefile              |   2 +-
 main.bib              |   9 +++++++++
 siopt-2021-140246.tex | 133 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 143 insertions(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index 3082316..5a43db1 100644
--- a/Makefile
+++ b/Makefile
@@ -4,7 +4,7 @@ BIB = refs.bib
.PHONY: all clean FORCE
-all: jmlr-20-208.pdf
+all: siopt-2021-140246.pdf
%.pdf: FORCE
latexrun -W no-xcolor $*.tex
diff --git a/main.bib b/main.bib
new file mode 100644
index 0000000..8cf65cb
--- /dev/null
+++ b/main.bib
@@ -0,0 +1,9 @@
+@article{KR65,
+ title={A general theorem on selectors},
+ author={Kuratowski, Kazimierz and Ryll-Nardzewski, Czes{\l}aw},
+ journal={Bull. Acad. Polon. Sci. S{\'e}r. Sci. Math. Astronom. Phys.},
+ volume={13},
+ number={6},
+ pages={397--403},
+ year={1965}
+}
diff --git a/siopt-2021-140246.tex b/siopt-2021-140246.tex
new file mode 100644
index 0000000..d6cdb8c
--- /dev/null
+++ b/siopt-2021-140246.tex
@@ -0,0 +1,133 @@
+\documentclass[10pt]{article}
+\usepackage[T1]{fontenc}
+\usepackage[utf8]{inputenc}
+\usepackage[hmargin=1in, vmargin=1in]{geometry}
+\usepackage{amsmath,amsfonts}
+
+\title{\large Review of \emph{The stochastic Auxiliary Problem Principle in Banach Spaces}}
+\date{}
+
+\begin{document}
+
+\maketitle
+
+\paragraph{Summary.} This paper considers the Stochastic Auxiliary Problem
+Principle (stochastic APP) algorithm, a general stochastic approximation scheme
+introduced by Culioli and Cohen in 1990 that generalizes both the Stochastic
+Proximal Gradient method and the Stochastic Mirror Descent method. While this
+method was originally introduced for infinite-dimensional Hilbert spaces, the
+main goal of this paper is to analyze its properties when the iterates lie in
+a Banach space. The paper focuses on the case where the objective function
+decomposes as the sum of a subdifferentiable lower semicontinuous (l.s.c.)
+function and a (possibly non-differentiable) l.s.c.\ function (typically,
+a regularizer in learning applications). Importantly, random noise in the
+evaluation of the subdifferential of the first of these two functions is
+considered. (A schematic form of the update is recalled after the list of
+contributions, to fix notation for my comments.) The main contributions are:
+\begin{itemize}
+ \item a proof of the measurability of the iterates of the stochastic APP
+ method in the case of a reflexive separable Banach space.
+ \item a proof of convergence of the stochastic APP algorithm, both in
+ function value and for the iterates themselves.
+ \item an analysis of the rate of convergence of the stochastic APP
+ algorithm (in function value). Both the averaged iterates and the final
+ iterate are considered.
+\end{itemize}
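+
+As announced above, here is the schematic form of the stochastic APP update as
+I understand it, in my own simplified notation (the precise formulation in the
+manuscript, in particular how the noise enters, the notation for the admissible
+set, and whether $j^\Sigma$ is sampled or taken in expectation, may differ).
+Given an auxiliary function $K$ (Gateaux-differentiable and strongly convex),
+step sizes $\epsilon_k > 0$, and a sample $w^{k+1}$, the next iterate is
+\[
+ u^{k+1} \in \operatorname*{arg\,min}_{u \in U^{\mathrm{ad}}}
+ \Big\{ K(u) + \big\langle \epsilon_k\, g^k - \nabla K(u^k),\, u \big\rangle
+ + \epsilon_k\, j^\Sigma(u, w^{k+1}) \Big\},
+ \qquad g^k \in \partial_u j^C(u^k, w^{k+1}) + \text{noise}.
+\]
+With $K = \tfrac12 \|\cdot\|^2$ in a Hilbert space this reduces to a stochastic
+proximal gradient step, while Bregman-type choices of $K$ recover stochastic
+mirror descent.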
+
+\paragraph{General evaluation.} This paper considers an important and general
+stochastic approximation method, which encompasses many well-known stochastic
+approximation algorithms. As such, analyzing the properties of this procedure
+in as general a setting as possible is very relevant to the optimization
+community at large. Thus, this paper makes a valuable contribution by
+presenting a general analysis which not only extends previous results from
+Hilbert spaces to Banach spaces, but also considers the case where random noise
+is added to subdifferential estimates (previously, either no noise was
+considered for the case of Hilbert spaces, or noise was considered but only for
+the specific case of projected stochastic gradient descent). Furthermore, the
+discussion of the measurability of the iterates is important and, as noted by
+the authors, crucially missing in most if not all prior work. Finally, the
+paper is very well-written and a pleasure to read.
+
+\paragraph{Major comments.}
+\begin{enumerate}
+ \item My main concern with this paper is with respect to the generality of
+ the setting. While the abstract and introduction talk about a general
+ Banach space, the technical parts of the paper only consider
+ a separable and \emph{reflexive} Banach space, which is conceptually
+ much closer to a Hilbert space. In fact, it becomes clear when reading
+ the proofs of convergence that most of the ideas from the Hilbert case
+ can be directly applied here as well. Because of this, the more
+ important contributions of the paper are (i) the discussion of the
+ measurability of the iterates and (ii) the treatment of random noise in
+ subdifferential evaluations, rather than the generalization from
+ Hilbert spaces to Banach spaces itself. I still believe these are
+ important contributions, but the reflexivity assumption should be
+ mentioned earlier (possibly already in the abstract) in order not to
+ mislead the reader into a false sense of generality. If, at the
+ technical level, non-trivial adaptations were required to accommodate
+ the case of Banach spaces, those should be emphasized as well.
+ \item Related to the previous point, it might be helpful in Section 2.2 to
+ give examples where previous work has considered stochastic
+ approximation methods in reflexive Banach spaces that are not Hilbert
+ spaces. The examples I have in mind often involve optimization problems
+ over $L^1$ spaces, which are not reflexive, so the results of this
+ paper would not apply to them (see the short remark after this list).
+ \item I believe the presentation of Section 3.1 could be streamlined, since
+ a significant part of it amounts to presenting previously known results
+ from the theory of normal integrands and measurable selections. These
+ could be moved to the appendix, keeping in the main part of the paper
+ only the results specific to this paper and not directly taken from
+ previous work; I believe these would be Propositions 3.17 and 3.18 as
+ well as Corollary 3.19. This is a matter of taste though, and I will
+ leave it to the editor to decide what is preferable and best matches
+ the style of the journal.
+ \item Related to the previous point, since one goal of this paper is
+ generality, it might be worth mentioning that all the results from
+ Section 3.1.1 are also valid in Polish spaces (separable and completely
+ metrizable spaces), that is, normability is not necessary.
+ \item Was measurability of the iterates ever discussed in prior work, or
+ known in some specific cases? Perhaps in the finite-dimensional case,
+ where it is simpler to establish? If so, this should be mentioned
+ somewhere in Section 3.
+ \item A discussion of the assumptions introduced at the beginning of
+ Section 4.1 could be valuable. Which ones seem strictly necessary, and
+ which ones could perhaps be relaxed with more work? Are they commonly
+ assumed in similar work? This seems particularly relevant for
+ Assumption A12, which, if I am not mistaken, is not always present in
+ similar work (the remaining assumptions seem more standard to me).
+\end{enumerate}
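+
+As a short remark supporting major comment 2 (a standard argument, recalled
+here only for concreteness): for, say, Lebesgue measure on $[0,1]$, one has
+$(L^1)^\star \cong L^\infty$, while $(L^\infty)^\star$ is strictly larger than
+$L^1$, so $L^1$ is not reflexive. Equivalently, $f_n = n\,\mathbf{1}_{[0,1/n]}$
+is bounded in $L^1$ but has no weakly convergent subsequence, which cannot
+happen in a reflexive space.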
+
+\paragraph{Minor comments.}
+\begin{enumerate}
+ \item Mention early on in the introduction that $j^C$ and $j^\Sigma$ are
+ functions from $\mathbb{U}\times\mathbb{W}$ to $\mathbb{R}$. This
+ becomes clear later but might be slightly confusing at the beginning.
+ \item When first introducing the Gateaux differentiability assumption,
+ explain which variant of the definition is used (depending on the
+ authors, linearity and boundedness of the directional derivative are
+ not required, but they are here); see the note after this list.
+ \item Second paragraph of Section 3.1: when introducing the Borel
+ $\sigma$-field of $\mathbb{U}^\star$, explain which topology on
+ $\mathbb{U}^\star$ is considered (it becomes clear later that it is the
+ topology of the dual norm, i.e., the norm induced by the primal norm).
+ \item It might be worth mentioning that Proposition 3.5 is also commonly
+ known as the Kuratowski--Ryll-Nardzewski selection theorem \cite{KR65}.
+ I also believe that it should be explained below Proposition 3.5, or
+ even stated as a corollary, that an important consequence of this
+ proposition is that if $\Gamma$ is Effros-measurable then it admits
+ a measurable selection. This might be missed at first reading, and it
+ is the way this proposition is actually used in the paper.
+ \item In the proof of Lemma 4.1, a better explanation is needed for
+ inequality (4.3). I am convinced it is true but was not immediately
+ able to find a clean justification, since the objective function is
+ a sum of a differentiable function and a non-differentiable function.
+\end{enumerate}
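+
+To make minor comment 2 concrete (in my own notation, not the manuscript's):
+writing the directional derivative of $J$ at $u$ in the direction $h$ as
+\[
+ dJ(u;h) := \lim_{t \to 0} \frac{J(u + t h) - J(u)}{t},
+\]
+the weaker convention calls $J$ Gateaux differentiable at $u$ as soon as
+$dJ(u;h)$ exists for every $h \in \mathbb{U}$, whereas the stronger
+convention, which I understand to be the one used in the paper, additionally
+requires $h \mapsto dJ(u;h)$ to be linear and bounded, so that it defines an
+element $\nabla J(u) \in \mathbb{U}^\star$.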
+
+\paragraph{Conclusion.} This paper elegantly presents a unified and general
+analysis of an important stochastic approximation procedure, while filling
+theoretical gaps left open by previous work. I believe it can be a valuable
+contribution to SIOPT after the above comments are taken into account.
+
+\bibliographystyle{plain}
+\bibliography{main}
+
+\end{document}