\documentclass[10pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1in, vmargin=1in]{geometry}
\usepackage{amsmath,amsfonts}

\title{\large Review of \emph{The stochastic Auxiliary Problem Principle in Banach Spaces}}
\date{}

\begin{document}

\maketitle

\paragraph{Summary.} This paper considers the Stochastic Auxiliary Problem
Principle (stochastic APP) algorithm, a general stochastic approximation scheme
introduced by Culioli and Cohen in 1990 which generalizes the Stochastic
Proximal Gradient method as well as the Stochastic Mirror Descent method. While
this method was originally introduced for infinite-dimensional Hilbert spaces,
the main goal of this paper is to analyze its properties when the iterates lie
in a Banach space. The paper focuses on the case where the objective function
decomposes as the sum of a subdifferentiable lower semicontinuous (l.s.c.)
function and a (possibly non-differentiable) l.s.c.\ function (typically,
a regularizer in learning applications). Importantly, random noise in the
evaluation of the subdifferential of the first of these two functions is
considered. The main contributions are:
\begin{itemize}
    \item a proof of the measurability of the iterates of the Stochastic APP
    method in the case of a reflexive separable Banach space.
    \item a proof of convergence of the stochastic APP algorithm, both in
    function value and for the iterates themselves.
    \item an analysis of the rate of convergence of the APP algorithm (in
    function value). Both averaged iterates as well as the final iterate are
    considered.
\end{itemize}
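To fix ideas, the scheme under study can be sketched as follows; the notation
here is mine and chosen purely for illustration, and the paper's own notation
and assumptions differ in their details. Given a constraint set
$U^{\mathrm{ad}}$, a step-size sequence $(\varepsilon_k)$ and a differentiable,
strongly convex auxiliary function $K$, one iteration informally reads
\begin{equation*}
    u^{k+1} \in \mathop{\mathrm{argmin}}_{u \in U^{\mathrm{ad}}}
    \; K(u) - \langle \nabla K(u^k), u \rangle
    + \varepsilon_k \bigl( \langle g^k, u \rangle + J^C(u) \bigr),
\end{equation*}
where $g^k$ is a noisy evaluation of an element of the subdifferential of the
first (subdifferentiable) part of the objective at $u^k$, and $J^C$ is the
possibly non-differentiable part. Taking $K = \frac{1}{2}\|\cdot\|^2$ in
a Hilbert space recovers the Stochastic Proximal Gradient method, while taking
$J^C \equiv 0$ and $K$ a mirror map recovers Stochastic Mirror Descent.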
\paragraph{General evaluation.} This paper considers an important and general
stochastic approximation method, which encompasses many well-known stochastic
approximation algorithms. As such, analyzing the properties of this procedure
in as general a setting as possible is very relevant to the optimization
community at large. Thus, this paper makes a valuable contribution by
presenting a general analysis which not only extends previous results from
Hilbert spaces to Banach spaces, but also considers the case where random noise
is added to subdifferential estimates (previously, either no noise was
considered for the case of Hilbert spaces, or noise was considered but only for
the specific case of projected stochastic gradient descent). Furthermore, the
discussion of the measurability of the iterates is important and, as noted by
the authors, crucially missing in most if not all prior work. Finally, the
paper is very well-written and a pleasure to read.

\paragraph{Major comments.}
\begin{enumerate}
    \item My main concern with this paper is with respect to the generality of
    the setting. While the abstract and introduction talk about a general
    Banach space, the technical parts of the paper only consider a separable
    and \emph{reflexive} Banach space, which is conceptually much closer to
    a Hilbert space. In fact, it becomes clear when reading the proofs of
    convergence that most of the ideas from the Hilbert case can be directly
    applied here as well. Because of this, the more important contributions of
    the paper are (i) the discussion of the measurability of the iterates and
    (ii) allowing random noise in subdifferential evaluations, and not so much
    the generalization from Hilbert spaces to Banach spaces. I still believe
    these are important contributions, but the reflexivity assumption should
    be mentioned earlier (possibly already in the abstract) in order not to
    mislead the reader into a false sense of generality. If, at the technical
    level, non-trivial adaptations were required to accommodate the case of
    Banach spaces, those should be emphasized as well.
    \item Related to the previous point, it might be helpful in Section 2.2 to
    give examples where previous work has considered stochastic approximation
    methods for reflexive Banach spaces (but which were not Hilbert spaces).
    The examples I have in mind quite often consider optimization problems
    over $L^1$ spaces, which are not reflexive, and for which the results of
    this paper would not apply.
    \item I believe the presentation of Section 3.1 could be streamlined,
    since a significant part of it amounts to presenting previously known
    results from the theory of normal integrands and measurable selections.
    These could be moved to the appendix, keeping in the main part of the
    paper only the results specific to this paper and not directly taken from
    previous work. I believe these would be Propositions 3.17 and 3.18 as well
    as Corollary 3.19. This is a matter of taste though, and I will leave it
    to the editor to decide what is preferable and best matches the style of
    the journal.
    \item Related to the previous point, since one goal of this paper is
    generality, it might be worth mentioning that all the results from
    Section 3.1.1 are also valid in Polish spaces (separable, completely
    metrizable spaces), that is, normability is not necessary.
    \item Was the measurability of the iterates ever discussed in prior work,
    or known in some specific cases? Perhaps in the finite-dimensional case,
    where it is simpler to establish? If so, this should be mentioned
    somewhere in Section 3.
    \item A discussion of the assumptions introduced at the beginning of
    Section 4.1 could be valuable. Which ones seem strictly necessary, and
    which ones could perhaps be relaxed with more work? Are they commonly
    assumed in similar work? This seems particularly relevant for
    Assumption A12, which, if I am not mistaken, is not always present in
    similar work (the remaining assumptions seem more standard to me).
\end{enumerate}

\paragraph{Minor comments.}
\begin{enumerate}
    \item Mention early on in the introduction that $j^C$ and $j^\Sigma$ are
    functions from $\mathbb{U}\times\mathbb{W}$ to $\mathbb{R}$. This becomes
    clear later but might be slightly confusing at the beginning.
    \item When first introducing the Gateaux differentiability assumption,
    explain which variant of the definition is used (depending on the authors,
    linearity and boundedness of the directional derivative are not required,
    but they are here); see the sketch after this list.
    \item Second paragraph of Section 3.1: when introducing the Borel
    $\sigma$-field of $\mathbb{U}^\star$, explain which topology on
    $\mathbb{U}^\star$ is considered (it becomes clear later that it is the
    topology of the dual norm induced by the primal norm).
    \item It might be worth mentioning that Proposition 3.5 is also commonly
    known as the Kuratowski–Ryll-Nardzewski selection theorem \cite{KR65}.
    I also believe that it should be explained below Proposition 3.5, or even
    stated as a corollary, that an important consequence of this proposition
    is that if $\Gamma$ is Effros-measurable then it admits a measurable
    selection. This might be missed at first reading and is the way this
    proposition is truly used in the paper.
    \item In the proof of Lemma 4.1, a better explanation is needed for
    inequality (4.3). I am convinced it is true, but I was not immediately
    able to find a clean justification, since the objective function is a sum
    of a differentiable function and a non-differentiable function.
\end{enumerate}
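To make the variant concrete: my reading (which the authors should confirm) is
that Gateaux differentiability of $J$ at $u$ is taken here to mean that the
directional derivative
\begin{equation*}
    J'(u; h) = \lim_{t \downarrow 0} \frac{J(u + t h) - J(u)}{t}
\end{equation*}
exists for every direction $h \in \mathbb{U}$ and that the map $h \mapsto
J'(u; h)$ is linear and continuous, so that $J'(u; \cdot) = \langle \nabla
J(u), \cdot \rangle$ for some $\nabla J(u) \in \mathbb{U}^\star$. Weaker
variants in the literature only require the existence of the limit, without
linearity or boundedness in $h$.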
\paragraph{Conclusion.} This paper elegantly presents a unified and general
analysis of an important stochastic approximation procedure, while filling
theoretical gaps left open by previous work. I believe it can be a valuable
contribution to SIOPT once the above comments are taken into account.

\bibliographystyle{plain}
\bibliography{main}

\end{document}
