\documentclass[12pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[hmargin=1.2in, vmargin=1.2in]{geometry}
\title{Review of \emph{Proximal Distance Algorithms: Theory and Practice}}
\date{}
\begin{document}
\maketitle
\vspace{-3em}
\paragraph{Summary of the results.} This paper introduces a new class of
algorithms called \emph{proximal distance algorithms} to minimize an
objective function $f$ over a constraint set $\mathcal{C}$. The main idea is to
combine the penalty method with the majorization-minimization (MM) principle,
specifically:
\begin{itemize}
\item the algorithm solves a sequence of unconstrained minimization
problems with objective $f(x) + \rho q(x)$ where $q$ is a penalty
function which measures the amount of violation of the constraints and
where the parameter $\rho$ is chosen according to a sequence increasing
towards $\infty$.
\item the proposed penalty function is the squared distance: ${\rm dist}(x,
\mathcal{C})^2$ or more precisely, if $\mathcal{C} = \bigcap_{i=1}^m
C_i$ is the intersection of closed sets, $q(x)
= \frac{1}{m}\sum_{i=1}^m \|x-P_{C_i}(x)\|^2$ where $P_{C_i}$ denotes
the (possibly multi-valued) projection of $x$ onto the set $C_i$.
\item the optimization problem for fixed $\rho$ is solved by using the MM
principle by defining $x_{k+1}$ at step $k+1$ to minimize the surrogate
function:
\begin{displaymath}
g(x|x_k) = f(x) + \frac{\rho}{2m}\sum_{i=1}^m \|x-P_{C_i}(x_k)\|^2
\end{displaymath}
\end{itemize}
The authors observe that minimizing the surrogate upper bound of the
penalized objective amounts to evaluating the proximal operator of
$\rho^{-1}f$ at the point $y_k = \frac{1}{m}\sum_{i=1}^m P_{C_i}(x_k)$,
thus connecting this algorithm to the field of proximal algorithms,
which leads to a simple convergence analysis of the proposed
algorithm for fixed $\rho > 0$. An accelerated variant of the algorithm
is also proposed. The authors then show how to instantiate their
algorithms on seven different examples and provide experimental evidence
that the algorithms compare favorably to traditional general purpose
convex optimization algorithms.
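For completeness, the proximal-step identity used above can be verified by
completing the square: with $y_k$ defined as above, one has
\begin{displaymath}
g(x|x_k) = f(x) + \frac{\rho}{2}\|x - y_k\|^2 + c_k
\end{displaymath}
for a constant $c_k$ that does not depend on $x$, so that the surrogate
minimizer is exactly $x_{k+1} = {\rm prox}_{\rho^{-1}f}(y_k)$.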
\paragraph{Comments on the proposed algorithm.} The proposed algorithms follow
from simple and general principles leading to intuitive-looking procedures,
which is a positive point. The interpretation of the optimization for fixed
$\rho > 0$ via proximal operators is interesting since it allows the proposed
algorithms to leverage what is already known about computing proximal operators
and provides a proof of convergence of the procedure for fixed $\rho > 0$ essentially
for free.
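To make the structure of this inner iteration concrete, here is a minimal
sketch in Python (the names \texttt{f\_prox} and \texttt{projections} are
placeholders for user-supplied implementations of ${\rm prox}_{\rho^{-1}f}$
and of the maps $P_{C_i}$; this is my reading of the procedure for fixed
$\rho$, not code from the paper):
\begin{verbatim}
import numpy as np

def proximal_distance_step(x, rho, f_prox, projections):
    # Average of the projections of x onto the constraint sets C_i.
    y = np.mean([P(x) for P in projections], axis=0)
    # Minimizing the surrogate g(. | x) is a single proximal step at y:
    #   argmin_z f(z) + (rho / 2) * ||z - y||^2  =  prox_{f / rho}(y).
    return f_prox(y, 1.0 / rho)

def solve_fixed_rho(x0, rho, f_prox, projections, n_iter=1000, tol=1e-8):
    # MM iterations for the penalized problem with a fixed value of rho.
    x = x0
    for _ in range(n_iter):
        x_next = proximal_distance_step(x, rho, f_prox, projections)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x
\end{verbatim}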
A complaint about the proposed algorithms: the paper departs from the
traditional way of writing the constraints as a set of inequalities $f_i(x)\leq
0$ or equalities $f_j(x) = 0$ (typically for convex functions $f_i$ or $f_j$)
and instead writes them as $x\in \cap_{i=1}^m C_i$ for (typically convex) sets
$C_i$. This allows the authors to use $\|x-P_{C_i}(x)\|^2$ as the penalty
function, as opposed to $\max (0, f_i(x))^p$ for some exponent $p$ as is usual
in traditional applications of the penalty method. While very convenient at an
abstract mathematical level, the notation $P_{C_i}(x)$ and this choice of
penalty function hide the implicit assumption that one has access to
a representation of the convex sets $C_i$ defining the constraints in a way
which allows for easy computation of $P_{C_i}(x)$. While the examples show that
this is not necessarily a problem, this implicit assumption should be
highlighted and commented on: if the optimization problem we are facing is
written with constraints $f_i(x) \leq 0$ and only ``black-box'' access to $f_i$
is provided, then there is no efficient way to represent the associated convex
set $C_i$ and to project onto it. While mathematically equivalent, the choice
of representation of the constraints matters.
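To illustrate: for sets with a simple explicit description, projections often
have closed forms; e.g., for a halfspace $C_i = \{x : a_i^\top x \leq b_i\}$,
\begin{displaymath}
P_{C_i}(x) = x - \frac{\max(0,\, a_i^\top x - b_i)}{\|a_i\|^2}\, a_i,
\end{displaymath}
but this requires knowing the pair $(a_i, b_i)$ explicitly; with only
black-box evaluations of a general convex $f_i$, computing $P_{C_i}(x)$ is
itself a nontrivial optimization problem.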
\paragraph{Absence of theoretical results.} The only theoretical result in this
paper is the proof of convergence of the procedure to solve the penalized
problem for fixed $\rho>0$ via the distance majorization principle. This follows
from the interpretation of the procedure as a proximal algorithm and a direct
application of the general convergence theorem of Krasnosel'skii and Mann. This
still leaves many gaps in the theoretical understanding of the proposed
algorithms:
\begin{itemize}
\item importantly, the theoretical discussion does not give any indication
of the \textbf{rate of convergence} of the procedure and how it
compares to other traditional algorithms in convex optimization.
\item there is no proof of convergence of the overall procedure where the
parameter $\rho$ is chosen according to a diverging sequence. There is
only a sketch of the proof at the end of section 3, which assumes that
the objective function is coercive (without any comment on whether the
objective functions at hand satisfy this assumption).
\item the overall procedure assumes that the inner optimization problem for
fixed $\rho$ is solved exactly, which is not the case in practice. An
analysis of this nested algorithm should account for approximate
solving of the problem for fixed $\rho > 0$ and how approximation
errors accumulate when $\rho$ changes. In fact, the details of the
definition of the overall procedure (how to choose the sequence $\rho$,
the fact that we use the solution $x_{\rho_n}$ of the problem for
$\rho_n$ as the initial solution for the problem with parameter
$\rho_{n+1}$) are not provided and left as guesswork to the
reader (cf. my comment on the writing below and the sketch after this list).
\item the accelerated variant of the algorithm comes without
convergence guarantee.
\end{itemize}
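To be concrete about what a complete specification could contain, the
following sketch (a geometric schedule for $\rho$ combined with warm starts,
reusing \texttt{solve\_fixed\_rho} from the earlier sketch) is my own
reconstruction and is not stated in the paper:
\begin{verbatim}
def proximal_distance(x0, f_prox, projections,
                      rho0=1.0, rho_mult=1.2, n_outer=100):
    # Outer loop: increase rho geometrically towards infinity and
    # warm-start each inner solve at the previous solution x_{rho_n}.
    x, rho = x0, rho0
    for _ in range(n_outer):
        x = solve_fixed_rho(x, rho, f_prox, projections)
        rho *= rho_mult
    return x
\end{verbatim}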
\paragraph{Comments on the experiments.} The chosen examples are interesting,
cover a wide range of applications, and demonstrate that the proposed
algorithms can be instantiated successfully in all of them. The fact
that the details of the instantiations (how to compute the projections, etc.)
are fully worked out is valuable.
A surprising and regrettable fact is that the algorithm is compared against
other algorithms only via computation time (CPU time), which is a very indirect
measure of algorithmic efficiency since it is highly dependent on
implementation choices such as the programming language. A more usual and
better (or at least complementary) way to make the comparison would be to show
plots of the approximation errors on the y-axis as the number of iterations
increases on the x-axis. Without this, it is really hard to compare algorithms
to each other in terms of rate of convergence, especially given the absence of
theoretical guarantees on the rate of convergence (cf. my comment above on the
absence of theoretical results).
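Concretely, given the iterates of each method and a high-accuracy reference
value $f^\star$, a plot along the following lines would already be more
informative than wall-clock time alone (a sketch, not code from the paper):
\begin{verbatim}
import matplotlib.pyplot as plt

def plot_suboptimality(iterates, f, f_star, label):
    # Plot the suboptimality f(x_k) - f* against the iteration index k.
    errors = [f(x) - f_star for x in iterates]
    plt.semilogy(errors, label=label)
    plt.xlabel("iteration k")
    plt.ylabel("f(x_k) - f*")
    plt.legend()
\end{verbatim}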
\paragraph{Comments on the writing.} Given that the paper aims at expanding on
an already published paper on the same topic, the exposition should be
particularly polished to provide a good overview of this class of algorithms.
Unfortunately, the current exposition suffers from significant lapses:
\begin{itemize}
\item as already mentioned above, the overall procedure is never described
in full detail in one place but is instead presented in multiple
pieces throughout the first three sections. The fact that the overall
procedure is in fact a nested procedure where a different optimization
problem is solved at each step could easily be overlooked. In fact, the
details of this nested procedure (choice of the diverging sequence for
$\rho$, how the problems for successive values of $\rho$ are chained
together) are never given in the theoretical section and are only
briefly commented on in the experimental section. The
paper would greatly benefit from a complete specification of the
algorithm, for example in a dedicated framed box, or ``algorithm''
environment.
\item sections 1, 2 and 3 are not well-structured, with an uninterrupted
flow of theoretical derivations, examples, proofs and proof sketches
without separators. The exposition would benefit from clearly
indicating the examples (for instance, using a dedicated environment). The
theoretical guarantee(s) in section 3 should be clearly stated in
a separate theorem or proposition environment, followed by a proof
where the beginning and the end can be clearly identified. The
accelerated variant could be presented in a separate section, possibly
also in a dedicated framed box or dedicated algorithm environment.
\end{itemize}
A few additional minor comments on the writing:
\begin{itemize}
\item the notation $P_C$ should be clearly defined at the point it is
introduced (notably, the fact that it is in general multi-valued,
a point which only becomes clear later in the paper).
\item in the introduction, the sentence ``fills in many gaps of theory and
practice'' should be rephrased: it is not clear which gaps are filled
by the current exposition.
\item the writing suffers at times from the use of subjective words, such as
``painfully slow'' at the bottom of page 5, which would be better
replaced by a precise quantitative statement. Similarly, the first
sentence of the introduction ``part science and part art'', while
arguably acceptable at the beginning of an introduction, would benefit
from some additional justification or a discussion of what is actually
meant.
\end{itemize}
\paragraph{Conclusion.} While the proposed algorithms are interesting, and both
their connection to proximal algorithms and the fact that they follow from
general principles are appealing, the paper suffers from significant lapses in
the theoretical analysis, the exposition, and the experimental comparison to
other methods.
\end{document}