Diffstat (limited to 'ijcai-2021-366.txt')
-rw-r--r--  ijcai-2021-366.txt  54
1 file changed, 54 insertions, 0 deletions
diff --git a/ijcai-2021-366.txt b/ijcai-2021-366.txt
new file mode 100644
index 0000000..f0be2e8
--- /dev/null
+++ b/ijcai-2021-366.txt
@@ -0,0 +1,54 @@
+This paper proposes a general stochastic first-order method for the
+optimization of a smooth non-convex function over a Riemannian manifold.
+
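+In my notation (not necessarily the paper's), the problem has the form
+
+    \min_{x \in \mathcal{M}} f(x) = \mathbb{E}_{\xi}[ f(x; \xi) ],
+
+where \mathcal{M} is a Riemannian manifold and the algorithm only has access
+to stochastic gradients of f.
+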
+While SGD is known to be minimax optimal in general (requiring O(eps^-4)
+stochastic gradient queries to find an eps-approximate stationary point), it
+has been shown that variants of SGD can achieve O(eps^-3) sample complexity
+when the objective function satisfies a stronger, mean-squared smoothness
+assumption. Previous adaptations of these methods to the Riemannian case were
+either specific to certain manifolds or required large-batch gradient updates
+at certain iterations, leading to prohibitive computation costs.
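+
+For reference, the Euclidean forms of the two notions used above are (the
+paper works with Riemannian analogues, presumably stated via the Riemannian
+gradient and a transport between tangent spaces):
+
+    eps-approximate stationarity:   \|\nabla f(x)\| \le \epsilon
+
+    mean-squared smoothness:
+        \mathbb{E}_{\xi}[\|\nabla f(x;\xi) - \nabla f(y;\xi)\|^2] \le L^2 \|x-y\|^2,
+
+the latter being stronger than Lipschitzness of the expected gradient alone.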
+
+The method introduced in this paper can be described as the Riemannian
+counterpart of the Stochastic Recursive Momentum method of [Cutkosky et al.,
+2019]. The main idea is to interpolate smoothly, over the iterations of the
+algorithm, between vanilla Riemannian SGD and the momentum-based RSPRG
+algorithm of [Han et al., 2020; Zhou et al., 2019].
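+
+To make the update concrete, below is a minimal sketch of the recursion as I
+understand it; the helper names (rstorm, rgrad, retract, transport) and the
+exact placement of the transport are my own reconstruction, not the paper's
+code, and tangent vectors are assumed to behave like numpy arrays.
+
+    # Sketch of the recursion as I understand it; names and details are mine.
+    def rstorm(x0, sample, rgrad, retract, transport, eta, beta, n_iters):
+        """One-sample-per-iteration recursive momentum on a manifold.
+
+        sample()           -- draws a random data point xi
+        rgrad(x, xi)       -- Riemannian stochastic gradient at x for xi
+        retract(x, v)      -- moves from x along the tangent vector v
+        transport(x, y, v) -- carries a tangent vector v from T_x M to T_y M
+        eta(t), beta(t)    -- step-size and interpolation-weight schedules
+        """
+        x_prev, x, d = None, x0, None
+        for t in range(n_iters):
+            xi = sample()                  # one random sample per iteration
+            g = rgrad(x, xi)               # first gradient computation
+            if d is None:
+                d = g                      # first iteration: plain SGD step
+            else:
+                # Correction term reuses the same sample at the previous
+                # iterate (the second gradient computation), transported to
+                # the current tangent space.
+                corr = transport(x_prev, x, d - rgrad(x_prev, xi))
+                # beta(t) = 1 recovers Riemannian SGD; beta(t) = 0 gives the
+                # pure recursive (SPIDER-like) estimator.
+                d = g + (1.0 - beta(t)) * corr
+            x_prev, x = x, retract(x, -eta(t) * d)
+        return x
+
+With beta(t) fixed to 1 this is exactly Riemannian SGD; the interpolation
+comes from letting beta(t) decay over the iterations.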
+
+The main theoretical result is that with a proper decaying schedule for the
+step size and interpolation parameter (controlling how much of the momentum
+term to mix into the gradient update), the proposed algorithm converges at
+a nearly-optimal rate (up to logarithmic terms) and only requires one random
+sample and two gradient computations per iteration. The result holds for
+arbitrary Riemannian manifolds and only requires tuning two hyperparameters.
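+
+For intuition, in the Euclidean analysis of [Cutkosky et al., 2019] the step
+size effectively decays like t^{-1/3} and the interpolation parameter like
+the square of the step size, i.e. roughly
+
+    \eta_t \propto t^{-1/3},    \beta_t \propto \eta_t^2 \propto t^{-2/3},
+
+so early iterations are close to plain SGD and later ones place more weight
+on the recursive momentum correction. Assuming the Riemannian schedules are
+of the same order (I did not check the constants), T iterations yield an
+expected Riemannian gradient norm of order T^{-1/3} up to logarithmic
+factors, matching the O(eps^-3) sample complexity mentioned above.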
+
+An experimental section evaluates the proposed algorithm on three distinct
+tasks: PCA (on the Grassmann manifold), ICA (on the Stiefel manifold), and
+Riemannian centroid computation, and shows that it consistently outperforms
+other Riemannian stochastic first-order methods.
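+
+As a concrete toy instance of one of these tasks, streaming PCA on the
+Grassmann manifold can be set up as follows; this is my own illustration of
+the kind of objective involved, not the paper's code, and the function names
+(pca_cost, pca_rgrad, qr_retract, proj_transport) are hypothetical.
+
+    # Toy formulation (mine, not the paper's): streaming PCA on Gr(d, k).
+    import numpy as np
+
+    def pca_cost(X, z):
+        """Negative explained variance of z in the subspace span(X)."""
+        return -np.linalg.norm(X.T @ z) ** 2
+
+    def pca_rgrad(X, z):
+        """Riemannian stochastic gradient on the Grassmann manifold."""
+        euc_grad = -2.0 * np.outer(z, z.T @ X)   # Euclidean gradient, d x k
+        return euc_grad - X @ (X.T @ euc_grad)   # project out the span of X
+
+    def qr_retract(X, V):
+        """QR-based retraction: move from X along the tangent direction V."""
+        return np.linalg.qr(X + V)[0]
+
+    def proj_transport(X, Y, V):
+        """Vector transport by projection onto the horizontal space at Y."""
+        return V - Y @ (Y.T @ V)
+
+Together with a data sampler, these primitives could be plugged into the
+update sketched earlier; ICA and the Riemannian centroid require analogous,
+manifold-specific gradients and retractions.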
+
+Detailed comments
+=================
+
+The problem considered in this paper is relevant, since many tasks in data
+analysis are naturally constrained to a Riemannian manifold (for example the
+PCA and ICA tasks considered in the experimental section). Furthermore, it is
+satisfying, from both a theoretical and a practical perspective, to have a
+general method that can be analyzed for arbitrary manifolds.
+
+The convergence result is strong, and the proofs are correct as far as I was
+able to check. A large part of the derivation follows ideas similar to the
+Euclidean case, which I was able to verify; however, not being an expert in
+Riemannian geometry, I could have missed a technical mistake specific to the
+Riemannian case.
+
+The paper is well-written (as far as the 7-page limit permits…) and I found the
+experimental evaluations very convincing.
+
+I have a minor concern about the significance of the result because of the
+extra assumption of mean-squared Lipschitzness of the gradients. It was not
+clear to me how restrictive this assumption is or how often it holds in
+practice. In particular, I was surprised not to see any mention of it in the
+experimental section: do the examples considered there satisfy the assumption?
+I think it would improve the paper to say a bit more about this.
+