Diffstat (limited to 'ijcai-2021-366.txt')
| -rw-r--r-- | ijcai-2021-366.txt | 54 |
1 file changed, 54 insertions, 0 deletions
diff --git a/ijcai-2021-366.txt b/ijcai-2021-366.txt
new file mode 100644
index 0000000..f0be2e8
--- /dev/null
+++ b/ijcai-2021-366.txt
@@ -0,0 +1,54 @@
+This paper proposes a general stochastic first-order method for the
+optimization of a smooth non-convex function over a Riemannian manifold.
+
+While SGD is known to be minimax optimal in general (requiring O(eps^-4)
+stochastic gradient queries to find an eps-approximate stationary point), it
+has been shown that variants of SGD can achieve O(eps^-3) sample complexity
+when the objective function additionally satisfies a stronger, mean-squared
+smoothness assumption. Previous adaptations of these methods to the Riemannian
+case were either specific to certain manifolds or required large-batch gradient
+updates at certain iterations, leading to prohibitive computation costs.
+
+The method introduced in this paper can be described as the Riemannian
+counterpart of the Stochastic Recursive Momentum method of [Cutkosky et al.,
+2019]. The main idea is to interpolate smoothly, over the iterations of the
+algorithm, between vanilla Riemannian SGD and the momentum-based RSPRG
+algorithm of [Han et al., 2020; Zhou et al., 2019].
+
+The main theoretical result is that, with a proper decaying schedule for the
+step size and interpolation parameter (controlling how much of the momentum
+term is mixed into the gradient update), the proposed algorithm converges at
+a nearly optimal rate (up to logarithmic terms) and requires only one random
+sample and two gradient computations per iteration. The result holds for
+arbitrary Riemannian manifolds and only requires tuning two hyperparameters.
+
+An experimental section evaluates the proposed algorithm on three distinct
+tasks: PCA (on the Grassmann manifold), ICA (on the Stiefel manifold) and
+Riemannian centroid computation, and shows that it consistently outperforms
+other Riemannian stochastic first-order methods.
+
+Detailed comments
+=================
+
+The problem considered in this paper is relevant, since many tasks in data
+analysis are naturally constrained to a Riemannian manifold (for example, the
+PCA and ICA tasks considered in the experimental section). Furthermore, it is
+satisfying, from both a theoretical and a practical perspective, to have a
+general method that can be analyzed on arbitrary manifolds.
+
+The convergence result is strong and the proofs are correct as far as I was
+able to check. A large part of the derivation follows ideas similar to the
+Euclidean case, which I was able to verify, but, not being an expert in
+Riemannian geometry, I could have missed a technical mistake specific to the
+Riemannian case.
+
+The paper is well written (as far as the 7-page limit permits…), and I found
+the experimental evaluations very convincing.
+
+I have a minor concern about the significance of the result because of the
+extra assumption of mean-squared Lipschitzness of the gradients. It was not
+clear to me how restrictive this assumption is or how often it holds in
+practice. In particular, I was surprised not to see any mention of it in the
+experimental section: do the examples considered there satisfy the assumption?
+I think it would improve the paper to say a bit more about this.
+
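
For concreteness, here is a minimal sketch of the recursive-momentum update the
summary above refers to, written in its Euclidean form (the STORM update of
[Cutkosky et al., 2019]). The Riemannian version would additionally have to
transport d_{t-1} and the gradient at x_{t-1} to the tangent space at x_t and
take the step via a retraction or the exponential map, so the notation here is
illustrative rather than the paper's:

    d_t     = grad f(x_t; xi_t) + (1 - a_t) * (d_{t-1} - grad f(x_{t-1}; xi_t))
    x_{t+1} = x_t - eta_t * d_t

Setting a_t = 1 recovers vanilla SGD, while a_t = 0 gives a purely recursive,
SPIDER-style variance-reduced estimator, which is the interpolation described
above; each iteration draws a single sample xi_t and evaluates two gradients
(at x_t and x_{t-1}). The mean-squared smoothness assumption mentioned above
reads, in its Euclidean form,

    E_xi[ ||grad f(x; xi) - grad f(y; xi)||^2 ] <= L^2 * ||x - y||^2   for all x, y,

which is stronger than requiring smoothness of the expected objective
F(x) = E_xi[ f(x; xi) ] alone.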
