This paper proposes a general stochastic first-order method for the optimization of a smooth non-convex function over a Riemannian manifold. While SGD is known to be minimax optimal in general (requiring O(eps^-4) stochastic gradient queries to find an eps-approximate stationary point), it has been shown that variants of SGD can achieve O(eps^-3) sample complexity when the objective function additionally satisfies a stronger, mean-squared smoothness assumption. Previous adaptations of these methods to the Riemannian case were either specific to certain manifolds or required large-batch gradient updates at certain iterations, leading to prohibitive computational costs.

The method introduced in this paper can be described as the Riemannian counterpart of the Stochastic Recursive Momentum method of [Cutkosky et al., 2019]. The main idea is to interpolate smoothly, over the iterations of the algorithm, between vanilla Riemannian SGD and the momentum-based RSPRG algorithm of [Han et al., 2020; Zhou et al., 2019] (I sketch my understanding of the update rule at the end of my comments). The main theoretical result is that with a proper decaying schedule for the step size and the interpolation parameter (which controls how much of the momentum term is mixed into the gradient update), the proposed algorithm converges at a nearly optimal rate (up to logarithmic terms) while requiring only one random sample and two gradient computations per iteration. The result holds for arbitrary Riemannian manifolds and only requires tuning two hyperparameters.

An experimental section evaluates the proposed algorithm on three distinct tasks: PCA (on the Grassmann manifold), ICA (on the Stiefel manifold), and Riemannian centroid computation, and shows that it consistently outperforms other Riemannian stochastic first-order methods.

Detailed comments
=================

The problem considered in this paper is relevant, since many tasks in data analysis are naturally constrained to a Riemannian manifold (for example the PCA and ICA problems considered in the experimental section). Furthermore, it is satisfying, both from a theoretical and an applicability perspective, to have a general method that can be analyzed for arbitrary manifolds.

The convergence result is strong and the proofs are correct as far as I was able to check. A large part of the derivation follows ideas similar to the Euclidean case, which I was able to verify, but not being an expert in Riemannian geometry, I could have missed a technical mistake specific to the Riemannian case.

The paper is well written (as far as the 7-page limit permits…) and I found the experimental evaluation very convincing.

I have a minor concern about the significance of the result because of the extra assumption of mean-squared Lipschitzness of the gradients. It was not clear to me how restrictive this assumption is and how often it holds in practice. In particular, I was surprised not to see any mention of it in the experimental section: do the examples considered there satisfy the assumption? I think it would improve the paper to say a bit more about this.
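To be explicit about the assumption I have in mind (in my own notation, which may differ from the paper's): in the Euclidean setting, mean-squared smoothness requires

    E_z || grad f(x; z) - grad f(y; z) ||^2 <= L^2 || x - y ||^2   for all x, y,

which, by Jensen's inequality, is stronger than Lipschitz-continuity of the expected gradient alone. I assume the Riemannian analogue compares the two stochastic gradients after parallel (or vector) transport and replaces || x - y || with the Riemannian distance, i.e.

    E_z || grad f(x; z) - Gamma_{y->x} grad f(y; z) ||^2 <= L^2 d(x, y)^2,

where Gamma_{y->x} denotes transport from T_y M to T_x M.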
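For concreteness, my reading of the interpolation described above (again in my own notation, with Gamma a vector transport, Retr a retraction, eta_t the step size and a_t in [0, 1] the interpolation parameter, so the exact form may differ from the paper's) is the recursive update

    d_t = grad f(x_t; z_t) + (1 - a_t) Gamma_{x_{t-1}->x_t}( d_{t-1} - grad f(x_{t-1}; z_t) ),
    x_{t+1} = Retr_{x_t}( - eta_t d_t ).

Setting a_t = 1 recovers vanilla Riemannian SGD, while a_t = 0 gives the recursive variance-reduced estimator; in either case only one sample z_t and two stochastic gradient evaluations are needed per iteration, consistent with the per-iteration cost stated above.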