\documentclass[10pt]{article}
\usepackage{amsmath,amsfonts,amsthm,amssymb}

\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\eqdef}{\mathbin{\stackrel{\rm def}{=}}}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}

\author{Thibaut Horel}
\title{CS 284r -- Distributed Submodular Maximization -- Reading notes}

\begin{document}
\maketitle

This article proposes a new algorithm for submodular maximization under a cardinality constraint $k$ in a distributed (parallel) setting. In particular, the authors propose an algorithm whose approximation ratio is $(1-1/e)^2/\min(m, k)$, where $m$ is the number of groups into which the ground set is partitioned ($m$ is typically the number of machines used in the parallel computation).

The idea of the algorithm is very simple: partition the ground set into $m$ groups. Using the greedy algorithm, find an approximate solution of size $k$ on each machine (i.e., on each subset of the partition). Then merge the solutions and run the greedy algorithm on the merged set (of size at most $mk$) to extract a solution of size $k$ (a short sketch of this two-round procedure is included at the end of these notes).

For submodular functions defined on a ground set embedded in a metric space (as is common for submodular functions arising in machine learning applications; the authors give two examples of such functions), one can prove the following: if the function satisfies a Lipschitz property (a natural extension of the Lipschitz property to set functions of this kind) and if the optimal solution is not too ``isolated'' (i.e., there are enough ground set elements in the neighborhood of the optimal solution), then the approximation ratio is $(1-1/e)^2$ (up to an additive term) when the elements of the ground set are partitioned uniformly at random. An experimental section shows that the presented algorithm performs well compared to other submodular maximization approaches.

Overall impression: simple results. The approximation ratio is not very good if one does not assume the geometric structure of the optimal solution, and the paper does not give a sense of how common this assumption is. The authors only study the example of exemplar-based clustering, but many other examples of submodular functions in machine learning exist. It is a bit worrying that they do not study those; should we assume that they do not satisfy the geometric structure property?

Open question: on the theoretical side, it looks like better approximation ratios could be obtained in cases where we have bounds on the curvature of the submodular function. It would be interesting to see whether this is true.
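
Appendix: a minimal sketch (in Python) of the two-round procedure described above, with a uniformly random partition of the ground set. This is not the authors' exact pseudocode; the set function \texttt{f} (taking a Python set and returning a value), the ground set, and the parameters $k$ and $m$ are assumed to be given.

\begin{verbatim}
import random

def greedy(f, ground, k):
    """Standard greedy: repeatedly add the element with the largest
    marginal gain f(S + {e}) - f(S), until |S| = k."""
    S = set()
    for _ in range(min(k, len(ground))):
        best = max((e for e in ground if e not in S),
                   key=lambda e: f(S | {e}) - f(S))
        S.add(best)
    return S

def distributed_greedy(f, ground, k, m, seed=0):
    """Two-round scheme described in the notes:
    1. partition the ground set into m groups uniformly at random,
    2. run greedy on each group to get m candidates of size k,
    3. merge the candidates (at most m*k elements) and run greedy again."""
    rng = random.Random(seed)
    elements = list(ground)
    rng.shuffle(elements)
    groups = [elements[i::m] for i in range(m)]   # random partition
    local = [greedy(f, g, k) for g in groups]     # round 1 (parallelizable)
    merged = set().union(*local)                  # at most m*k elements
    return greedy(f, merged, k)                   # round 2
\end{verbatim}

\end{document}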