\documentclass[10pt]{article}
\usepackage{amsmath,amsfonts,amsthm, amssymb}


\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\eqdef}{\mathbin{\stackrel{\rm def}{=}}}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}

\author{Thibaut Horel}
\title{CS 284r -- Distributed Submodular Maximization -- Reading
notes}

\begin{document}

\maketitle

This article proposes a new algorithm for maximizing a submodular function
under a cardinality constraint $k$ in a distributed fashion. Specifically, the
authors propose an algorithm whose approximation ratio is
$(1-1/e)^2/\min(m, k)$, where $m$ is the number of groups into which the
ground set is partitioned ($m$ is typically the number of machines used in the
parallel computation).

The idea of the algorithm is very simple: partition the ground set into $m$
groups. Using the greedy algorithm, find an approximate solution of size $k$
on each machine (i.e.\ on each group of the partition). Then merge the $m$
solutions and run the greedy algorithm on the merged set (of size at most
$mk$) to extract a final solution of size $k$.
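
To make the two rounds concrete, here is a minimal Python sketch of this
scheme. It is my own illustration of the description above, not the authors'
implementation: the function names and the choice of a random balanced
partition are illustrative assumptions.
\begin{verbatim}
import random

def greedy(f, ground, k):
    # Standard greedy: repeatedly add the element with the
    # largest marginal gain f(S + {e}) - f(S) until |S| = k.
    S = set()
    for _ in range(min(k, len(ground))):
        best = max((e for e in ground if e not in S),
                   key=lambda e: f(S | {e}) - f(S))
        S.add(best)
    return S

def two_round_greedy(f, ground, k, m):
    # Round 1: partition the ground set into m groups and run
    # greedy on each group (each call would run on its own
    # machine in the distributed setting).
    elems = list(ground)
    random.shuffle(elems)  # random balanced partition
    groups = [elems[i::m] for i in range(m)]
    partial = [greedy(f, g, k) for g in groups]
    # Round 2: run greedy on the merged partial solutions
    # (at most m*k elements) to extract a solution of size k.
    merged = set().union(*partial)
    return greedy(f, merged, k)
\end{verbatim}
Here \texttt{f} is any monotone submodular set function given as a Python
callable; only the first round is parallelizable, and the second round touches
at most $mk$ elements regardless of the size of the ground set.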

For submodular functions defined on a ground set embedded in a metric space
(as is usual for submodular functions arising in machine learning
applications; the authors give two examples of such functions), one can prove
a stronger guarantee: if the function satisfies a Lipschitz property (a
natural extension of the usual Lipschitz property to set functions of this
type) and if the optimal solution is not too ``isolated'' (i.e.\ there are
enough ground set elements in the neighborhood of the elements of the optimal
solution), then the approximation ratio is $(1-1/e)^2$ (up to an additive
term) when the elements of the ground set are partitioned uniformly at random.
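
As a reminder, the Lipschitz condition is (roughly, from memory) of the
following form: $f$ is $\lambda$-Lipschitz if for any two equal-sized sets
$S = \{x_1, \dots, x_k\}$ and $S' = \{x_1', \dots, x_k'\}$ and any matching of
their elements,
\[
|f(S) - f(S')| \le \lambda \sum_{i=1}^{k} d(x_i, x_i'),
\]
where $d$ is the metric on the ground set. Intuitively, moving each element of
a solution a small distance can only change the function value a little, which
is what allows a random partition to contain a good proxy for the optimal
solution.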

An experimental section shows that the presented algorithm performs well
compared to other submodular maximization approaches.

Overall impression: simple results. The approximation ratio is not very good
if you don't assume the geometric structure of the optimal solution, and the
paper doesn't give a sense of how common this assumption is. The authors only
study the example of exemplar-based clustering, but many other examples of
submodular functions in machine learning exist. It is a bit worrying that they
don't study those: should we assume that they don't satisfy the geometric
structure property?

Open question: on the theoretical side, it looks like better approximation
ratios could be obtained in cases where one has bounds on the curvature of the
submodular function. It would be interesting to see if this is true.
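
For reference (this definition is standard and not from the paper), the total
curvature of a monotone normalized submodular function $f$ on ground set $V$
is
\[
c \eqdef 1 - \min_{x \in V} \frac{f(V) - f(V \setminus \{x\})}{f(\{x\})}
\in [0, 1],
\]
and Conforti and Cornu\'ejols showed that the greedy algorithm achieves
a $\frac{1}{c}(1 - e^{-c})$ approximation, which tends to $1$ as $c \to 0$
(i.e.\ as $f$ becomes modular). One could hope for a similar
curvature-dependent improvement of the distributed guarantee.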

\end{document}