\documentclass[10pt]{article}
\usepackage{amsmath,amsfonts,amsthm, amssymb}
\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\eqdef}{\mathbin{\stackrel{\rm def}{=}}}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}
\author{Thibaut Horel}
\title{CS 284r -- Fast Greedy Algorithms in MapReduce and Streaming -- Reading
notes}
\begin{document}
\maketitle
This paper proposes several MapReduce algorithms for various submodular
maximization problems. Specifically, the authors consider the following
problems: maximizing a monotone submodular function under a cardinality
constraint, maximizing a modular function under a matroid constraint, and
maximizing a monotone submodular function under one or several knapsack
constraints, or under a $p$-system constraint.
At a high level, all of their algorithms rely on the \textsc{Sample\&Prune}
method, a parallelizable way of selecting good solutions for a function $f$.
In this method, a seed solution is obtained by sampling elements from the
universe (the sampling rate is chosen so that the sampled data fits in
a single machine's memory); the universe is then pruned by removing all
elements which cannot improve the quality of the sampled solution.
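A single round of this method can be sketched as follows. This is an
illustrative sketch, not the paper's implementation: the names
\texttt{sample\_and\_prune\_round}, \texttt{memory\_cap}, and the
\texttt{greedy} callback are all hypothetical, and the sequential algorithm
run on the sample is left abstract.

```python
import random

def sample_and_prune_round(universe, f, memory_cap, greedy):
    """One round of a Sample&Prune-style method (illustrative sketch).

    f(S) evaluates the objective on a set S; `greedy` is any sequential
    algorithm run on the (small) sample and returning a set.
    """
    # Sample so that the sampled data fits in one machine's memory.
    p = min(1.0, memory_cap / max(1, len(universe)))
    sample = [x for x in universe if random.random() < p]

    # Run the sequential algorithm on the sample to obtain a seed solution.
    seed = greedy(sample, f)

    # Prune: keep only elements that could still improve the seed solution.
    pruned = [x for x in universe
              if x not in seed and f(seed | {x}) - f(seed) > 0]
    return seed, pruned
```

In the actual MapReduce setting the sampling and pruning steps are
distributed across machines; the point of the sketch is only the two-phase
structure (sample a seed, then discard useless elements).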
For submodular maximization under a cardinality constraint,
\textsc{Sample\&Prune} is applied to a thresholding algorithm which
simulates the greedy algorithm up to a factor of $\epsilon$. The function to
which \textsc{Sample\&Prune} is applied simply selects the elements whose
marginal contribution exceeds the current threshold.
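The sequential thresholding idea being simulated can be sketched as follows:
the threshold decreases geometrically, and at each level every element whose
marginal contribution exceeds the threshold is taken. This is a minimal
sketch of the thresholding technique in general (in the spirit of
Badanidiyuru and Vondr\'ak's threshold greedy), not the paper's MapReduce
algorithm; the function name and signature are hypothetical.

```python
def threshold_greedy(universe, f, k, eps=0.1):
    """Sequential threshold greedy for monotone submodular maximization
    under a cardinality constraint |S| <= k (illustrative sketch).

    The threshold tau decreases geometrically from the largest singleton
    value; at each level, every element whose marginal contribution is at
    least tau is added, until k elements are selected.
    """
    S = set()
    d = max(f({x}) for x in universe)     # largest singleton value
    tau = d
    while tau > eps * d / len(universe) and len(S) < k:
        for x in universe:
            if len(S) >= k:
                break
            if x not in S and f(S | {x}) - f(S) >= tau:
                S.add(x)
        tau *= 1 - eps
    return S
```

Each pass over the universe at a fixed threshold is exactly the kind of
"select all elements above the threshold" step that \textsc{Sample\&Prune}
parallelizes.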
For modular maximization under a matroid constraint, \textsc{Sample\&Prune}
can be used to simulate the standard greedy algorithm, which is known to
return the optimal solution. This case is much simpler than the submodular
maximization case in that the marginal contributions of the elements do not
vary throughout the algorithm, so the thresholding method is not required.
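For reference, the sequential greedy being simulated looks as follows. This
is a sketch of the textbook matroid greedy, not the paper's MapReduce
version; the \texttt{independent} independence oracle and the function name
are assumed interfaces for illustration.

```python
def matroid_greedy(universe, weight, independent):
    """Standard greedy for maximizing a modular (additive) function under
    a matroid constraint (illustrative sketch).

    `weight(x)` is the (fixed) value of element x; `independent(S)` is an
    assumed matroid independence oracle.
    """
    S = set()
    # Because the objective is modular, marginal contributions never
    # change: sorting once by weight suffices.
    for x in sorted(universe, key=weight, reverse=True):
        if weight(x) > 0 and independent(S | {x}):
            S.add(x)
    return S
```

Since the order of consideration is fixed up front, a single sampled seed
already determines which elements can still enter the solution, which is why
no thresholding is needed here.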
Similar ideas are used for knapsack constraints and $p$-system constraints.
Finally, experiments show how the quality of the solution progresses as the
number of rounds increases.
General impression: a natural generalization of the Max-Cover paper.
\textbf{However}, they claim an improvement in the number of rounds
required: this hides the fact that they implicitly assume access to an
oracle that computes the marginal contributions, hence there is more work to
be done in each round (the MapReduce framework only requires that the
running time of a round be polynomial). The experiments do not show
a running-time speedup.
Open question: it would be interesting to see whether the partial
enumeration method for knapsack constraints can be turned into a MapReduce
algorithm.
\end{document}