\documentclass[10pt]{article}
\usepackage{amsmath,amsfonts,amsthm,amssymb}

\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\eqdef}{\mathbin{\stackrel{\rm def}{=}}}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}

\author{Thibaut Horel}
\title{CS 284r -- Fast Greedy Algorithms in MapReduce and Streaming -- Reading notes}

\begin{document}
\maketitle

This paper proposes several MapReduce algorithms for submodular maximization problems. Specifically, the authors consider the following problems: maximizing a monotone submodular function under a cardinality constraint, maximizing a modular function under a matroid constraint, and maximizing a monotone submodular function under one or several knapsack constraints, or under $p$-system constraints.

At a high level, all their algorithms rely on the \textsc{Sample\&Prune} method, a parallelizable way of building good solutions for a function $f$. In this method, a seed solution is obtained by sampling elements from the universe (the sampling parameter is chosen so that the sampled data fits in the memory of a single machine); the universe is then pruned by removing all elements which cannot improve the quality of the sampled solution. A sketch of one round is given at the end of these notes.

For monotone submodular maximization under a cardinality constraint, \textsc{Sample\&Prune} is used inside a thresholding scheme that simulates the $\epsilon$-greedy algorithm, i.e.\ greedy up to a $(1+\epsilon)$ factor in the marginal gains. The function to which \textsc{Sample\&Prune} is applied simply accepts elements whose marginal contribution exceeds the current threshold.

For modular maximization under a matroid constraint, \textsc{Sample\&Prune} can be used to simulate the standard greedy algorithm, which is known to return the optimal solution in this setting. This case is much simpler than the submodular one: the contribution of an element does not vary throughout the algorithm, so the thresholding scheme is not needed (see the pruning rule sketched at the end). Similar ideas are used for knapsack constraints and $p$-system constraints.

Finally, experiments show how the quality of the solution improves as the number of rounds increases.

General impression: a natural generalization of the Max-Cover paper. \textbf{However}, the claimed improvement in the number of rounds hides the fact that the authors implicitly assume access to an oracle computing marginal contributions; hence there is more work to be done within each round (the MapReduce framework only requires that the running time of a round be polynomial). The experiments do not show a running-time speedup.

Open question: it would be interesting to see whether the partial enumeration method for knapsack constraints can be turned into a MapReduce algorithm.
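
For concreteness, here is a minimal sketch of one \textsc{Sample\&Prune} round for the cardinality-constrained case at a fixed threshold $\tau$; the notation ($U$, $X$, $S$, $p$, $k$) is mine, and the details are reconstructed from the description above rather than quoted from the paper.
\begin{enumerate}
\item Sample each element of the current universe $U$ independently with probability $p$, chosen so that the sample $X$ fits in the memory of one machine.
\item On a single machine, greedily add elements $e \in X$ to the current solution $S$ as long as $|S| < k$ and $f(S \cup \{e\}) - f(S) \geq \tau$.
\item Prune in parallel: $U \leftarrow \{e \in U \setminus S : f(S \cup \{e\}) - f(S) \geq \tau\}$.
\end{enumerate}
Repeating such rounds until $U$ is empty, then lowering $\tau$ by a $(1+\epsilon)$ factor, yields the simulated $\epsilon$-greedy algorithm.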
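
In the modular/matroid case the pruning rule is even simpler, since the weight of an element never changes: after growing $S$ greedily from the sample (considering elements in decreasing order of weight), an element survives only if it could still be added to the solution, i.e.\ (again in my notation)
\[
U \leftarrow \{e \in U \setminus S : S \cup \{e\} \text{ is independent in the matroid}\}.
\]

\end{document}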