Summary
=======

In this paper, the authors study the streaming submodular cover problem: given a monotone submodular function f and a target utility Q, the goal is to find a set S of minimum size such that f(S) is at least Q. The setting is that of streaming algorithms, where the goal is to perform a small (constant) number of passes over the data stream and to use sublinear memory (i.e., the algorithm cannot simply store the entire stream of data). The first result is an impossibility result: any non-trivial approximation guarantee (better than m/2, where m is the length of the stream) requires linear memory. This first result motivates a bi-criteria formulation of the streaming submodular cover problem: find a set of size at most delta times the size of the optimal set which achieves a (1-epsilon) fraction of the target utility Q. The authors provide an algorithm which provably achieves this with delta = 1/epsilon.

Comments
========

The paper is well written and the results are correct as far as I checked (I did not read the proofs in the appendix, but I checked the ones in the main body of the paper). The experiments are interesting and illustrate well both the scalability of the approach and the trade-off introduced by the parameter alpha in the algorithm.

My main concern is a conceptual question regarding the computational model: I think it is hard to formulate a sound computational model for streaming optimization of arbitrary submodular functions. Indeed, since these functions can have exponential-size representations in the general case, the authors assume access to a value oracle (as is standard in the offline setting). It is not clear to me what such oracles imply in the streaming setting: for example, the algorithm could cheat and use value queries to avoid having to store items. The exact interplay between the value queries and the memory limitation introduced by the streaming model should be properly discussed. For specific examples (such as set cover, which was previously studied in the streaming setting, or the functions used in the experiments), this problem does not arise, since the function has an explicit representation which can be computed as the stream of elements is received; but I think the general case requires a more in-depth discussion of the computational model.

Finally, while the lower bound is novel and technically interesting, the algorithm and its analysis seem to be an incremental variation on a well-known technique: it has already been observed in other settings (e.g., MapReduce submodular optimization) that the following simple threshold-based algorithm gives a constant-factor approximation: for decreasing values of the threshold (divided by a constant at each iteration), add to the solution all elements whose marginal contribution is above the threshold.
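To make this last point concrete, here is a minimal sketch of the kind of threshold-based scheme I have in mind; the names (threshold_greedy, tau_max, alpha) and the oracle interface are illustrative, not taken from the paper:

```python
# Hypothetical sketch of a multi-pass threshold-based greedy for
# submodular cover; names and parameters are illustrative.

def threshold_greedy(elements, f, Q, tau_max, alpha=0.5, tau_min=1e-6):
    """elements -- the stream, re-read once per threshold level
    f        -- value oracle for a monotone submodular function
    Q        -- target utility
    tau_max  -- initial threshold (e.g., the largest singleton value)
    alpha    -- constant factor by which the threshold shrinks between passes
    """
    S = set()
    tau = tau_max
    while tau > tau_min and f(S) < Q:
        for e in elements:                 # one pass over the stream
            if f(S | {e}) - f(S) >= tau:   # marginal contribution above threshold
                S.add(e)
                if f(S) >= Q:
                    return S
        tau *= alpha                       # divide the threshold by a constant
    return S


# Toy usage with an explicit coverage function.
sets = {1: {1, 2}, 2: {2, 3}, 3: {4}, 4: {1, 4, 5}}
f = lambda S: len(set().union(*(sets[e] for e in S)))
print(threshold_greedy(list(sets), f, Q=5, tau_max=3))  # e.g. {2, 4}
```

The decay factor (alpha in this sketch, which may or may not correspond to the paper's alpha) controls the usual trade-off between the number of passes and how finely the greedy choices are approximated.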