Summary
=======

This paper studies a facility-location problem formulated as a submodular optimization problem. In machine learning, this problem has previously been shown to arise in the context of dataset summarization and k-medians clustering. Despite the submodularity of the objective function, solving the problem with the standard greedy algorithm is prohibitive because the complexity is O(k*n*m), where k is the cardinality constraint and n*m is the size of the "benefit matrix". A previously suggested approach is to first sparsify the benefit matrix and then run an optimization algorithm on the sparsified matrix. This paper refines the results of [36] by giving theoretical guarantees on the approximation ratio obtained by this method as a function of the chosen sparsification level. Two methods of sparsification are analyzed (see also the sketch at the end of this review):

* top-t: only the top t entries (most similar items) are kept in each row of the benefit matrix;
* threshold-based: only the entries of the benefit matrix above a chosen threshold are kept.

Finally, experiments show that the suggested approach outperforms the (non-sparsified) greedy algorithm in terms of solution quality for the same running time; equivalently, reaching a given solution quality takes much longer if no sparsification is used.

Comments
========

The problem is interesting and well-motivated. While I have not read paper [36], on which this paper builds, I understand that a complete theoretical understanding of the sparsification-based approach was still missing, and this paper is a good attempt at completing the picture. Having precise theoretical guarantees is very valuable, especially since they come with an almost matching lower bound (up to logarithmic factors).

My only concern is about Section 3.1 (threshold-based sparsification): I think this section should discuss the following two points more precisely:

* The stated theoretical guarantee is data-dependent. Is there any way to obtain a data-independent bound under extra assumptions? For example, assuming the benefit matrix is drawn from a distribution, one could show that for a given threshold x the sparsification is "good" with high probability. The benefit of such a result would be to give guidance on how to choose the threshold, or on what a good threshold is (which the paper only briefly discusses).
* It is also not clear how the LSH-based approach combines with the threshold-based approach. It would be nice to see an "end-to-end" theorem or proposition of the following flavor: assume the matrix is sparsified using LSH; then with high probability, we obtain an approximation of...

Finally, the experiments are interesting and convincing. However, while the conclusion of Section 4.1 is clear, the results of Section 4.2 are hard to interpret: there is no metric quantifying the quality of the solution, and the reader seems simply invited to "eyeball" the results. As a consequence, I am not sure this section is necessary.
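
For concreteness, the sketch below records my understanding of the pipeline being analyzed: the two sparsification schemes applied to the benefit matrix, and the standard greedy whose O(k*n*m) cost motivates sparsifying in the first place. The numpy implementation and function names are my own illustration, not the authors' code.

```python
import numpy as np

def sparsify_top_t(B, t):
    """Top-t sparsification: keep only the t largest entries in each row
    of the benefit matrix B, zeroing out the rest. (Reviewer's sketch.)"""
    S = np.zeros_like(B)
    idx = np.argpartition(B, -t, axis=1)[:, -t:]  # indices of the t largest entries per row
    rows = np.arange(B.shape[0])[:, None]
    S[rows, idx] = B[rows, idx]
    return S

def sparsify_threshold(B, tau):
    """Threshold-based sparsification: keep only entries of B at least tau."""
    return np.where(B >= tau, B, 0.0)

def greedy_facility_location(B, k):
    """Standard greedy for max_{|A| <= k} sum_i max_{j in A} B[i, j].
    Each of the k rounds scans all m candidate columns against all n rows,
    hence the O(k*n*m) cost mentioned in the summary. (Reviewer's sketch.)"""
    n, m = B.shape
    best = np.zeros(n)   # current per-row benefit max_{j in A} B[i, j]
    chosen = []
    for _ in range(k):
        # marginal gain of adding each candidate column j
        gains = np.maximum(B, best[:, None]).sum(axis=0) - best.sum()
        gains[chosen] = -np.inf  # do not re-select an already chosen column
        j = int(np.argmax(gains))
        chosen.append(j)
        best = np.maximum(best, B[:, j])
    return chosen
```

With a sparse B (from either scheme above), the per-round scan only touches the stored entries, which is the source of the speedups reported in the experiments.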