Summary
=======

This paper studies a facility-location problem formulated as a submodular optimization problem. In machine learning, this problem has previously been shown to arise in the context of dataset summarization and k-medians clustering. Despite the submodularity of the objective function, solving the problem with the standard greedy algorithm is prohibitive because the complexity is O(k*n*m), where k is the cardinality constraint and n*m is the size of the "benefit matrix". A previously suggested approach is to first sparsify the benefit matrix and then run an optimization algorithm on the sparsified matrix. This paper refines the results of [36] by giving theoretical guarantees on the approximation ratio obtained by this method as a function of the chosen sparsification level. Two methods of sparsification are analyzed (see also the sketch at the end of this review):

* top-t: only the top t entries (most similar items) are kept in each row of the benefit matrix;
* threshold-based: only the entries of the benefit matrix above a chosen threshold are kept.

Finally, experiments show that the suggested approach outperforms the (non-sparsified) greedy algorithm in terms of solution quality for the same running time; equivalently, reaching a given solution quality takes much longer if no sparsification is used.

Comments
========

The problem is interesting and well-motivated. While I have not read paper [36], on which this paper builds, I understand that a complete theoretical understanding of the sparsification-based approach was still missing, and this paper is a good attempt at completing the picture. Having precise theoretical guarantees is very valuable, especially since they come with an almost matching lower bound (up to logarithmic factors).

My only concern is about Section 3.1 (threshold-based sparsification): I think this section should discuss the following two points more precisely:

* The stated theoretical guarantee is data-dependent. Is there any way to obtain a data-independent bound under extra assumptions? For example, assuming the benefit matrix is drawn from a distribution, one could show that for a given threshold x the sparsification is "good" with high probability. The benefit of such a result would be to give guidance on how to choose the threshold, or on what a good threshold is (which the paper only briefly discusses).
* It is also not clear how the LSH-based approach combines with the threshold-based approach. It would be nice to see an "end-to-end" theorem or proposition of the following flavor: assume the matrix is sparsified using LSH; then with high probability, we obtain an approximation of...

Finally, the experiments are interesting and convincing. However, while the conclusion of Section 4.1 is clear, the results of Section 4.2 are hard to interpret: there is no metric quantifying the quality of the solution, and the reader seems simply invited to "eyeball" the results. As a consequence, I am not sure this section is necessary.
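
For concreteness, the sketch below records my understanding of the pipeline being analyzed: the two sparsification schemes applied to the benefit matrix, and the standard greedy whose O(k*n*m) cost motivates sparsifying in the first place. The numpy implementation and function names are my own illustration, not the authors' code.

```python
import numpy as np

def sparsify_top_t(B, t):
    """Top-t sparsification: keep only the t largest entries in each row
    of the benefit matrix B, zeroing out the rest. (Reviewer's sketch.)"""
    S = np.zeros_like(B)
    idx = np.argpartition(B, -t, axis=1)[:, -t:]  # indices of the t largest entries per row
    rows = np.arange(B.shape[0])[:, None]
    S[rows, idx] = B[rows, idx]
    return S

def sparsify_threshold(B, tau):
    """Threshold-based sparsification: keep only entries of B at least tau."""
    return np.where(B >= tau, B, 0.0)

def greedy_facility_location(B, k):
    """Standard greedy for max_{|A| <= k} sum_i max_{j in A} B[i, j].
    Each of the k rounds scans all m candidate columns against all n rows,
    hence the O(k*n*m) cost mentioned in the summary. (Reviewer's sketch.)"""
    n, m = B.shape
    best = np.zeros(n)   # current per-row benefit max_{j in A} B[i, j]
    chosen = []
    for _ in range(k):
        # marginal gain of adding each candidate column j
        gains = np.maximum(B, best[:, None]).sum(axis=0) - best.sum()
        gains[chosen] = -np.inf  # do not re-select an already chosen column
        j = int(np.argmax(gains))
        chosen.append(j)
        best = np.maximum(best, B[:, j])
    return chosen
```

With a sparse B (from either scheme above), the per-round scan only touches the stored entries, which is the source of the speedups reported in the experiments.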