Summary
=======

This paper studies the problem of feature selection in sparse learning: given
a learning problem where we seek a parameter vector b maximizing some
objective function (typically the likelihood of the observed data), the goal
is to find the optimal such vector under the constraint that its support has
size at most k.

This problem was studied by Das and Kempe in the specific case where the
learning task is least-squares regression. In that setting they showed that
if the data satisfies the RIP (restricted isometry property), then the
objective function is weakly submodular, which yields provable guarantees for
a greedy feature-selection algorithm.

The present paper generalizes this result to arbitrary objective functions:
the main result shows that if the objective function is strongly convex, then
the resulting feature selection problem is weakly submodular, leading to
provable guarantees for greedy algorithms.

The (synthetic) experiments validate the approach and show that the
algorithms perform according to the theoretical guarantees and outperform the
baseline (Lasso selection).

Comments
========

This is a really "clean" and nice paper: the connection between strong
convexity and weak submodularity is very interesting conceptually. Beyond the
obvious gain for feature selection in sparse learning, I believe the
connection is interesting in itself, since it is a new bridge between
convexity and submodularity (and such bridges have led to interesting results
in submodular optimization over the past 6-7 years).

The paper is well written and the proofs are easy to check for correctness
(they follow from rather standard arguments in submodularity). It is
interesting to have three algorithms with different running-time complexities
and approximation guarantees. The practical implication (as illustrated in
the experiments) is that the practitioner can trade off solution quality for
running time.

A standard notion under which theoretical guarantees can be obtained in
submodular optimization is the "submodular curvature". If there is a
connection between weak submodularity and this notion of curvature, it would
be interesting to mention it (I haven't spent much time thinking about it,
but it seems that there should be a connection, at least one notion implying
the other).

A final comment which is not about the quality of this paper (but rather
about the general field of sparse learning, and specifically paper [1]): it
is always hard to interpret the conditions under which the theorems hold.
When the objective function is defined in terms of the observed data, the
strong convexity condition implicitly requires some form of non-degeneracy of
that data. It is hard to obtain probabilistic guarantees that such
non-degeneracy properties hold, and the practical implications of these
results are therefore hard to grasp.
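
For readers less familiar with the terminology: the submodularity ratio of
Das and Kempe (quoted here from memory, so the exact form should be
double-checked against their paper) measures how far a monotone set function
f with f(\emptyset) = 0 is from being submodular:

\[
\gamma_{U,k} \;=\;
\min_{\substack{L \subseteq U,\; S:\ |S| \le k,\; S \cap L = \emptyset}}
\frac{\sum_{x \in S} \big( f(L \cup \{x\}) - f(L) \big)}
     {f(L \cup S) - f(L)} .
\]

The function is called weakly submodular when this ratio is bounded away from
zero; a ratio of 1 for every U and k recovers ordinary submodularity.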

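For concreteness, here is a minimal sketch of the greedy forward-selection
scheme discussed above, instantiated with the least-squares objective of the
Das and Kempe setting. The function names and the synthetic-data example are
mine, not the paper's; this is an illustration of the general technique, not
the authors' exact algorithm.

import numpy as np

def greedy_forward_selection(objective, n_features, k):
    """Grow a support set of size at most k by adding, at each step,
    the feature with the largest marginal gain in the objective."""
    support = []
    for _ in range(k):
        base = objective(support)
        best_gain, best_j = -np.inf, None
        for j in range(n_features):
            if j in support:
                continue
            gain = objective(support + [j]) - base
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:
            break
        support.append(best_j)
    return support

def make_least_squares_objective(X, y):
    """f(S) = reduction in residual sum of squares obtained by refitting
    least squares on the columns indexed by S (the Das-Kempe setting)."""
    def objective(support):
        if not support:
            return 0.0
        Xs = X[:, support]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        return float(y @ y - resid @ resid)
    return objective

# Synthetic example with a sparse ground truth of support size 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
b_true = np.zeros(50)
b_true[:5] = 1.0
y = X @ b_true + 0.1 * rng.standard_normal(200)
print(greedy_forward_selection(make_least_squares_objective(X, y),
                               n_features=50, k=5))

If I recall correctly, this simple greedy scheme is the one for which a
(1 - e^{-gamma})-type approximation guarantee holds when the objective is
weakly submodular with submodularity ratio gamma; the paper's other
algorithms trade approximation quality against running time, as discussed in
the comments above.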