Summary
=======

This paper studies the problem of feature selection in sparse learning: given
a learning problem where we seek a parameter vector b maximizing some
objective function (typically the likelihood of the observed data), the goal
is to find the optimal such vector under the constraint that its support has
size at most k.

This problem was studied by Das and Kempe in the specific case where the
learning task is least-squares regression. In that setting they showed that
if the data satisfies the RIP (restricted isometry property), then the
objective function is weakly submodular, which yields provable guarantees for
a greedy feature-selection algorithm.

The present paper generalizes this result to arbitrary objective functions:
the main result shows that if the objective function is strongly convex, then
the resulting feature selection problem is weakly submodular, leading to
provable guarantees for greedy algorithms.

The (synthetic) experiments validate the approach and show that the
algorithms perform according to the theoretical guarantees and outperform the
baseline (Lasso selection).

Comments
========

This is a really "clean" and nice paper: the connection between strong
convexity and weak submodularity is very interesting conceptually. Beyond the
obvious gain for feature selection in sparse learning, I believe the
connection is interesting in itself, since it is a new bridge between
convexity and submodularity (and such bridges have led to interesting results
in submodular optimization over the past 6-7 years).

The paper is well written and the proofs are easy to check for correctness
(they follow from rather standard arguments in submodularity). It is
interesting to have three algorithms with different running-time complexities
and approximation guarantees. The practical implication (as illustrated in
the experiments) is that the practitioner can trade off solution quality for
running time.

A standard notion under which theoretical guarantees can be obtained in
submodular optimization is the "submodular curvature". If there is a
connection between weak submodularity and this notion of curvature, it would
be interesting to mention it (I haven't spent much time thinking about it,
but it seems that there should be a connection, at least one notion implying
the other).

A final comment which is not about the quality of this paper (but rather
about the general field of sparse learning, and specifically paper [1]): it
is always hard to interpret the conditions under which the theorems hold.
When the objective function is defined in terms of the observed data, the
strong convexity condition implicitly requires some form of non-degeneracy of
that data. It is hard to obtain probabilistic guarantees that such
non-degeneracy properties hold, and the practical implications of these
results are therefore hard to grasp.
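
For readers less familiar with the terminology: the submodularity ratio of
Das and Kempe (quoted here from memory, so the exact form should be
double-checked against their paper) measures how far a monotone set function
f with f(\emptyset) = 0 is from being submodular:

\[
\gamma_{U,k} \;=\;
\min_{\substack{L \subseteq U,\; S:\ |S| \le k,\; S \cap L = \emptyset}}
\frac{\sum_{x \in S} \big( f(L \cup \{x\}) - f(L) \big)}
     {f(L \cup S) - f(L)} .
\]

The function is called weakly submodular when this ratio is bounded away from
zero; a ratio of 1 for every U and k recovers ordinary submodularity.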

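For concreteness, here is a minimal sketch of the greedy forward-selection
scheme discussed above, instantiated with the least-squares objective of the
Das and Kempe setting. The function names and the synthetic-data example are
mine, not the paper's; this is an illustration of the general technique, not
the authors' exact algorithm.

import numpy as np

def greedy_forward_selection(objective, n_features, k):
    """Grow a support set of size at most k by adding, at each step,
    the feature with the largest marginal gain in the objective."""
    support = []
    for _ in range(k):
        base = objective(support)
        best_gain, best_j = -np.inf, None
        for j in range(n_features):
            if j in support:
                continue
            gain = objective(support + [j]) - base
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:
            break
        support.append(best_j)
    return support

def make_least_squares_objective(X, y):
    """f(S) = reduction in residual sum of squares obtained by refitting
    least squares on the columns indexed by S (the Das-Kempe setting)."""
    def objective(support):
        if not support:
            return 0.0
        Xs = X[:, support]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        return float(y @ y - resid @ resid)
    return objective

# Synthetic example with a sparse ground truth of support size 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
b_true = np.zeros(50)
b_true[:5] = 1.0
y = X @ b_true + 0.1 * rng.standard_normal(200)
print(greedy_forward_selection(make_least_squares_objective(X, y),
                               n_features=50, k=5))

If I recall correctly, this simple greedy scheme is the one for which a
(1 - e^{-gamma})-type approximation guarantee holds when the objective is
weakly submodular with submodularity ratio gamma; the paper's other
algorithms trade approximation quality against running time, as discussed in
the comments above.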