From 5550e00c788050053a01cae9d5ab40219ff03a67 Mon Sep 17 00:00:00 2001
From: Thibaut Horel
Date: Mon, 18 Jul 2016 00:39:35 -0400
Subject: NIPS 2016 reviews (only one good paper)

---
 nips-2016-1303.txt | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 nips-2016-1690.txt | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++
 nips-2016-220.txt  | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 nips-2016-2230.txt | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
 nips-2016-2253.txt | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
 nips-2016-471.txt  | 46 ++++++++++++++++++++++++++++++++++++++++
 nips-2016-988.txt  | 46 ++++++++++++++++++++++++++++++++++++++++
 7 files changed, 375 insertions(+)
 create mode 100644 nips-2016-1303.txt
 create mode 100644 nips-2016-1690.txt
 create mode 100644 nips-2016-220.txt
 create mode 100644 nips-2016-2230.txt
 create mode 100644 nips-2016-2253.txt
 create mode 100644 nips-2016-471.txt
 create mode 100644 nips-2016-988.txt

diff --git a/nips-2016-1303.txt b/nips-2016-1303.txt
new file mode 100644
index 0000000..77b67f5
--- /dev/null
+++ b/nips-2016-1303.txt
@@ -0,0 +1,53 @@
+Summary
+=======
+
+The paper studies the problem of maximizing a submodular set function under
+multiple knapsack (packing/linear) constraints. The authors build on a now
+standard technique: optimizing the multilinear extension of the submodular
+function and then rounding the fractional solution using the "contention
+resolution scheme" framework. The key novelty is the observation that in the
+continuous greedy algorithm, the step size can be chosen proportional to
+1/(s+l+1)^3, where s is the upper bound on the cardinality of the solution set
+and l is the number of packing constraints. Furthermore, it is shown that the
+algorithm can be implemented in such a way that the running solution has
+sparsity (s+l+1), which leads to faster updates.
+
+The suggested algorithms are evaluated experimentally and compared to the
+"threshold greedy algorithm" of [8].
+
+Comments
+========
+
+The problem of maximizing a submodular function under packing constraints
+is important and interesting, and there is definitely value in obtaining
+a better understanding of the computational complexity of this problem under
+various settings. My main concern is that this paper seems to be an incremental
+improvement over already known results:
+
+* the paper mentions "improved algorithms" (plural) but only one algorithm is
+explicitly stated (Algorithm 1). Furthermore, this algorithm is the standard
+continuous greedy algorithm from [15] with a specific accounting of the
+sparsity of the solution. I am assuming that the other algorithm is the
+multiplicative weight algorithm, but the authors simply refer to [5] for this
+algorithm, so it can't be considered novel.
+
+* instead of mentioning "improved algorithms", I think it would be more
+accurate to say "improved analysis". To my understanding, the algorithms are
+the same; only the analysis is refined to properly account for the sparsity of
+the solution.
+
+Furthermore, this paper is lacking the following two things:
+
+* a precise statement (in a theorem/proposition) of the end-to-end
+  computational complexity of the suggested algorithm. This is important if the
+  main contribution is improved running times.
+
+* an explanation of why the multiplicative weight algorithm has a worse running
+  time in the experiments, even though the point of this algorithm is to
+  improve the running time of the continuous greedy algorithm. This is
+  counter-intuitive since I would expect the LP-solving part of the algorithm
+  to dominate the running time (and the multiplicative weight algorithm
+  specifically addresses this). Is there any reason at all to prefer the
+  multiplicative weight algorithm? Also, it would be interesting to show in the
+  experiments a breakdown of the running time: what fraction of the time is
+  spent solving LPs?
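+
+For concreteness, this is the kind of update I understand Algorithm 1 to be
+performing -- a minimal sketch under my own assumptions, with hypothetical
+helpers estimate_gradient (a sampled gradient of the multilinear extension)
+and solve_direction_lp (the LP over the packing polytope); it is not the
+authors' code:
+
+# Hedged sketch of one continuous greedy step (reviewer's illustration only).
+def continuous_greedy_step(x, estimate_gradient, solve_direction_lp, s, l):
+    """Move the fractional solution x towards the best feasible direction.
+
+    estimate_gradient(x): stochastic estimate of the multilinear extension's
+        gradient at x (e.g., obtained by sampling random sets).
+    solve_direction_lp(g): maximizer of <g, v> over the packing polytope (LP).
+    The step size is taken proportional to 1/(s+l+1)^3, as claimed in the paper.
+    """
+    grad = estimate_gradient(x)       # sampled gradient estimate
+    v = solve_direction_lp(grad)      # best direction satisfying the constraints
+    step = 1.0 / (s + l + 1) ** 3     # the step size highlighted in the summary
+    return [xi + step * vi for xi, vi in zip(x, v)]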
diff --git a/nips-2016-1690.txt b/nips-2016-1690.txt
new file mode 100644
index 0000000..74227ff
--- /dev/null
+++ b/nips-2016-1690.txt
@@ -0,0 +1,58 @@
+Summary
+=======
+
+This paper studies a facility-location problem formulated as a submodular
+optimization problem. In machine learning, this problem has previously been
+shown to arise in the context of dataset summarization/k-medians clustering.
+Despite the submodularity of the objective function, solving the problem with
+the standard greedy algorithm is prohibitive because the complexity is k*n*m,
+where k is the cardinality constraint and n*m is the size of the "benefit
+matrix". A previously suggested approach is to first sparsify the benefit
+matrix and then apply an optimization algorithm to the sparsified matrix.
+
+This paper refines the results from [36] by giving theoretical guarantees on
+the approximation ratio obtained by this method as a function of the chosen
+sparsification level. Two methods of sparsification are analyzed (a sketch of
+both is given at the end of this review):
+
+* top t: only the top t entries (most similar items) are kept in each row of
+the benefit matrix;
+
+* threshold-based: only the entries of the benefit matrix above the chosen
+threshold are kept.
+
+Finally, experiments show that the suggested approach outperforms (in terms of
+the quality of the solution) the (non-sparsified) greedy algorithm for the
+same running time; equivalently, reaching a given solution quality takes much
+longer if no sparsification is used.
+
+Comments
+========
+
+The problem is interesting and well-motivated. While I haven't read paper
+[36], on which this paper claims to build, I understand that a complete
+theoretical understanding of the sparsification-based approach was still
+missing, and this paper is a good attempt at completing the picture.
+
+I think having precise theoretical guarantees is very valuable, especially
+since they come with an almost matching lower bound (up to logarithmic
+factors).
+
+My only concern is about section 3.1 (threshold-based sparsification): I think
+this section should have a more precise discussion of the following two
+points:
+
+* the stated theoretical guarantee is data dependent. Is there any way to
+obtain a data-independent bound under extra assumptions? For example, assuming
+that the benefit matrix comes from a distribution, is the sparsification at
+a given threshold x "good" with high probability? The benefit of such a result
+would be to give some guidance on how to choose the threshold, or on what
+a good threshold is (which the paper only briefly discusses).
+
+* it is also not clear how the LSH-based approach combines with the
+threshold-based approach. It would be nice to see an "end-to-end"
+theorem/proposition of the following flavor: assume the matrix is sparsified
+using LSH, then with high probability we get an approximation of...
+
+Finally, the experiments are interesting and convincing. However, while the
+conclusion of section 4.1 is clear, the results of section 4.2 are hard to
+interpret: there is no metric quantifying the quality of the solution, and the
+reader is simply invited to "eyeball" the results. As a consequence, I am not
+sure that this section is necessary.
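+
+To make sure I read the two sparsification rules correctly, here is a minimal
+sketch of how I understand them (my own code; the benefit matrix B is assumed
+to be given as a list of rows, which is not necessarily how the paper
+represents it):
+
+# Hedged sketch of the two sparsification rules analyzed in the paper.
+def sparsify_top_t(B, t):
+    """Keep only the t largest entries of each row of B; zero out the rest."""
+    sparse = []
+    for row in B:
+        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:t])
+        sparse.append([v if j in keep else 0.0 for j, v in enumerate(row)])
+    return sparse
+
+def sparsify_threshold(B, tau):
+    """Keep only the entries of B whose benefit is at least the threshold tau."""
+    return [[v if v >= tau else 0.0 for v in row] for row in B]
+
+In both cases the sparsified matrix is then handed to the downstream greedy
+optimization, which is where the running-time savings come from.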
diff --git a/nips-2016-220.txt b/nips-2016-220.txt
new file mode 100644
index 0000000..2c5b67a
--- /dev/null
+++ b/nips-2016-220.txt
@@ -0,0 +1,61 @@
+Summary
+=======
+
+This paper studies submodular functions over a continuous domain: these
+functions are sometimes referred to as "smooth" submodular functions and are
+the direct generalization of discrete submodular functions to continuous
+variables. While prior work has focused on obtaining algorithms and
+optimization guarantees for minimizing smooth submodular functions, this paper
+focuses on obtaining approximation guarantees for maximizing smooth submodular
+functions. Specifically, the authors:
+
+* give and prove a characterization of submodularity (combined with
+  coordinate-wise concavity) in terms of a diminishing-returns (DR) property;
+  this is a generalization of a well-known characterization in the discrete
+  case;
+
+* give a (1-1/e)-approximation algorithm for smooth submodular functions which
+  are coordinate-wise concave and monotone;
+
+* give a 1/3-approximation algorithm for arbitrary smooth submodular
+  functions;
+
+* give examples of problems which lead to optimizing a smooth submodular
+  function;
+
+* validate their algorithms through experiments.
+
+Comments
+========
+
+Smooth submodular functions have gathered interest in recent years and, as the
+authors point out, even though results were known for minimization, there was
+a lack of results on the maximization side.
+
+While it is interesting to see that the results of the discrete case translate
+well to the continuous case, this is also the source of a (relative) weakness
+of the paper: a lack of technical novelty.
+
+Section 4 in particular is a direct adaptation of the continuous greedy
+algorithm of Vondrak 2008. I don't see any fundamental difference in the
+statement of the algorithm, and the proof follows a very similar technique.
+Also, it would have been interesting to mention that the multilinear extension
+of a monotone submodular function is DR-submodular, to clearly show the
+connection with already existing results.
+
+The algorithm in section 5 is more novel: it draws heavily from the algorithm
+of [Buchbinder 2012] but required an adaptation to the continuous case (and
+a specific analysis).
+
+It is interesting that the authors give examples where maximizing smooth
+submodular functions arises. Some of these examples seem a bit contrived, but
+I don't have enough domain-specific knowledge to judge this confidently.
+
+The paper is well written overall, but locally suffers from some notation
+inconsistencies which can be unsettling for the reader. Two examples:
+
+* the definition of submodularity on line 80 is given for arbitrary compact
+sets, but later the authors seem to focus on the specific case of the positive
+orthant of R^n; it is not clear whether the results hold in the most general
+case.
+
+* definition 3.1 is hard to read: is it necessary to introduce a ground set
+and then the characteristic vectors of the ground set elements? Why not simply
+use R^n and the standard basis vectors, and state the DR property directly, as
+below?
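+
+For reference, this is the statement I would expect definition 3.1 to reduce
+to, written with the standard basis vectors e_i of R^n (my own paraphrase,
+assuming the domain is a box, which seems to be the setting the paper focuses
+on): for all x <= y component-wise, every coordinate i, and every step k >= 0
+such that x + k*e_i and y + k*e_i stay in the domain,
+
+    F(x + k*e_i) - F(x)  >=  F(y + k*e_i) - F(y),
+
+i.e., the marginal gain along any coordinate can only shrink as the base point
+grows, which is the natural continuous analogue of the discrete
+diminishing-returns characterization.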
diff --git a/nips-2016-2230.txt b/nips-2016-2230.txt
new file mode 100644
index 0000000..ea73164
--- /dev/null
+++ b/nips-2016-2230.txt
@@ -0,0 +1,56 @@
+Summary
+=======
+
+In this paper, the authors study the streaming submodular cover problem: given
+a monotone submodular function f and a target utility Q, the goal is to find
+a set S of minimal size such that f(S) is at least Q. The setting is that of
+streaming algorithms, where the goal is to perform a small (constant) number
+of passes over the data stream and use sublinear memory (i.e., the algorithm
+cannot simply store the entire stream of data).
+
+The first result is an impossibility result: any non-trivial approximation
+guarantee (better than m/2, where m is the size of the stream) requires linear
+memory.
+
+This first result motivates a bi-criteria formulation of the streaming
+submodular cover problem: find a set of size at most delta times that of the
+optimal set which achieves a (1-epsilon) fraction of the target utility. The
+authors provide an algorithm which provably achieves this with
+delta = 1/epsilon.
+
+Comments
+========
+
+The paper is well-written and the results are correct as far as I checked
+(I didn't read the proofs in the appendix but checked the ones in the main
+body of the paper).
+
+The experiments are interesting and illustrate well both the scalability of
+the approach and the trade-off introduced by the parameter alpha in the
+algorithm.
+
+My main concern is a conceptual question regarding the computational model:
+I think it is hard to formulate a sound computational model for streaming
+optimization of arbitrary submodular functions. Indeed, since these functions
+can have exponential-size representations in the general case, the authors
+assume access to a value oracle (as is standard in the offline setting). It is
+not clear to me what such oracles imply in the streaming setting: for example,
+the algorithm could cheat and use value queries to avoid having to store
+items. The exact interplay between the value queries and the memory limitation
+introduced by the streaming model should be properly discussed.
+
+For specific examples (like submodular set cover, which was previously studied
+in the streaming setting, or the ones discussed in the experiments), this
+issue does not arise since the function has an explicit representation which
+can be computed as the stream of elements is received, but I think the general
+case requires a more in-depth discussion of the computational model.
+
+Finally, while the lower bound is novel and technically interesting, the
+algorithm and its analysis seem to be an incremental variation on a well-known
+technique: it has already been observed in other settings (MapReduce
+submodular optimization, for example) that a simple threshold-based algorithm
+gives a constant approximation: for decreasing values of the threshold
+(divided by a constant factor at each iteration), include in the solution all
+elements whose marginal contribution is above the threshold. A sketch of the
+procedure I have in mind is given below.
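+
+This is my own rendering of that generic decreasing-threshold heuristic, under
+my own assumptions (the stream is materialized as a list, f is a value oracle,
+and one pass is made per threshold value); it is not the authors' algorithm:
+
+# Hedged sketch of the generic decreasing-threshold heuristic mentioned above.
+def decreasing_threshold_cover(stream, f, Q, tau_max, shrink=2.0, tau_min=1e-6):
+    """Add any element whose marginal gain exceeds the current threshold, then
+    lower the threshold geometrically until the target utility Q is reached."""
+    S = set()
+    tau = tau_max
+    while f(S) < Q and tau > tau_min:
+        for e in stream:                     # one pass over the stream per threshold
+            if f(S | {e}) - f(S) >= tau:     # marginal contribution above threshold
+                S.add(e)
+                if f(S) >= Q:
+                    return S
+        tau /= shrink                        # divide the threshold by a constant
+    return S
+
+Note that this makes one pass per threshold value, i.e., roughly
+log(tau_max/tau_min) passes in total, which is where the tension with the
+constant-pass requirement of the streaming model would show up.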
diff --git a/nips-2016-2253.txt b/nips-2016-2253.txt
new file mode 100644
index 0000000..56e1b0b
--- /dev/null
+++ b/nips-2016-2253.txt
@@ -0,0 +1,55 @@
+Summary
+=======
+
+This paper studies the problem of feature selection in sparse learning: given
+a learning problem where we seek a vector of parameters b maximizing some
+objective function (typically the likelihood of the observed data), the goal
+is to find the optimal such vector under the constraint that its support has
+size at most k.
+
+This problem was studied by Das and Kempe in the specific case where the
+learning task is least-squares regression. In this setting, they showed that
+if the data satisfies the RIP property, then the objective function is weakly
+submodular, leading to provable guarantees for a greedy feature selection
+algorithm.
+
+The present work generalizes this result to arbitrary objective functions: the
+main result shows that if the objective function is strongly convex, then the
+resulting feature selection problem is weakly submodular, leading to provable
+guarantees for greedy algorithms (a sketch of the greedy template I have in
+mind is given at the end of this review).
+
+The (synthetic) experiments validate the approach and show that the algorithms
+perform according to the theoretical guarantees and outperform the baseline
+(Lasso selection).
+
+Comments
+========
+
+This is a really "clean" and nice paper: the connection between strong
+convexity and weak submodularity is very interesting conceptually. Even though
+the consequence for feature selection in sparse learning is an obvious gain,
+I believe the connection is interesting in itself since it is a new bridge
+between convexity and submodularity (and such bridges have led to interesting
+results in submodular optimization in the past 6-7 years).
+
+The paper is well-written and the proofs are easy to check for correctness
+(they follow from rather standard arguments in submodularity). It is
+interesting to have three algorithms with different running-time complexities
+and approximation guarantees. The practical implication (as illustrated in the
+experiments) is that the practitioner can trade off solution quality for
+running time.
+
+A standard assumption under which theoretical guarantees can be obtained in
+submodular optimization is known as "submodular curvature". If there is
+a connection between weak submodularity and this notion of curvature, it could
+be interesting to mention it (I haven't spent much time thinking about it, but
+it seems that there should be a connection, at least one notion implying the
+other).
+
+A final comment which is not about the quality of the paper (but rather about
+the general field of sparse learning and specifically paper [1]): it is always
+hard to interpret the conditions under which the theorems hold. The strong
+convexity condition implies something about the data (some form of
+non-degeneracy) when the objective function is defined in terms of the
+observed data. It is hard to obtain probabilistic guarantees under which these
+"non-degeneracy" properties hold, and the practical implications of these
+results are hard to grasp.
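+
+For concreteness, this is the forward-selection template I understand the
+greedy guarantees to apply to (a hedged sketch; fit_restricted(S) is
+a hypothetical routine returning the value of the objective maximized over
+vectors supported on S, and is not from the paper):
+
+# Hedged sketch of greedy forward selection for support-constrained problems.
+def greedy_feature_selection(n_features, k, fit_restricted):
+    """Grow the support one feature at a time, always adding the feature whose
+    inclusion gives the largest gain in the restricted objective."""
+    support = set()
+    for _ in range(k):
+        best_gain, best_j = None, None
+        for j in range(n_features):
+            if j in support:
+                continue
+            gain = fit_restricted(support | {j}) - fit_restricted(support)
+            if best_gain is None or gain > best_gain:
+                best_gain, best_j = gain, j
+        if best_j is None:
+            break
+        support.add(best_j)
+    return support
+
+Weak submodularity is what makes the greedy gains well-behaved enough for this
+kind of procedure to inherit an approximation guarantee.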
diff --git a/nips-2016-471.txt b/nips-2016-471.txt
new file mode 100644
index 0000000..938ccb2
--- /dev/null
+++ b/nips-2016-471.txt
@@ -0,0 +1,46 @@
+Summary
+=======
+
+In this paper, the authors study the standard influence maximization problem:
+assuming the independent cascade model (a random diffusion process in graphs),
+the goal is to select a seed set of k nodes. While the traditional influence
+maximization problem focuses on maximizing the expected "cascade size" (i.e.,
+the number of "infected" vertices at the end of the random process), the
+authors note that this is not always desirable in risk-averse settings. The
+contributions are:
+
+* formulating a maximization problem where the objective function is the CVaR
+(conditional value at risk) risk measure;
+
+* providing a "multiplicative weight update" algorithm to solve the problem;
+the algorithm is proved to find a solution within an additive 1/e
+approximation of the optimal solution;
+
+* comparing their algorithm experimentally with the standard greedy algorithm,
+which maximizes the expected cascade size.
+
+Comments
+========
+
+The problem is very interesting: a lot of effort has been put into maximizing
+the expected cascade size in the influence maximization problem, but little is
+known as far as "worst case" guarantees or "risk measures" are concerned, and
+I think there is a lot of value in obtaining these more nuanced results.
+
+The experiments show very well that the resulting cascade-size distribution is
+very different when maximizing the CVaR metric as opposed to maximizing the
+expected size. It is also clear that the distribution obtained when maximizing
+CVaR is more desirable in settings where risk needs to be mitigated.
+
+The algorithm is interesting: the ideas are intuitive and standard (sampling
+and multiplicative updates) but, to the best of my knowledge, are not
+traditionally used in the context of influence maximization.
+
+My only concern is about the specific choice of objective function: I feel
+that the CVaR metric is not very well motivated. Are there ways to relate it
+to the objectives used in [5] or [20], which seem more intuitive and easier to
+interpret? Interpretability is made even harder by not choosing a single seed
+set (fixed strategy) but a distribution over seed sets (random strategy) and
+maximizing the CVaR of this random strategy. Also, it would be interesting to
+know more about the properties of the CVaR objective function: is it
+convex/concave in the choice of the random strategy?
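+
+To make my reading of the objective explicit: under the usual convention for
+the CVaR of a reward, CVaR_alpha is the expected cascade size conditioned on
+falling in the worst alpha-fraction of outcomes. A minimal sketch of the
+empirical version, estimated from Monte Carlo samples of the cascade size (my
+own code, not the authors' estimator):
+
+# Hedged sketch: empirical CVaR_alpha of sampled cascade sizes.
+def empirical_cvar(cascade_sizes, alpha):
+    """Average of the worst alpha-fraction of the sampled cascade sizes.
+
+    With alpha = 1.0 this recovers the usual expected cascade size, which is
+    why CVaR can be read as a risk-averse refinement of the standard objective.
+    """
+    sizes = sorted(cascade_sizes)           # ascending: worst outcomes first
+    m = max(1, int(alpha * len(sizes)))     # number of tail samples to keep
+    tail = sizes[:m]
+    return sum(tail) / len(tail)
+
+# Example: empirical_cvar([10, 50, 52, 55, 60], alpha=0.2) returns 10.0.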
diff --git a/nips-2016-988.txt b/nips-2016-988.txt
new file mode 100644
index 0000000..40f4513
--- /dev/null
+++ b/nips-2016-988.txt
@@ -0,0 +1,46 @@
+Summary
+=======
+
+This paper studies the network inference problem for the RDS (Respondent
+Driven Sampling) diffusion process. This random diffusion process models
+a standard procedure used in epidemiology studies. Under this model, the
+authors formulate network reconstruction (finding the unknown graph over which
+the diffusion took place) as a MAP (maximum a posteriori) optimization
+problem. The contributions are:
+
+* showing that the objective function is log-submodular;
+
+* giving a "variational inference" algorithm: the objective function is
+approximated by linear lower and upper bounds, and solving for those upper and
+lower bounds provides an approximate solution to the problem;
+
+* experiments showing the validity of the approach.
+
+Comments
+========
+
+The problem of network reconstruction is important and hadn't been studied
+from this perspective in the context of Respondent Driven Sampling. While the
+experiments show that the suggested approach works well in practice, I find
+the paper to be lacking a conceptual/technical contribution:
+
+* it was already shown in [8] that the log-likelihood is submodular for the
+closely related continuous-time independent cascade model. Even though the
+diffusion process here is slightly different, this result can hardly be
+considered novel.
+
+* the suggested approach (approximating the objective function with linear
+lower and upper bounds) was already suggested in [5], including the specific
+choice of the lower and upper additive approximations.
+
+* the formulas from [5] are barely instantiated for the specific objective
+function at hand here. It could have been interesting to show how the general
+framework of [5] leads to a simple algorithm for this specific problem, but no
+algorithm is provided and very little is said about how the computation should
+be performed in practice. For example, the paragraph starting at line 252 is
+very vague.
+
+* no theoretical guarantees are provided for the approximation obtained by the
+suggested approach. Some guarantees were obtained in [5] under a curvature
+assumption. It could have been interesting to show what form these assumptions
+take in this specific context.
--
cgit v1.2.3-70-g09d2