-rw-r--r--  ijcai-2021-1821.txt  |  57
-rw-r--r--  ijcai-2021-3017.txt  |  62
-rw-r--r--  ijcai-2021-366.txt   |  54
-rw-r--r--  ijcai-2021-47.txt    |  99
4 files changed, 272 insertions, 0 deletions
diff --git a/ijcai-2021-1821.txt b/ijcai-2021-1821.txt
new file mode 100644
index 0000000..e942f3a
--- /dev/null
+++ b/ijcai-2021-1821.txt
@@ -0,0 +1,57 @@
+This paper considers the problem of designing an efficient (welfare optimal)
+combinatorial auction in the value query model. Specifically, there are
+n bidders and m items; each bidder has preferences over bundles (sets) of
+items, and the goal is both to allocate the items among the bidders so as to
+(approximately) maximize social welfare and to design payments which
+incentivize truthful reporting of preferences.
+
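+For concreteness, the underlying optimization problem is to find a feasible
+allocation of disjoint bundles (S_1, ..., S_n) maximizing the social welfare
+\sum_i v_i(S_i), where each valuation v_i is only accessible through value
+queries of the form v_i(S).
+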
+Starting from the iterative MLCA mechanism of Brero et al. (2021) (in which
+bidders' value functions are approximated by a neural network trained on their
+answers to previous queries), the mechanism designed in the present paper goes
+one step further by making use of Fourier Analysis for set functions.
+Specifically, motivated by the observation that bidders' valuations are likely
+to be approximately sparse in the Fourier domain (for a well-chosen transform),
+the mechanism finds a Fourier-sparse approximation of the neural networks
+approximating the bidders' preferences. This then requires computing the
+efficient allocation for these Fourier-sparse approximations, for which the
+authors develop an MIP-based algorithm.
+
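+To make the Fourier-sparse idea concrete, here is a small sketch of my own
+(not taken from the paper), using the Walsh-Hadamard basis as the set-function
+Fourier transform; the paper considers other transforms as well, and all names
+below are hypothetical.
+
+  # Brute-force Walsh-Hadamard coefficients of a set function over m items,
+  # followed by a k-sparse truncation. Exponential in m, so purely for intuition.
+  import itertools
+
+  def wht_coefficients(value, m):
+      # value(bundle) takes a 0/1 tuple of length m and returns the bundle's value.
+      coeffs = {}
+      for freq in itertools.product((0, 1), repeat=m):
+          total = sum((-1) ** sum(f * b for f, b in zip(freq, bundle)) * value(bundle)
+                      for bundle in itertools.product((0, 1), repeat=m))
+          coeffs[freq] = total / 2 ** m
+      return coeffs
+
+  def k_sparse(coeffs, k):
+      # Keep only the k largest-magnitude Fourier coefficients.
+      return dict(sorted(coeffs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k])
+
+  def evaluate(sparse_coeffs, bundle):
+      # Evaluate the Fourier-sparse approximation at one bundle.
+      return sum(c * (-1) ** sum(f * b for f, b in zip(freq, bundle))
+                 for freq, c in sparse_coeffs.items())
+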
+On the empirical side, the authors use the SATS test suite of Weiss et al.
+(2017) to generate synthetic instances of combinatorial auctions. After
+evaluating the extent to which bidders' valuations are Fourier-sparse, they show
+that their proposed mechanism compares favorably to MLCA.
+
+Detailed comments
+=================
+
+The paper considers an important and difficult problem and brings a fresh and
+interesting perspective by applying ideas from Fourier analysis for set
+functions. The paper is clear and well-written.
+
+My main issue with the proposed method is a conceptual one: while I believe
+that in many situations, it is reasonable to assume that bidders' valuations
+are Fourier-sparse, I find that the Hybrid approach somewhat undermines this
+assumption by having two competing types of approximations for bidders'
+valuations: one based on Neural Networks and one based on Fourier-sparse
+functions.
+
+While I understand that a purely Fourier-based approximation would be
+prohibitively expensive in terms of the number of queries made to the bidders,
+the interaction between the Neural Network approximation and the Fourier-sparse
+one is hard to reason about: for example, it could be that even if the true
+value functions are approximately Fourier-sparse, the Neural Network approximation
+which is performed first obfuscates this fact to some extent and makes it
+harder for the Fourier reconstruction to work.
+
+Overall, I would say that this hybrid approximation is not very well motivated
+from a conceptual perspective, and is only justified by the fact that it yields
+some improvements (although arguably fairly minor ones) when evaluated
+experimentally. However, it seems much harder to analyze theoretically (although
+this was not a goal of the present paper) and comes at a cost in terms of
+implementation. In particular, there are now more hyperparameters to tune,
+mainly the splits l_1 to l_4 for the number of queries.
+
+In conclusion, while I believe in using more ideas from Fourier analysis in the
+design of combinatorial auctions, I am not convinced that the cost of this
+specific Hybrid instantiation, from both a conceptual and an implementation
+perspective, is justified by the minor improvements observed experimentally.
diff --git a/ijcai-2021-3017.txt b/ijcai-2021-3017.txt
new file mode 100644
index 0000000..b869aba
--- /dev/null
+++ b/ijcai-2021-3017.txt
@@ -0,0 +1,62 @@
+This paper introduces a general framework to find a truthful, (approximately)
+revenue-maximizing combinatorial auction mechanism inside a given class of
+mechanisms. A typical situation where this is useful is when considering
+a parametrized class of mechanisms for which the optimal value of the parameter
+depends on the (unknown) distribution of bidders and thus needs to be learned.
+
+The framework is based on the general idea of random sampling, where we learn
+the optimal mechanism on a portion of the population but apply it to another
+portion. The main idea of the framework, which can be thought of as a way to
+make random sampling more robust to variability in the bidders' population, is
+to first draw a set of participatory bidders and then learn on the complement
+set by repeatedly drawing samples of bidders from it and finding
+a mechanism which maximizes revenue *on average* over the samples. The
+mechanism thus identified is then applied to the set of participatory bidders.
+
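+The following toy sketch (my own rendering, with hypothetical names, not the
+paper's pseudocode) may help fix ideas:
+
+  # Split the population, learn a parameter on the non-participatory side by
+  # averaging revenue over repeated subsamples, then run the learned mechanism
+  # on the participatory side only.
+  import random
+
+  def learn_and_run(bidders, candidate_params, revenue, run_mechanism,
+                    num_samples=100, sample_size=50, seed=0):
+      rng = random.Random(seed)
+      participatory = rng.sample(bidders, len(bidders) // 2)
+      learning_pool = [b for b in bidders if b not in participatory]
+
+      # Repeatedly sample from the learning pool and pick the parameter that
+      # maximizes revenue *on average* over the samples.
+      samples = [rng.sample(learning_pool, min(sample_size, len(learning_pool)))
+                 for _ in range(num_samples)]
+      best = max(candidate_params,
+                 key=lambda p: sum(revenue(p, s) for s in samples) / len(samples))
+
+      # No revenue is extracted from the learning pool.
+      return run_mechanism(best, participatory)
+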
+In addition to the introduction of this general framework, the main
+contributions of the paper are:
+
+1. a general bound on the revenue of the mechanisms output by this method,
+ which depends on various parameters quantifying the complexity of the
+ auction class and the variability in the population of bidders.
+
+2. a specific instantiation of the framework to a new class of auctions
+ introduced by the authors, bundling-boosted auctions, for which it is
+ possible to quantify more finely the various parameters described in item 1.
+
+3. an algorithm to efficiently find the mechanism maximizing expected revenue
+ (as required by the framework) for sparse bundling-boosted auctions.
+
+Detailed comments
+=================
+
+The problem considered in this paper is important and the paper brings a nice
+unifying perspective to it by introducing a general framework and analyzing
+its performance through learning-theoretic complexity measures.
+Furthermore, the paper is very clear and nicely written (despite the sadly
+restrictive 7-page limit…).
+
+* an important feature of the framework is that it completely gives up on
+ extracting any revenue from non-participatory bidders. While I understand
+ that this might be a necessary feature given the generality of the scheme,
+  some prior work partially avoids this loss, albeit in much more restrictive
+  settings. Maybe more could be said, as motivation, about the trade-off
+  between the generality of the scheme and how much revenue we can hope to extract.
+
+* relatedly, I was a bit surprised to see that the bounds on the revenue of the
+ scheme were expressed with respect to the efficient allocation on the entire
+ population. Why use this as a benchmark, as opposed to, for example, the
+ optimal revenue that could be obtained on the entire population? This might
+  be standard in this line of work but adding a sentence or two about this
+ choice might be helpful.
+
+* a drawback of the scheme, which again might be due to its generality, is that
+  its guarantees depend on parameters which might be hard to compute or even
+  bound in practice. Although the authors try to make them explicit for specific
+ auction classes, they still remain somewhat opaque. This is particularly true
+ of the partition discrepancy quantity.
+
+* the scheme is reminiscent of the bootstrap method in statistics, where
+ multiple samples are drawn with replacement from the same initial sample to
+ account for variability in the population. Maybe this could be mentioned
+ somewhere to connect the paper to the broader statistics/learning literature.
diff --git a/ijcai-2021-366.txt b/ijcai-2021-366.txt
new file mode 100644
index 0000000..f0be2e8
--- /dev/null
+++ b/ijcai-2021-366.txt
@@ -0,0 +1,54 @@
+This paper proposes a general stochastic first-order method for the
+optimization of a smooth non-convex function over a Riemannian manifold.
+
+While SGD is known to be minimax optimal in general (requiring O(eps^-4)
+stochastic gradient queries to find an eps-approximate stationary point), it
+has been shown that variants of SGD can achieve O(eps^-3) sample complexity
+when the objective function is required to satisfy a stronger, mean-squared
+smoothness assumption. Previous adaptations of these methods to the Riemannian
+case were either specific to certain manifolds or required large batch gradient
+updates at certain iterations, leading to prohibitive computation costs.
+
+The method introduced in this paper can be described as the Riemannian
+counterpart of the Stochastic Recursive Momentum method of [Cutkosky et al.,
+2019]. The main idea is to interpolate smoothly, over the iterations of the
+algorithm, between vanilla Riemannian SGD and the momentum-based RSPRG
+algorithm of [Han et al., 2020; Zhou et al., 2019].
+
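+As a rough illustration of this interpolation, here is a Euclidean sketch of
+a STORM-style update (my own, with hypothetical names; the actual algorithm
+additionally uses a retraction and vector transport on the manifold, and
+decaying schedules in place of the constants below):
+
+  # With a = 1 this reduces to vanilla SGD; with a < 1 a recursive
+  # variance-reduction term is mixed into the gradient estimator.
+  def storm_like(grad, sample_fn, x0, num_steps, eta=0.1, a=0.5):
+      # grad(x, xi): stochastic gradient at x for sample xi; sample_fn(): fresh sample.
+      x = x0
+      d = grad(x, sample_fn())          # first estimator: plain stochastic gradient
+      for _ in range(num_steps):
+          x_next = x - eta * d
+          xi = sample_fn()              # one random sample per iteration...
+          g_next = grad(x_next, xi)     # ...but two gradient evaluations on it
+          g_prev = grad(x, xi)
+          d = g_next + (1.0 - a) * (d - g_prev)
+          x = x_next
+      return x
+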
+The main theoretical result is that with a proper decaying schedule for the
+step size and interpolation parameter (controlling how much of the momentum
+term to mix into the gradient update), the proposed algorithm converges at
+a nearly-optimal rate (up to logarithmic terms) and only requires one random
+sample and two gradient computations per iteration. The result holds for
+arbitrary Riemannian manifolds and only requires tuning two hyperparameters.
+
+An experimental section evaluates the proposed algorithm for three distinct
+tasks: PCA (on the Grassmann manifold), ICA (on the Stiefel manifold) and
+Riemannian Centroid, and shows that it consistently outperforms other
+Riemannian stochastic first-order methods.
+
+Detailed comments
+=================
+
+The problem considered in this paper is relevant since many tasks in data
+analysis are naturally constrained to a Riemannian manifold (for example, the
+PCA and ICA tasks considered in the experimental section). Furthermore, it is
+satisfying, both from a theoretical and from an applicability perspective, to
+have a general method which can be analyzed for arbitrary manifolds.
+
+The convergence result is strong and the proofs are correct as far as I was
+able to check. A large part of the derivation follows ideas similar to the
+Euclidean case, which I was able to verify, but not being an expert in Riemannian
+geometry, I could have missed a technical mistake specific to the Riemannian
+case.
+
+The paper is well-written (as far as the 7-page limit permits…) and I found the
+experimental evaluations very convincing.
+
+I have a minor concern about the significance of the result because of the
+extra assumption of mean-squared Lipschitzness of the gradients. It was not
+clear to me how restrictive this assumption is and how often it holds in
+practice. In particular, I was surprised not to see any mention of it in the
+experimental section: do the examples considered there satisfy the assumption?
+I think it would improve the paper to say a bit more about this.
+
diff --git a/ijcai-2021-47.txt b/ijcai-2021-47.txt
new file mode 100644
index 0000000..1c40bae
--- /dev/null
+++ b/ijcai-2021-47.txt
@@ -0,0 +1,99 @@
+This paper introduces a new stochastic first-order method for the optimization
+of a non-convex population risk. The main idea is to combine the standard Adam
+procedure with techniques from the field of differential privacy to obtain
+better generalization guarantees. Specifically, given an iid sample from the
+population, the procedure splits it into two halves, the training data and test
+data, and gradients are computed only on the training data, unless they differ
+significantly from the gradients computed on the test data. In cases where
+there is a significant difference, the “test gradients” are revealed via
+a differentially private mechanism and used instead of the “training
+gradients”.
+
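+To illustrate the kind of step being described (my own simplified reading,
+not the paper's exact algorithm; the threshold and noise scale below are
+hypothetical parameters):
+
+  # Use the training gradient unless it deviates too much from the held-out
+  # ("test") gradient, in which case release the held-out gradient through
+  # a Laplace mechanism (the Gaussian variant would use rng.normal instead).
+  import numpy as np
+
+  def checked_gradient(g_train, g_test, threshold, noise_scale, rng):
+      # rng is assumed to be a numpy random Generator.
+      if np.linalg.norm(g_train - g_test) <= threshold:
+          return g_train
+      return g_test + rng.laplace(0.0, noise_scale, size=g_test.shape)
+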
+The authors study two concrete instantiations of the above procedure, based on
+the Laplace and Gaussian mechanisms respectively, and provide theoretical
+convergence guarantees for both of them, showing that the procedures converge
+to a stationary point of the population loss. Finally, the procedures are
+evaluated through experiments and claimed to perform better than other
+state-of-the-art methods.
+
+Detailed Comments
+=================
+
+The problem studied in this paper is a central question in the field of AI
+nowadays: understanding the extent to which a model trained via stochastic
+first-order methods to minimize the empirical risk also minimizes the
+population risk. The proposed methods are interesting and based on a natural
+application of differential privacy ideas which should intuitively lead to more
+robust out-of-sample performance.
+
+I have several concerns about the significance and novelty of the results on
+the one hand, and about the quality of writing and presentation on the other.
+
+1. Novelty and significance:
+
+ * this paper seems very closely related to the paper “Towards Better
+ Generalization of Adaptive Gradient Methods” published in NeurIPS'20 and
+ mentioned in the related work of the present paper. The procedure studied
+     in this prior work is also based on a combination of Adam and
+ differential privacy techniques. The structure of the theoretical part of
+     the present paper closely follows this prior work, and the theoretical
+     guarantees seem almost identical. Compare, in particular, Theorems 4 and
+ 5 of the present paper to Theorems 1 and 2 of the previous paper.
+
+ * I wasn't able to find in the present paper a clear explanation of how the
+ newly proposed procedure performs better than the ones in the previous
+ paper. I did see some minor differences in the technical lemmas, but the
+ theoretical guarantees for the main theorems (Theorems 4 and 5) seem
+ identical.
+
+ * relatedly, I was surprised not to find a comparison with the methods in
+ the previous paper in the experimental evaluation.
+
+ * finally, the paper claims that the proposed methods are superior to
+ state-of-the-art methods but the experimental section does not paint
+ a clear picture: when looking at the test error in Figure 2, it seems to
+ me that the proposed methods perform very similarly to Yogi and AdaBound.
+
+2. Writing and presentation. The paper seems to have been hastily put together
+   and would benefit from a thorough proofreading and polishing pass:
+
+ * a lot of technical details are missing or unclear, making it hard to fully
+ assess the correctness of the results or even understand what they mean.
+     A non-exhaustive list:
+
+    - assumption 2 on page 2: the noisy approximations \nabla\hat f(w) and
+      \tilde g_t haven't been defined yet at this point, so it is not clear
+      what the assumption means. Looking further down in the paper, one gets
+      to understand that it maybe applies to the \tilde g_t computed in lines
+      8 and 10 of Algorithm 1. However, these quantities are random variables,
+      so I don't know in which sense their norm is bounded: is it almost
+      surely? In expectation?
+
+ - the descriptions of \phi_t and \psi_t below Algorithm 1 make it sound
+ like they can be quite general with the ones used in Adam given as just
+ one example. However, the analysis in the supplementary material is only
+ done for this one example, although this is not explicitly stated in the
+ theorem statements.
+
+ - relatedly, one has to guess that all the arithmetic operations in line 13
+ of the algorithm are performed componentwise when applied to vectors.
+ While this is fairly standard in this context, it is surprising to only
+ see it mentioned inside a proof in the supplementary material.
+
+ - the methods evaluated in the experimental section are in fact based on
+ a minibatch variant of the methods introduced in the paper. Nothing is
+ said about this variant in the main paper and it is only described in the
+ supplementary material.
+
+    - Theorems 2, 4 and 6 start with "let \tilde g_1, ... \tilde g_T" but then
+      give a statement about \hat g_1, ... \hat g_T, so it is not clear which
+ of the two quantities is actually considered. Looking at the proof in the
+ supplementary material, it seems that the statement is in fact about
+ \tilde g_1, ... , \tilde g_T, but the reader would have to guess unless
+ they look at the proof details.
+
+ * many sentences in the informal exposition, in particular in the
+ introduction and introductory paragraphs of the sections seem to have been
+ left in an intermediate state between two variants, probably as a result
+ of hasty editing.