 -rw-r--r--  ijcai-2021-1821.txt  57
 -rw-r--r--  ijcai-2021-3017.txt  62
 -rw-r--r--  ijcai-2021-366.txt   54
 -rw-r--r--  ijcai-2021-47.txt    99
4 files changed, 272 insertions, 0 deletions
diff --git a/ijcai-2021-1821.txt b/ijcai-2021-1821.txt
new file mode 100644
index 0000000..e942f3a
--- /dev/null
+++ b/ijcai-2021-1821.txt
@@ -0,0 +1,57 @@
+This paper considers the problem of designing an efficient (welfare-optimal)
+combinatorial auction in the value query model. Specifically, there are
+n bidders and m items, each bidder has preferences over bundles (sets) of
+items, and the goal is to allocate the items to the bidders in a way which
+(approximately) maximizes social welfare, and to design payments which
+incentivize truthful reporting of preferences.
+
+Starting from the iterative MLCA mechanism of Brero et al. (2021) (in which
+bidders' value functions are approximated by a neural network trained on their
+answers to previous queries), the mechanism designed in the present paper goes
+one step further by making use of Fourier analysis for set functions.
+Specifically, motivated by the fact that bidders' valuations are likely to be
+approximately sparse in the Fourier domain (for a well-chosen transform), the
+mechanism finds a Fourier-sparse approximation of the neural networks
+approximating the bidders' preferences. This then requires computing the
+efficient allocation for these Fourier-sparse approximations, for which the
+authors develop an MIP-based algorithm.
+
+On the empirical side, the authors use the STATS test suite of Weiss et al.
+(2017) to generate synthetic instances of combinatorial auctions. After
+evaluating the extent to which bidders' valuations are Fourier-sparse, they
+show that their proposed mechanism compares favorably to MLCA.
+
+Detailed comments
+=================
+
+The paper considers an important and difficult problem and brings a fresh and
+interesting perspective by applying ideas from Fourier analysis for set
+functions. The paper was clear and well-written.
+
+My main issue with the proposed method is a conceptual one: while I believe
+that in many situations it is reasonable to assume that bidders' valuations
+are Fourier-sparse, I find that the hybrid approach somewhat undermines this
+assumption by having two competing types of approximations for bidders'
+valuations: one based on neural networks and one based on Fourier-sparse
+functions.
+
+While I understand that a purely Fourier-based approximation would be
+prohibitively expensive in terms of the number of queries made to the bidders,
+the interaction between the neural network approximation and the Fourier-sparse
+one is hard to reason about: for example, it could be that even if the true
+value functions are approximately Fourier-sparse, the neural network
+approximation, which is performed first, obfuscates this fact to some extent
+and makes it harder for the Fourier reconstruction to work.
+
+Overall, I would say that this hybrid approximation is not very well motivated
+from a conceptual perspective, and is only justified by the fact that it yields
+some improvements (although arguably fairly minor ones) when evaluated
+experimentally. However, it seems much harder to analyze theoretically
+(although this was not a goal of the present paper) and comes at a cost in
+terms of implementation. In particular, there are now more hyperparameters to
+tune, mainly the splits l_1 to l_4 for the number of queries.
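+
+To make the sparsity assumption above concrete (this is my own notation, and
+the Walsh-Hadamard-type basis below is only one possible choice, not
+necessarily the exact transform used in the paper): a valuation v is
+k-Fourier-sparse if it can be written as
+
+    v(S) = \sum_{T \in \mathcal{T}} \hat{v}(T) (-1)^{|S \cap T|},   with |\mathcal{T}| \le k.
+
+The hybrid pipeline first fits a neural network to the query answers and only
+then extracts such a sparse representation; nothing constrains the network's
+own Fourier spectrum to be sparse, which is the source of my concern above.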
+
+In conclusion, while I believe in using more ideas from Fourier analysis in the
+design of combinatorial auctions, I am not convinced that the cost of this
+specific hybrid instantiation, both from a conceptual and an implementation
+perspective, is justified by the minor improvements observed experimentally.
diff --git a/ijcai-2021-3017.txt b/ijcai-2021-3017.txt
new file mode 100644
index 0000000..b869aba
--- /dev/null
+++ b/ijcai-2021-3017.txt
@@ -0,0 +1,62 @@
+This paper introduces a general framework to find a truthful, (approximately)
+revenue-maximizing combinatorial auction mechanism inside a given class of
+mechanisms. A typical situation where this is useful is when considering
+a parametrized class of mechanisms for which the optimal value of the parameter
+depends on the (unknown) distribution of bidders and thus needs to be learned.
+
+The framework is based on the general idea of random sampling, where we learn
+the optimal mechanism on one portion of the population but apply it to another
+portion. The main idea of the framework—which can be thought of as a way to
+make it more robust to variability in the bidders' population—is to first draw
+a set of participatory bidders and then learn on the complement set of bidders
+by repeatedly drawing samples of bidders from the complement set and finding
+a mechanism which maximizes revenue *on average* over the samples. The
+mechanism thus identified is then applied to the set of participatory bidders.
+
+In addition to the introduction of this general framework, the main
+contributions of the paper are:
+
+1. a general bound on the revenue of the mechanisms output by this method,
+   which depends on various parameters quantifying the complexity of the
+   auction class and the variability in the population of bidders.
+
+2. a specific instantiation of the framework to a new class of auctions
+   introduced by the authors, bundling-boosted auctions, for which it is
+   possible to quantify more finely the various parameters described in item 1.
+
+3. an algorithm to efficiently find the mechanism maximizing expected revenue
+   (as required by the framework) for sparse bundling-boosted auctions.
+
+Detailed comments
+=================
+
+The problem considered in this paper is important and the paper brings a nice
+unifying perspective to it by introducing a general framework and analyzing
+its performance through learning-theoretic complexity measures.
+Furthermore, the paper is very clear and nicely written (despite the sadly
+restrictive 7-page limit…).
+
+* an important feature of the framework is that it completely gives up on
+  extracting any revenue from non-participatory bidders. While I understand
+  that this might be a necessary feature given the generality of the scheme,
+  some prior work partially avoids this loss, albeit in much more restrictive
+  settings. Maybe more could be said as motivation about the trade-off
+  between generality of the scheme and how much revenue we can hope to extract.
+
+* relatedly, I was a bit surprised to see that the bounds on the revenue of the
+  scheme are expressed with respect to the efficient allocation on the entire
+  population. Why use this as a benchmark, as opposed to, for example, the
+  optimal revenue that could be obtained on the entire population? This might
+  be standard in this line of work, but adding a sentence or two about this
+  choice might be helpful.
+
+* a drawback of the scheme, which again might be due to its generality, is that
+  its guarantees depend on parameters which might be hard to compute or even
+  bound in practice. Although the authors try to make them explicit for
+  specific auction classes, they still remain somewhat opaque. This is
+  particularly true of the partition discrepancy quantity.
+
+* the scheme is reminiscent of the bootstrap method in statistics, where
+  multiple samples are drawn with replacement from the same initial sample to
+  account for variability in the population. Maybe this could be mentioned
+  somewhere to connect the paper to the broader statistics/learning literature.
diff --git a/ijcai-2021-366.txt b/ijcai-2021-366.txt
new file mode 100644
index 0000000..f0be2e8
--- /dev/null
+++ b/ijcai-2021-366.txt
@@ -0,0 +1,54 @@
+This paper proposes a general stochastic first-order method for the
+optimization of a smooth non-convex function over a Riemannian manifold.
+
+While SGD is known to be minimax optimal in general (requiring O(eps^-4)
+stochastic gradient queries to find an eps-approximate stationary point), it
+has been shown that variants of SGD can achieve O(eps^-3) sample complexity
+when the objective function is required to satisfy a stronger, mean-squared
+smoothness assumption. Previous adaptations of these methods to the Riemannian
+case were either specific to certain manifolds or required large-batch gradient
+updates at certain iterations, leading to prohibitive computation costs.
+
+The method introduced in this paper can be described as the Riemannian
+counterpart of the Stochastic Recursive Momentum method of [Cutkosky et al.,
+2019]. The main idea is to interpolate smoothly, over the iterations of the
+algorithm, between vanilla Riemannian SGD and the momentum-based RSPRG
+algorithm of [Han et al., 2020; Zhou et al., 2019].
+
+The main theoretical result is that with a proper decaying schedule for the
+step size and the interpolation parameter (controlling how much of the momentum
+term to mix into the gradient update), the proposed algorithm converges at
+a nearly optimal rate (up to logarithmic terms) and only requires one random
+sample and two gradient computations per iteration. The result holds for
+arbitrary Riemannian manifolds and only requires tuning two hyperparameters.
+
+An experimental section evaluates the proposed algorithm on three distinct
+tasks: PCA (on the Grassmann manifold), ICA (on the Stiefel manifold) and
+Riemannian centroid computation, and shows that it consistently outperforms
+other Riemannian stochastic first-order methods.
+
+Detailed comments
+=================
+
+The problem considered in this paper is relevant since many tasks in data
+analysis are naturally constrained to a Riemannian manifold (for example the
+PCA and ICA tasks considered in the experimental section). Furthermore, it is
+satisfying, both from a theoretical and an applicability perspective, to have
+a general method which can be analyzed for arbitrary manifolds.
+
+The convergence result is strong and the proofs are correct as far as I was
+able to check. A big part of the derivation follows ideas similar to the
+Euclidean case, which I was able to verify, but not being an expert in
+Riemannian geometry, I could have missed a technical mistake specific to the
+Riemannian case.
+
+The paper is well-written (as far as the 7-page limit permits…) and I found the
+experimental evaluations very convincing.
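+
+For the record, my reading of the interpolation mentioned above (written in
+Euclidean notation, ignoring the parallel-transport operator that is needed to
+compare tangent vectors at different points of the manifold) is a STORM-type
+recursion for the gradient estimate d_t:
+
+    d_t = \nabla f(x_t; \xi_t) + (1 - a_t) (d_{t-1} - \nabla f(x_{t-1}; \xi_t)),
+
+so that a_t = 1 recovers vanilla (Riemannian) SGD and a_t = 0 recovers
+a SPIDER/SARAH-style recursive estimator; the schedule for a_t is what realizes
+the smooth interpolation between the two regimes.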
+
+I have a minor concern about the significance of the result because of the
+extra assumption of mean-squared Lipschitzness of the gradients. It was not
+clear to me how restrictive this assumption is and how often it holds in
+practice. In particular, I was surprised not to see any mention of it in the
+experimental section: do the examples considered there satisfy the assumption?
+I think it would improve the paper to say a bit more about this.
+
diff --git a/ijcai-2021-47.txt b/ijcai-2021-47.txt
new file mode 100644
index 0000000..1c40bae
--- /dev/null
+++ b/ijcai-2021-47.txt
@@ -0,0 +1,99 @@
+This paper introduces a new stochastic first-order method for the optimization
+of a non-convex population risk. The main idea is to combine the standard Adam
+procedure with techniques from the field of differential privacy to obtain
+better generalization guarantees. Specifically, given an iid sample from the
+population, the procedure splits it into two halves, the training data and the
+test data, and gradients are computed only on the training data, unless they
+differ significantly from the gradients computed on the test data. In cases
+where there is a significant difference, the “test gradients” are revealed via
+a differentially private mechanism and used instead of the “training
+gradients”. (A rough sketch of my reading of this step is included at the end
+of this review.)
+
+The authors study two concrete instantiations of the above procedure, based on
+the Laplace and Gaussian mechanisms respectively, and provide theoretical
+convergence guarantees for both of them, showing that the procedures converge
+to a stationary point of the population loss. Finally, the procedures are
+evaluated through experiments and claimed to perform better than other
+state-of-the-art methods.
+
+Detailed Comments
+=================
+
+The problem studied in this paper is a central question in the field of AI
+nowadays: understanding the extent to which a model trained via stochastic
+first-order methods to minimize the empirical risk also minimizes the
+population risk. The proposed methods are interesting and based on a natural
+application of differential privacy ideas which should intuitively lead to more
+robust out-of-sample performance.
+
+I have several concerns about the significance and novelty of the results on
+the one hand, and about the quality of writing and presentation on the other
+hand.
+
+1. Novelty and significance:
+
+   * this paper seems very closely related to the paper “Towards Better
+     Generalization of Adaptive Gradient Methods” published in NeurIPS'20 and
+     mentioned in the related work of the present paper. The procedures studied
+     in this prior work are also based on a combination of Adam and
+     differential privacy techniques. The structure of the theoretical part of
+     the present paper closely follows this prior work, and the theoretical
+     guarantees seem almost identical. Compare, in particular, Theorems 4 and
+     5 of the present paper to Theorems 1 and 2 of the previous paper.
+
+   * I wasn't able to find in the present paper a clear explanation of how the
+     newly proposed procedure performs better than the ones in the previous
+     paper. I did see some minor differences in the technical lemmas, but the
+     theoretical guarantees for the main theorems (Theorems 4 and 5) seem
+     identical.
+
+   * relatedly, I was surprised not to find a comparison with the methods of
+     the previous paper in the experimental evaluation.
+
+   * finally, the paper claims that the proposed methods are superior to
+     state-of-the-art methods, but the experimental section does not paint
+     a clear picture: when looking at the test error in Figure 2, it seems to
+     me that the proposed methods perform very similarly to Yogi and AdaBound.
+
+2. Writing and presentation. The paper seems to have been hastily put together
+   and would benefit from a thorough proofreading and polishing pass:
+
+   * a lot of technical details are missing or unclear, making it hard to fully
+     assess the correctness of the results or even understand what they mean.
+     A non-exhaustive list:
+
+     - assumption 2 on page 2: the noisy approximation \nabla\hat f(w) and
+       \tilde g_t haven't been defined yet at this point, so it is not clear
+       what it means. Looking further down in the paper, one gets to understand
+       that this assumption perhaps applies to the \tilde g_t computed in lines
+       8 and 10 of Algorithm 1. However, these quantities are random variables,
+       so I don't know how to interpret in which sense their norm is bounded:
+       is it almost surely? in expectation?
+
+     - the descriptions of \phi_t and \psi_t below Algorithm 1 make it sound
+       like they can be quite general, with the ones used in Adam given as just
+       one example. However, the analysis in the supplementary material is only
+       done for this one example, although this is not explicitly stated in the
+       theorem statements.
+
+     - relatedly, one has to guess that all the arithmetic operations in line 13
+       of the algorithm are performed componentwise when applied to vectors.
+       While this is fairly standard in this context, it is surprising to only
+       see it mentioned inside a proof in the supplementary material.
+
+     - the methods evaluated in the experimental section are in fact based on
+       a minibatch variant of the methods introduced in the paper. Nothing is
+       said about this variant in the main paper and it is only described in the
+       supplementary material.
+
+     - Theorems 2, 4 and 6 start with "let \tilde g_1, ... \tilde g_T" but then
+       give a statement about \hat g_1, ... \hat g_T, so it is not clear which
+       of the two quantities is actually considered. Looking at the proof in the
+       supplementary material, it seems that the statement is in fact about
+       \tilde g_1, ... , \tilde g_T, but the reader would have to guess unless
+       they look at the proof details.
+
+   * many sentences in the informal exposition, in particular in the
+     introduction and in the introductory paragraphs of the sections, seem to
+     have been left in an intermediate state between two variants, probably as
+     a result of hasty editing.
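+
+As an appendix to the comments above, here is a rough sketch of my reading of
+the check-and-reveal step (referenced in the summary at the top of this
+review). This is my own paraphrase, not the authors' code: the threshold tau,
+the noise scale sigma and the plain Adam update are placeholders and do not
+reflect the paper's exact \phi_t/\psi_t choices or privacy calibration.
+
+    import numpy as np
+
+    def private_gradient_step(w, grad_train, grad_test, m, v, t,
+                              tau=1.0, sigma=0.1, lr=1e-3,
+                              beta1=0.9, beta2=0.999, eps=1e-8):
+        # Use the training gradient unless it disagrees with the held-out
+        # (test) gradient; in that case, reveal a noisy version of the test
+        # gradient (Gaussian-mechanism-style variant).
+        if np.linalg.norm(grad_train - grad_test) > tau:
+            g = grad_test + np.random.normal(scale=sigma, size=grad_test.shape)
+        else:
+            g = grad_train
+        # Standard Adam moment updates with bias correction (componentwise).
+        m = beta1 * m + (1 - beta1) * g
+        v = beta2 * v + (1 - beta2) * g ** 2
+        m_hat = m / (1 - beta1 ** t)
+        v_hat = v / (1 - beta2 ** t)
+        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
+        return w, m, v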
