author     Thibaut Horel <thibaut.horel@gmail.com>  2012-11-06 02:53:42 +0100
committer  Thibaut Horel <thibaut.horel@gmail.com>  2012-11-06 02:53:42 +0100
commit     a90b10d4653eb81c2647fa1b267dadc0a9fcacd8 (patch)
tree       aade432d37b7e0fdd68c5ef1674f32e570f702e9
parent     2f6accb332cfa17521490fdb239327c91ed1dcd4 (diff)
download   recommendation-a90b10d4653eb81c2647fa1b267dadc0a9fcacd8.tar.gz
Post submission clean-up
-rw-r--r--  notes.tex  496
-rw-r--r--  paper.tex  23
-rw-r--r--  papers/budget_feasible_mechanisms/BudgetFeasibleMechanisms.pdf (renamed from papers/BudgetFeasibleMechanisms.pdf)  bin 231480 -> 231480 bytes
-rw-r--r--  papers/budget_feasible_mechanisms/HowToWinFriendsAndInfluencePeople.pdf (renamed from papers/HowToWinFriendsAndInfluencePeople.pdf)  bin 261742 -> 261742 bytes
-rw-r--r--  papers/budget_feasible_mechanisms/SODA11_054_chenn.pdf (renamed from papers/SODA11_054_chenn.pdf)  bin 977285 -> 977285 bytes
-rw-r--r--  proof.tex  571
-rw-r--r--  stoc_paper.pdf  bin 0 -> 328249 bytes
-rw-r--r--  stoc_paper.tex  49
8 files changed, 56 insertions, 1083 deletions
diff --git a/notes.tex b/notes.tex
deleted file mode 100644
index 1ce4cc4..0000000
--- a/notes.tex
+++ /dev/null
@@ -1,496 +0,0 @@
-\documentclass{IEEEtran}
-%\usepackage{mathptmx}
-\usepackage[utf8]{inputenc}
-\usepackage{amsmath,amsthm,amsfonts}
-\newtheorem{lemma}{Lemma}
-\newtheorem{fact}{Fact}
-\newtheorem{example}{Example}
-\newtheorem{prop}{Proposition}
-\newcommand{\var}{\mathop{\mathrm{Var}}}
-\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]}
-\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]}
-\newcommand{\norm}[1]{\lVert#1\rVert}
-\newcommand{\tr}[1]{#1^*}
-\newcommand{\ip}[2]{\langle #1, #2 \rangle}
-\newcommand{\mse}{\mathop{\mathrm{MSE}}}
-\DeclareMathOperator{\trace}{tr}
-\DeclareMathOperator*{\argmax}{arg\,max}
-\title{Value of data}
-\author{Stratis Ioannidis \and Thibaut Horel}
-\begin{document}
-\maketitle
-
-\section{Introduction}
-
-The goal of this work is to propose a framework to study the value of user
-data. Although it is clear that there is some notion of value attached to
-user data (for example user data can be used to generate revenue through online
-advertising, recommender systems, etc.), it is not clear which definition of
-value should be used for formal work on this notion. After proposing
-a definition of value, we study how generic economic problems behave with
-regard to this definition. Finally, we study the computational feasibility of
-these problems.
-
-\section{Data model}
-\label{sec:data-model}
-
-There is a set of users and an experimenter. Each user has a vector of public
-features (e.g. age, height, binary features, labels, etc.) and a private piece
-of information: an undisclosed variable.
-
-The experimenter wants to learn the relationship between the public features
-and the private variable. Before conducting the experiment, he has prior
-knowledge of this relationship, called his \emph{hypothesis}, and the experiment
-consists in selecting a set of users and asking them to reveal their private
-variables. Based on the observed data, the experimenter updates his hypothesis.
-
-For the experimenter, there is a notion of value attached to a group of users:
-this is how much the data teaches him about the hypothesis, how much it reduces
-his uncertainty about the hypothesis. For the users, there is a cost attached
-to revealing their data. The experimenter also has a budget constraint on the
-amount of money he can spend.
-
-The problems arising in this setup are natural: the experimenter wants to
-maximize his utility (the value of the set of users he selects), but he needs
-to compensate the users while taking into account their costs and his budget
-constraint. The users' costs can either be public, which directly leads to
-combinatorial optimization problems, or private, in which case a notion of
-strategy intervenes in the way the experimenter compensates the users and
-an auction approach is required.
-
-Formally, there is a set of users indexed by a set $\mathcal{I}$. The public
-feature vector of user $i\in\mathcal{I}$ is an element $x_i$ of a feature set
-$E$, his undisclosed variable is denoted by $y_i$ and belongs to some space
-$A$. The cost of user $i$ for revealing his data is a positive real number
-$c_i\in\mathbf{R}_+$. The budget of the experimenter is denoted by $B$.
-
-The prior knowledge of the experimenter takes the form of a random variable $H$
-over $A^E$ or a subset of $A^E$ called the \emph{hypothesis set}. This random
-variable expresses his uncertainty about the true hypothesis $h$. The true
-hypothesis gives the relationship between the feature vector of user $i$ and
-his private variable through the equation:
-\begin{equation}\label{eq:hypothesis-model}
- y_i = h(x_i) + \varepsilon_i
-\end{equation}
-where $\varepsilon_i$ is a random variable over $A$.
-
-\emph{TODO: explain why this model is not restrictive: $y$ can always be
- written as a deterministic function of $x$ plus something independent of
-$x$}
-
-\emph{Examples.}
-\begin{enumerate}
- \item if the hypothesis set is finite (the experimenter has a few
-	    deterministic models he wants to choose from), observing data allows him
- to rule out parts of the hypothesis set. In this case, the uncertainty
- of the experimenter could simply be measured by the size of the
- hypothesis set.
- \item if $A=\{0,1\}$ and the hypothesis set is the set of all binary
- functions on $E$, the learning task of the experimenter is a binary
- classification problem.
- \item if $A=\mathbf{R}$, $E=\mathbf{R}^d$ and the hypothesis set is the set
- of linear functions from $E$ to $A$: $\mathcal{L}(A,E)$, the learning
-	    task is a linear regression. The prior knowledge of the experimenter is
-	    a prior distribution on the parameters of the linear function, which,
-	    combined with maximum a posteriori estimation, is equivalent to
-	    regularized linear regression (e.g. ridge regression).
-\end{enumerate}
-
-\section{Economics of data}
-\label{sec:economics}
-
-The goal of this section is to further discuss the optimization problems
-mentioned in the previous section.
-
-The value function (the utility function of the experimenter) will be denoted
-by $V$ and is simply a function mapping a set of users $S\subset \mathcal{I}$
-to $V(S)\in\mathbf{R}_+$. The choice of a specific value function will be
-discussed in Section~\ref{sec:data-value}.
-
-\subsection{Optimal observation selection}
-
-In this problem, the costs $(c_i)_{i\in \mathcal{I}}$ of the users are public.
-When selecting the set $S\subset\mathcal{I}$ of users, the experimenter has to
-pay exactly $\sum_{i\in S} c_i$. Hence, the optimization problem consists of
-selecting $S^*$ defined by:
-\begin{equation}\label{eq:observation-selection}
- S^* = \argmax_{S\subset\mathcal{I}}\Big\{V(S)\;\Big|\;
- \sum_{i\in S} c_i \leq B\Big\}
-\end{equation}
-
-This is a set function maximization problem under a knapsack constraint. If $V$ can
-be any function, then this problem is obviously NP-hard. A common assumption to
-make on $V$ is that it is submodular (see Section~\ref{sec:submodularity}),
-i.e., that it satisfies a diminishing returns property. However,
-maximizing a submodular function under a knapsack constraint is still NP-hard.
-
-Sviridenko (2004) gave a polynomial-time $(1-1/e)$-approximation algorithm for this
-problem when the value function is non-decreasing and submodular.
-
-Note that this problem covers the case where all users have the same cost $c$.
-In this case, letting $B' = \left\lfloor B/c\right\rfloor$, the problem
-becomes a maximization problem under a cardinality constraint:
-\begin{displaymath}
- S^* = \argmax_{S\subset\mathcal{I}}\left\{V(S)\;|\;
- | S| \leq B'\right\}
-\end{displaymath}
-for which Nemhauser, Wolsey and Fisher (1978) gave a polynomial-time $(1-1/e)$-approximation
-algorithm (this factor is the best achievable) in the case of a non-decreasing submodular function.
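-
-As an illustration of the algorithmic ideas involved (a sketch added for
-context; it is not the specific mechanism analysed later in this repository),
-the $(1-1/e)$ guarantee for the cardinality-constrained case is achieved by
-the natural greedy heuristic: starting from $S=\emptyset$, repeat $B'$ times
-\begin{displaymath}
-    S \gets S\cup\Big\{\argmax_{j\in\mathcal{I}\setminus S}
-    \big(V(S\cup\{j\})-V(S)\big)\Big\}.
-\end{displaymath}
-For the knapsack-constrained case, Sviridenko's algorithm combines the same
-greedy rule applied to the marginal value per unit cost,
-$\big(V(S\cup\{j\})-V(S)\big)/c_j$, with partial enumeration over small
-initial sets.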
-
-\subsection{Budget feasible auctions}
-
-Here, the costs of the users are private. Before the beginning of the
-experiment, each user reports a cost $c_i'$ which is not necessarily equal to his
-true cost: a user may decide to lie to receive more money; however, by doing
-so, he risks not being included in the experiment.
-
-This notion of strategy in the way the users reveal their costs roots
-this problem in auction and mechanism design theory. Formally, the
-experimenter wants to design an allocation function $f:
-\mathbf{R}_+^{\mathcal{I}} \rightarrow 2^{\mathcal{I}}$, which, given the
-reported costs of the users, selects the set of users to be included in the
-experiment, and a payment function $p: \mathbf{R}_+^{\mathcal{I}} \rightarrow
-\mathbf{R}_+^{\mathcal{I}}$, which, given the reported costs, returns the vector
-of payments to allocate to each user.
-
-For notational convenience, we will assume the costs $\{c_i,
-i\in\mathcal{I}\}$ of the users are given, and we will denote by $\{s_i,
-i\in\mathcal{I}\}$ the indicator variables of $f(\{c_i,
-i\in\mathcal{I}\})$, that is, $s_i = 1$ iff $i\in f(\{c_i, i\in\mathcal{I}\})$.
-The payment received by user $i$ will be denoted by $p_i$.
-
-The mechanism should satisfy the following conditions:
-\begin{itemize}
- \item \textbf{Normalized} if $s_i = 0$ then $p_i = 0$.
- \item \textbf{Individually rational} $p_i \geq s_ic_i$.
- \item \textbf{Truthful} $p_i - s_ic_i \geq p_i' - s_i'c_i$, where $p_i'$
- and $s_i'$ are the payment and allocation of user $i$ had he reported
- a cost $c_i'$ different from his true cost $c_i$ (keeping the costs
- reported by the other users the same).
- \item \textbf{Budget feasible} the payments should be within budget:
- \begin{displaymath}
- \sum_{i\in \mathcal{I}} s_ip_i \leq B
- \end{displaymath}
-\end{itemize}
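-
-Since each user's private information is a single number (his cost), this is
-a single-parameter mechanism design setting. As background (this remark is an
-addition, not part of the original notes): by Myerson's lemma, if the
-allocation rule is monotone, i.e., reporting a lower cost can only help
-a user get selected, then truthfulness is obtained by paying each selected
-user his threshold payment
-\begin{displaymath}
-    p_i = \sup\,\big\{c \geq 0 \;:\; i\in f(c, c_{-i})\big\},
-\end{displaymath}
-the largest cost he could have reported while still being selected, where
-$c_{-i}$ denotes the costs reported by the other users.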
-
-Yaron Singer (2010) proved a lower bound of 2 on the approximation ratio of
-any polynomial-time mechanism for this problem. He also gave a general randomized
-algorithm with an approximation ratio of 117.7, although for specific problems
-(coverage, knapsack, etc.) better ratios can be attained.
-
-State of the art: Chen, Gravin and Lu proved a lower bound of $1+\sqrt{2}$ and an upper
-bound of 8.34, assuming the fractional non-strategic optimization problem can be
-solved in polynomial time.
-
-\section{Value of data}
-\label{sec:data-value}
-
-Here, we discuss a choice for the value function appearing in the problems
-discussed in Section~\ref{sec:economics}. Such a value function should be at
-least normalized, positive and non-decreasing. It should furthermore capture
-a notion of \emph{uncertainty reduction} related to the learning task of the
-experimenter.
-
-The prior knowledge of the experimenter, i.e., the distribution of the random
-variable $H$ over the hypothesis set, conveys his uncertainty about the true
-hypothesis. A common measure of uncertainty is given by the entropy. If we
-denote by $P_H$ the probability distribution of $H$, its entropy is defined
-by\footnote{Here we choose to write the entropy of a discrete distribution, but
- our results do not rely on this assumption: if the distribution is
- continuous, one simply needs to replace the entropy by the differential
-entropy.}:
-\begin{displaymath}
- \mathbb{H}(H) = -\sum_{h\in A^E}P_H(h)\log\big(P_H(h)\big)
-\end{displaymath}
-
-We will denote by $Y_i$, $i\in\mathcal{I}$ the answer of user $i$ according to
-the experimenter's knowledge:
-\begin{equation}\label{eq:data-model}
- Y_i = H(x_i) + \varepsilon_i
-\end{equation}
-and $Y_S$ will denote the set of answers from the users in a subset $S$ of
-$\mathcal{I}$.
-
-We can now define the value of a group $S$ of users as being the decrease of
-entropy induced by observing $S$:
-\begin{equation}\label{eq:value}
-    \forall S\subset \mathcal{I},\; V(S)
- = \mathbb{H}(H) - \mathbb{H}(H\,|\,Y_S)
-\end{equation}
-where $\mathbb{H}(H\,|\,Y_S)$ is the conditional entropy of $H$ given $Y_S$.
-One can also note that the definition of the value given in \eqref{eq:value} is
-simply the mutual information $I(H;Y_S)$ between $H$ and $Y_S$. Submodularity
-is preserved by left composition with a non-decreasing concave function (see
-Section~\ref{sec:submodularity}). Hence, the definition of the value of $S$ can be
-extended to any $f\big(V(S)\big)$ where $f$ is a non-decreasing concave
-function.
-
-This notion of value, also known as the \emph{value of information} (TODO: ref),
-can be motivated by the information-theoretic interpretation of the
-entropy: suppose that the experimenter has access to an oracle to whom
-he can ask yes/no questions. Then, the entropy of the distribution is (up to
-rounding) the expected number of questions he needs to ask to fully identify the
-hypothesis. If he needs to pay for each question asked to the oracle, then our
-definition of value directly relates to the cost decrease implied by observing
-a set of users.
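-
-As a simple worked example (added for concreteness): if the hypothesis set is
-finite with $N$ elements and the prior is uniform, then
-\begin{displaymath}
-    \mathbb{H}(H) = -\sum_{h} \frac{1}{N}\log\frac{1}{N} = \log N,
-\end{displaymath}
-which, with base-2 logarithms, is the number of yes/no questions needed to
-identify the true hypothesis by repeated halving; ruling out half of the
-hypotheses is thus worth exactly one bit.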
-
-Using the \emph{information never hurts} principle, it is easy to see that the
-value function defined by \eqref{eq:value} is positive and non-decreasing (with
-regard to inclusion).
-
-Furthermore, if we add the natural assumption that
-$(\varepsilon_i)_{i\in\mathcal{I}}$ are jointly independent, which is
-equivalent to saying that, conditioned on the hypothesis, the
-$(Y_i)_{i\in\mathcal{I}}$ are independent, we get the following fact.
-
-\begin{fact}\label{value-submodularity}
- Assuming that $(Y_i)_{i\in\mathcal{I}}$ defined in \eqref{eq:data-model}
- are independent conditioned on $H$, then $V$ defined in \eqref{eq:value} is
- submodular.
-\end{fact}
-
-\begin{proof}
- Using the chain rule, one can rewrite the value of $S$ as:
- \begin{displaymath}
- V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\, H)
- \end{displaymath}
- We can now use the conditional independence of $Y_S$ to write the
- conditional entropy as a sum:
- \begin{displaymath}
- V(S) = \mathbb{H}(Y_S) - \sum_{s\in S}\mathbb{H}(Y_s\,|\, H)
- \end{displaymath}
-
-    It is well known that the joint entropy of a set of random variables is
-    submodular. Thus the last equation expresses $V$ as the sum of a submodular
-    function and an additive (modular) function. As a consequence, $V$ is submodular.
-\end{proof}
-
-\section{Value of data in the linear regression setup}
-
-In this section we will assume a linear model: the feature vectors belong to
-$\mathbf{R}^d$ and the private variables belong to $\mathbf{R}$. The private
-variable can be expressed as a linear combination of the features:
-\begin{displaymath}
- y = \beta^*x + \epsilon
-\end{displaymath}
-The noise $\epsilon$ is normally distributed, zero-mean and of variance
-$\sigma^2$. Furthermore, the noise terms of different users are mutually independent.
-
-The hypothesis set of the experimenter is the set of all linear forms from
-$\mathbf{R}^d$ to $\mathbf{R}$. Because a linear form can be uniquely
-represented as the inner product by a vector, it is equivalent to say that the
-hypothesis set is $\mathbf{R}^d$ and that the experimenter's hypothesis is
-a random variable $\beta$ over $\mathbf{R}^d$.
-
-A common assumption made in linear regression is that $\beta$ is normally
-distributed, zero-mean, and its covariance matrix is denoted by $\Sigma$. This
-prior distribution conveys the idea that $\beta$ should have a small $L^2$
-norm, i.e., that its coefficients should be shrunk towards zero. Indeed, it
-is easy to prove that choosing the $\hat\beta$ which maximizes the \emph{a
-posteriori} distribution given the observations under a normal prior is
-equivalent to solving the following optimization problem:
-\begin{displaymath}
- \hat\beta = \arg\min_{\beta\in\mathbf{R}^d}\|Y-X\beta\|^2 + \lambda
- \|\beta\|^2
-\end{displaymath}
-where $\lambda$ can be expressed as a function of $\Sigma$ and $\sigma^2$. This
-optimization problem is known as ridge regression, and is simply a least-squares
-fit of the data with a penalization of vectors $\beta$ with a large $L^2$
-norm, which is consistent with the prior distribution.
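-
-To make the equivalence explicit (a short derivation added for clarity, in the
-isotropic case $\Sigma = \kappa I_d$): the negative log-posterior of $\beta$
-given the observations $Y$ is, up to an additive constant,
-\begin{displaymath}
-    -\log p(\beta\,|\,Y) = \frac{1}{2\sigma^2}\|Y-X\beta\|^2
-    + \frac{1}{2\kappa}\|\beta\|^2 + \mathrm{const},
-\end{displaymath}
-so maximizing the posterior amounts to minimizing
-$\|Y-X\beta\|^2 + \lambda\|\beta\|^2$ with $\lambda = \sigma^2/\kappa$.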
-
-\begin{fact}
- Under the linear regression model, with a multivariate normal prior, the
- value of data of a set $S$ of users is given by:
- \begin{equation}\label{eq:linear-regression-value}
- V(S) = \frac{1}{2}\log\det\left(I_d
- + \frac{\Sigma}{\sigma^2}X_S^*X_S\right)
- \end{equation}
-    where $X_S$ is the matrix whose rows are the row vectors $x_s^*$ for $s$ in
- $S$.
-\end{fact}
-
-\begin{proof}
-Let us recall that the entropy of a multivariate normal variable $B$ in
-dimension $d$ and of covariance $\Sigma$ (the mean is not relevant) is given
-by:
-\begin{equation}\label{eq:multivariate-entropy}
- \mathbb{H}(B) = \frac{1}{2}\log\big((2\pi e)^d \det \Sigma\big)
-\end{equation}
-
-Using the chain rule as in the proof of Fact~\ref{value-submodularity} we get
-that:
-\begin{displaymath}
- V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\,\beta)
-\end{displaymath}
-
-Conditioned on $\beta$, $Y_S$ follows a multivariate normal
-distribution of mean $X_S\beta$ and covariance matrix $\sigma^2 I_n$, where $n=|S|$. Hence:
-\begin{equation}\label{eq:h1}
- \mathbb{H}(Y_S\,|\,\beta)
- = \frac{1}{2}\log\left((2\pi e)^n \det(\sigma^2I_n)\right)
-\end{equation}
-
-$(Y_S)$ also follows a multivariate normal distribution of mean zero. Let us
-compute its covariance matrix, $\Sigma_Y$:
-\begin{align*}
- \Sigma_Y & = \expt{YY^*} = \expt{(X_S\beta + E)(X_S\beta + E)^*}\\
- & = X_S\Sigma X_S^* + \sigma^2I_n
-\end{align*}
-Thus, we get that:
-\begin{equation}\label{eq:h2}
- \mathbb{H}(Y_S)
- = \frac{1}{2}\log\left((2\pi e)^n \det(X_S\Sigma X_S^* + \sigma^2 I_n)\right)
-\end{equation}
-
-Combining \eqref{eq:h1} and \eqref{eq:h2} we get:
-\begin{displaymath}
- V(S) = \frac{1}{2}\log\det\left(I_n+\frac{1}{\sigma^2}X_S\Sigma
- X_S^*\right)
-\end{displaymath}
-
-Finally, we can use Sylvester's determinant formula to get the result.
-\end{proof}
-
-\emph{Remarks.}
-\begin{enumerate}
- \item it is known that for a set of symmetric positive definite matrices,
- defining the value of a set to be the $\log\det$ of the sum of the
- matrices yields a non-decreasing, submodular value function. Noting that:
- \begin{displaymath}
- X_S^*X_S = \sum_{s\in S}x_sx_s^*
- \end{displaymath}
- it is clear that our value function is non-decreasing and submodular.
- The positivity follows from a direct application of the spectral
- theorem.
- \item the matrix which appears in the value function:
- \begin{displaymath}
- I_d + \frac{\Sigma}{\sigma^2}X_S^*X_S
- \end{displaymath}
- is also the inverse of the covariance matrix of the ridge regression
- estimator. In optimal experiment design, it is common to use the
-	    determinant of the inverse of the estimator's covariance matrix as
-	    a measure of the quality of the prediction. Indeed, this directly relates to
- the inverse of the volume of the confidence ellipsoid.
- \item This value function can be computed up to a fixed decimal precision in
- polynomial time.
-\end{enumerate}
-\subsubsection*{Marginal contribution}
-
-Here, we want to compute the marginal contribution of a point $x$ to a set $S$ of
-users:
-\begin{displaymath}
- \Delta_xV(S) = V(S\cup \{x\}) - V(S)
-\end{displaymath}
-
-Using that:
-\begin{displaymath}
- X_{S\cup\{x\}}^*X_{S\cup\{x\}} = X_S^*X_S + xx^*
-\end{displaymath}
-we get:
-\begin{align*}
- \Delta_x V(S) & = \frac{1}{2}\log\det\left(I_d
- + \Sigma\frac{X_S^*X_S}{\sigma^2} + \Sigma\frac{xx^*}{\sigma^2}\right)\\
- & - \frac{1}{2}\log\det\left(I_d + \Sigma\frac{X_S^*X_S}{\sigma^2}\right)\\
- & = \frac{1}{2}\log\det\left(I_d + xx^*\left(\sigma^2\Sigma^{-1} +
-X_S^*X_S\right)^{-1}\right)\\
-& = \frac{1}{2}\log\left(1 + x^*\left(\sigma^2\Sigma^{-1}
-+ X_S^*X_S\right)^{-1}x\right)
-\end{align*}
-
-\emph{Remark.} This formula shows that, given a set $S$, users do not all bring
-the same contribution to the set $S$. This contribution depends on the norm of
-$x$ for the bilinear form defined by the matrix $(\sigma^2\Sigma^{-1}
-+ X_S^*X_S)^{-1}$ which reflects how well the new point $x$ \emph{aligns} with
-the already existing points.
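-
-As an illustration (a special case worked out here, assuming the isotropic
-prior $\Sigma = \kappa I_d$ and $S = \emptyset$): the marginal contribution of
-the first selected point reduces to
-\begin{displaymath}
-    \Delta_x V(\emptyset)
-    = \frac{1}{2}\log\Big(1 + \frac{\kappa}{\sigma^2}\|x\|^2\Big),
-\end{displaymath}
-so it depends only on the norm of $x$; alignment with previously selected
-points only starts to matter once $S$ is non-empty.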
-
-\section*{Appendix: Submodularity}
-\label{sec:submodularity}
-
-In this section, we will consider that we are given a \emph{universe} set $U$.
-A set function $f$ is a function defined on the power set of $U$, $\mathfrak{P}(U)$.
-
-A set function $f$ defined on $\mathfrak{P}(U)$ is said to be \emph{increasing} if
-it is increasing with regard to inclusion, that is:
-
-\begin{displaymath}
- \forall\,S\subseteq T,\quad f(S)\leq f(T)
-\end{displaymath}
-
-A \emph{decreasing} function on $\mathfrak{P}(U)$ is defined similarly.
-
-A set function $f$ defined on $\mathfrak{P}(U)$ is said to be
-\emph{submodular} if it satisfies the diminishing returns property, that is,
-the marginal increment obtained when adding a point to a set is a decreasing
-set function. More formally, for any point $x$ in $U$, the marginal
-increment of $f$ with respect to $x$ is the set function defined as:
-\begin{displaymath}
-    \Delta_x f(S) = f(S\cup\{x\}) - f(S)
-\end{displaymath}
-Then, $f$ is \emph{submodular} iff for all $x$ in $U$, $\Delta_x f$ is a
-decreasing set function.
-
-Similarly, a \emph{supermodular} function is one whose marginal increments are
-increasing set functions.
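-
-A standard example (added here for illustration) is the coverage function:
-given a family of finite sets $(A_x)_{x\in U}$, the function
-\begin{displaymath}
-    f(S) = \Big|\bigcup_{x\in S} A_x\Big|
-\end{displaymath}
-is increasing and submodular, since the number of new elements covered when
-adding $A_x$ can only shrink as the already covered set $\bigcup_{x\in S}A_x$
-grows.
-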
-\begin{prop}
- Let $R:\mathbf{R}\rightarrow \mathbf{R}$ be a decreasing concave function and
- $f:\mathfrak{P}(U)\rightarrow\mathbf{R}$ be a decreasing submodular function,
- then the composed function $R\circ f$ is increasing and supermodular.
-\end{prop}
-
-\begin{proof}
- The increasingness of $R\circ f$ follows immediately from the decreasingness
- of $R$ and $f$.
-
- For the supermodularity, let $S$ and $T$ be two sets such that $S\subseteq
- T$. By decreasingness of $f$, we have:
- \begin{displaymath}
- \forall\,V,\quad f(T)\leq f(S)\quad\mathrm{and}\quad f(T\cup V)\leq f(S\cup V)
- \end{displaymath}
-
- Thus, by concavity of $R$:
-  \begin{equation}\label{eq:base}
-    \begin{split}
-      \forall\,V,\quad\frac{R\big(f(S)\big)-R\big(f(S\cup V)\big)}{f(S)-f(S\cup
-      V)}\\
-      \leq\frac{R\big(f(T)\big)-R\big(f(T\cup V)\big)}{f(T)-f(T\cup V)}
-    \end{split}
-  \end{equation}
-
- $f$ is decreasing, so multiplying this last inequality by
- $f(S)-f(S\cup V)$ and $f(T)-f(T\cup V)$ yields:
- \begin{multline}
- \forall V,\quad\Big(R\big(f(S)\big)-R\big(f(S\cup V)\big)\Big)\big(f(T)-f(T\cup V)\big)\\
- \leq \Big(R\big(f(T)\big)-R\big(f(T\cup V)\big)\Big)\big(f(S)-f(S\cup V)\big)
- \end{multline}
-
- $f$ is submodular, so:
- \begin{displaymath}
- f(T\cup V)-f(T)\leq f(S\cup V) - f(S)
- \end{displaymath}
-
- $R\circ f$ is increasing, so:
- \begin{displaymath}
- R\big(f(S)\big)-R\big(f(S\cup V)\big)\leq 0
- \end{displaymath}
-
- By combining the two previous inequalities, we get:
- \begin{multline*}
- \forall V,\quad\Big(R\big(f(S)\big)-R\big(f(S\cup V)\big)\Big)\big(f(S)-f(S\cup V)\big)\\
- \leq \Big(R\big(f(S)\big)-R\big(f(S\cup V)\big)\Big)\big(f(T)-f(T\cup V)\big)
- \end{multline*}
-
- Injecting this last inequality into \eqref{eq:base} gives:
- \begin{multline*}
- \forall V,\quad\Big(R\big(f(S)\big)-R\big(f(S\cup V)\big)\Big)\big(f(S)-f(S\cup V)\big)\\
- \leq \Big(R\big(f(T)\big)-R\big(f(T\cup V)\big)\Big)\big(f(S)-f(S\cup V)\big)
- \end{multline*}
-
- Dividing left and right by $f(S)-f(S\cup V)$ yields:
- \begin{displaymath}
- \forall V,\quad R\big(f(S)\big)-R\big(f(S\cup V)\big)
- \leq R\big(f(T)\big)-R\big(f(T\cup V)\big)
- \end{displaymath}
- which is exactly the supermodularity of $R\circ f$.
-\end{proof}
-
-\end{document}
-
diff --git a/paper.tex b/paper.tex
index cbb13af..86bdbdd 100644
--- a/paper.tex
+++ b/paper.tex
@@ -1,28 +1,19 @@
-\documentclass{acm_proc_article-sp}
+\documentclass{article}
\usepackage[utf8]{inputenc}
-\usepackage{amsmath,amsfonts}
+\usepackage{amsmath,amsfonts,amsthm}
\usepackage{algorithm}
\usepackage{algpseudocode,bbm,color,verbatim}
\input{definitions}
\title{Budget Feasible Mechanisms for Experimental Design}
-
+\author{
+Thibaut Horel\\Technicolor\\\texttt{thibaut.horel@ens.fr} \and
+Stratis Ioannidis\\Technicolor\\\texttt{stratis.ioannidis@technicolor.com}\and
+S. Muthukrishnan\\Rutgers University, Microsoft Research\\\texttt{muthu@cs.rutgers.edu}}
%Remove permission block empty space
\makeatletter
\let\@copyrightspace\relax
\makeatother
-\numberofauthors{3}
-\author{
-\alignauthor Thibaut Horel\\ %\titlenote{\texttt{thibaut.horel@ens.fr}}\\
- \affaddr{Technicolor}\\
- \email{thibaut.horel@ens.fr}
-\alignauthor Stratis Ioannidis\\ % \titlenote{\texttt{stratis.ioannidis@technicolor.com}}\\
- \affaddr{Technicolor}\\
- \email{stratis.ioannidis\\@technicolor.com}
-\alignauthor S. Muthukrishnan\\% \titlenote{\texttt{muthu@cs.rutgers.edu}}\\
- \affaddr{Rutgers University, Microsoft Research}\\
- \email{muthu@cs.rutgers.edu}
-}
\begin{document}
\maketitle
@@ -44,6 +35,6 @@
\bibliographystyle{abbrv}
\bibliography{notes}
\end{small}
-\appendix
+\section*{Appendix}
\input{appendix}
\end{document}
diff --git a/papers/BudgetFeasibleMechanisms.pdf b/papers/budget_feasible_mechanisms/BudgetFeasibleMechanisms.pdf
index 76f3902..76f3902 100644
--- a/papers/BudgetFeasibleMechanisms.pdf
+++ b/papers/budget_feasible_mechanisms/BudgetFeasibleMechanisms.pdf
Binary files differ
diff --git a/papers/HowToWinFriendsAndInfluencePeople.pdf b/papers/budget_feasible_mechanisms/HowToWinFriendsAndInfluencePeople.pdf
index 2d35e81..2d35e81 100644
--- a/papers/HowToWinFriendsAndInfluencePeople.pdf
+++ b/papers/budget_feasible_mechanisms/HowToWinFriendsAndInfluencePeople.pdf
Binary files differ
diff --git a/papers/SODA11_054_chenn.pdf b/papers/budget_feasible_mechanisms/SODA11_054_chenn.pdf
index 5946a60..5946a60 100644
--- a/papers/SODA11_054_chenn.pdf
+++ b/papers/budget_feasible_mechanisms/SODA11_054_chenn.pdf
Binary files differ
diff --git a/proof.tex b/proof.tex
deleted file mode 100644
index bc57ca0..0000000
--- a/proof.tex
+++ /dev/null
@@ -1,571 +0,0 @@
-\documentclass{acm_proc_article-sp}
-\usepackage[utf8]{inputenc}
-\usepackage{amsmath,amsfonts}
-\usepackage{algorithm}
-\usepackage{algpseudocode}
-\newtheorem{lemma}{Lemma}
-\newtheorem{fact}{Fact}
-\newtheorem{example}{Example}
-\newtheorem{prop}{Proposition}
-\newtheorem{theorem}{Theorem}
-\newcommand*{\defeq}{\stackrel{\text{def}}{=}}
-\newcommand{\var}{\mathop{\mathrm{Var}}}
-\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]}
-\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]}
-\newcommand{\norm}[1]{\lVert#1\rVert}
-\newcommand{\tr}[1]{#1^*}
-\newcommand{\ip}[2]{\langle #1, #2 \rangle}
-\newcommand{\mse}{\mathop{\mathrm{MSE}}}
-\DeclareMathOperator{\trace}{tr}
-\DeclareMathOperator*{\argmax}{arg\,max}
-\begin{document}
-
-\section{Budget feasible mechanism}
-
-\subsection{Notations}
-
-\begin{itemize}
- \item If $x\in\mathbf{R}^d$, $x^*$ will denote the transpose of vector $x$.
- If $(x, y)\in\mathbf{R}^d\times\mathbf{R}^d$, $x^*y$ is the standard
- inner product of $\mathbf{R}^d$.
- \item We will often use the following order over symmetric matrices: if $A$
- and $B$ are two $d\times d$ real symmetric matrices, we write that
- $A\leq B$ iff:
- \begin{displaymath}
- \forall x\in\mathbf{R}^d,\quad x^*Ax \leq x^*Bx
- \end{displaymath}
-	    That is, iff $B-A$ is symmetric positive semi-definite.
-	\item Let us recall this property of the order defined above: if $A$ and
-	    $B$ are two symmetric positive definite matrices such that $A\leq B$,
-	    then $B^{-1}\leq A^{-1}$.
-\end{itemize}
-
-\subsection{Data model}
-\begin{itemize}
- \item set of $n$ users $\mathcal{N} = \{1,\ldots, n\}$
- \item each user has a public vector of features $x_i\in\mathbf{R}^d$ and some
- undisclosed variable $y_i\in\mathbf{R}$
- \item we assume that the data has been normalized: $\forall
- i\in\mathcal{N}$, $\norm{x_i}\leq 1$
- \item \textbf{Ridge regression model:}
- \begin{itemize}
- \item $y_i = \beta^*x_i + \varepsilon_i$
- \item $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$,
- $(\varepsilon_i)_{i\in \mathcal{N}}$ are mutually independent.
- \item prior knowledge of $\beta$: $\beta\sim\mathcal{N}(0,\kappa I_d)$
- \item $\mu = \frac{\kappa}{\sigma^2}$ is the regularization factor.
-		\item after the variables $y_i$ are disclosed, the experimenter
-		    does \emph{maximum a posteriori} estimation, which is equivalent,
-		    under a Gaussian prior, to solving the ridge regression
-		    minimisation problem:
-		    \begin{displaymath}
-			\hat\beta = \arg\min_\beta \sum_i (y_i - \beta^*x_i)^2
-			+ \frac{1}{\mu} \norm{\beta}_2^2
-		    \end{displaymath}
- \end{itemize}
-\end{itemize}
-
-\subsection{Economics}
-
-Value function: following experimental design theory, the value of
-data is the decrease in uncertainty about the model. A common measure of
-uncertainty is the entropy. Thus, the value of a set $S$ of users is:
-\begin{displaymath}
- V(S) = \mathbb{H}(\beta) - \mathbb{H}(\beta|Y_S)
-\end{displaymath}
-where $Y_S = \{y_i,\,i\in S\}$ is the set of observations.
-
-In our case (under Gaussian prior):
-\begin{align*}
- \forall S\subset\mathcal{N},\; V(S)
- & = \frac{1}{2}\log\det\left(I_d
- + \mu\sum_{i\in S} x_ix_i^*\right)\\
- & \defeq \frac{1}{2}\log\det A(S)
-\end{align*}
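-
-For instance (a quick sanity check added here), for a single user $i$,
-Sylvester's determinant formula gives
-\begin{displaymath}
-    V(\{i\}) = \frac{1}{2}\log\det\big(I_d + \mu x_ix_i^*\big)
-    = \frac{1}{2}\log\big(1 + \mu \norm{x_i}^2\big)
-    \leq \frac{1}{2}\log(1+\mu),
-\end{displaymath}
-where the last inequality uses the normalization $\norm{x_i}\leq 1$.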
-
-\begin{lemma}[Marginal contribution]
- \begin{displaymath}
- \Delta_i V(S)\defeq V(S\cup\{i\}) - V(S)
- = \frac{1}{2}\log\left(1 + \mu x_i^*A(S)^{-1}x_i\right)
- \end{displaymath}
-\end{lemma}
-
-\begin{proof}
- We have:
- \begin{align*}
- V(S\cup\{i\}) & = \frac{1}{2}\log\det A(S\cup\{i\})\\
- & = \frac{1}{2}\log\det\left(A(S) + \mu x_i x_i^*\right)\\
- & = V(S) + \frac{1}{2}\log\det\left(I_d + \mu A(S)^{-1}x_i
- x_i^*\right)\\
- & = V(S) + \frac{1}{2}\log\left(1 + \mu x_i^* A(S)^{-1}x_i\right)
- \end{align*}
- where the last equality comes from Sylvester's determinant formula.
-\end{proof}
-\begin{itemize}
- \item each user $i$ has a cost $c_i$
- \item the auctioneer has a budget constraint $B$
- \item optimisation problem:
- \begin{displaymath}
- OPT(V,\mathcal{N}, B) = \max_{S\subset\mathcal{N}} \left\{ V(S)\,|\,
- \sum_{i\in S}c_i\leq B\right\}
- \end{displaymath}
-\end{itemize}
-
-\subsection{Relaxations of the value function}
-
-We say that $R_\mathcal{N}:[0,1]^n\rightarrow\mathbf{R}$ is a relaxation of the
-value function $V$ over $\mathcal{N}$ if it coincides with $V$ at binary
-points. Formally, for any $S\subset\mathcal{N}$, let $\mathbf{1}_S$ denote the
-indicator vector of $S$. $R_\mathcal{N}$ is a relaxation of $V$ over
-$\mathcal{N}$ iff:
-\begin{displaymath}
- \forall S\subset\mathcal{N},\; R_\mathcal{N}(\mathbf{1}_S) = V(S)
-\end{displaymath}
-
-We can extend the optimisation problem defined above to a relaxation by
-extending the cost function:
-\begin{displaymath}
- \forall \lambda\in[0,1]^n,\; c(\lambda)
- = \sum_{i\in\mathcal{N}}\lambda_ic_i
-\end{displaymath}
-The optimisation problem becomes:
-\begin{displaymath}
- OPT(R_\mathcal{N}, B) =
- \max_{\lambda\in[0,1]^n}\left\{R_\mathcal{N}(\lambda)\,|\, c(\lambda)\leq B\right\}
-\end{displaymath}
-The relaxations we will consider here rely on defining a probability
-distribution over subsets of $\mathcal{N}$.
-
-Let $\lambda\in[0,1]^n$, let us define:
-\begin{displaymath}
- P_\mathcal{N}^\lambda(S) = \prod_{i\in S}\lambda_i
- \prod_{i\in\mathcal{N}\setminus S}(1-\lambda_i)
-\end{displaymath}
-$P_\mathcal{N}^\lambda(S)$ is the probability of picking the set $S$ if we select
-a subset of $\mathcal{N}$ at random by deciding independently for each point to
-include it in the set with probability $\lambda_i$ (and to exclude it with
-probability $1-\lambda_i$).
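-
-For instance (a small sanity check added here), with $n=2$ and
-$\lambda=(\lambda_1,\lambda_2)$:
-\begin{align*}
-    P_\mathcal{N}^\lambda(\emptyset)&=(1-\lambda_1)(1-\lambda_2), &
-    P_\mathcal{N}^\lambda(\{1\})&=\lambda_1(1-\lambda_2),\\
-    P_\mathcal{N}^\lambda(\{2\})&=(1-\lambda_1)\lambda_2, &
-    P_\mathcal{N}^\lambda(\{1,2\})&=\lambda_1\lambda_2,
-\end{align*}
-and these four probabilities sum to $1$, so $P_\mathcal{N}^\lambda$ is indeed
-a probability distribution over the subsets of $\mathcal{N}$.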
-
-We will consider two relaxations of the value function $V$ over $\mathcal{N}$:
-\begin{itemize}
- \item the \emph{multi-linear extension} of $V$:
- \begin{align*}
- F_\mathcal{N}(\lambda)
- & = \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[\log\det A(S)\big]\\
- & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) V(S)\\
- & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) \log\det A(S)\\
- \end{align*}
- \item the \emph{concave relaxation} of $V$:
- \begin{align*}
- L_{\mathcal{N}}(\lambda)
- & = \log\det \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[A(S)\big]\\
- & = \log\det\left(\sum_{S\subset N}
- P_\mathcal{N}^\lambda(S)A(S)\right)\\
- & = \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}}
- \lambda_ix_ix_i^*\right)\\
- & \defeq \log\det \tilde{A}(\lambda)
- \end{align*}
-\end{itemize}
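-
-As a small example for intuition (added; it uses the two definitions exactly
-as displayed above): take $n=d=1$ and a single user with $\norm{x_1}=1$.
-A random set $S$ is either $\emptyset$, with $\log\det A(\emptyset)=0$, or
-$\{1\}$, with $\log\det A(\{1\})=\log(1+\mu)$, so that
-\begin{displaymath}
-    F_\mathcal{N}(\lambda) = \lambda\log(1+\mu)
-    \qquad\text{and}\qquad
-    L_\mathcal{N}(\lambda) = \log(1+\lambda\mu).
-\end{displaymath}
-By concavity of $t\mapsto\log(1+t)$, $F_\mathcal{N}\leq L_\mathcal{N}$ on
-$[0,1]$, consistent with the upper bound in
-Lemma~\ref{lemma:relaxation-ratio}, and the two coincide at the integral
-points $\lambda\in\{0,1\}$.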
-
-\begin{lemma}
- The \emph{concave relaxation} $L_\mathcal{N}$ is concave\footnote{Hence
- this relaxation is well-named!}.
-\end{lemma}
-
-\begin{proof}
- This follows from the concavity of the $\log\det$ function over symmetric
- positive semi-definite matrices. More precisely, if $A$ and $B$ are two
- symmetric positive semi-definite matrices, then:
- \begin{multline*}
- \forall\alpha\in [0, 1],\; \log\det\big(\alpha A + (1-\alpha) B\big)\\
- \geq \alpha\log\det A + (1-\alpha)\log\det B
- \end{multline*}
-\end{proof}
-
-\begin{lemma}[Rounding]\label{lemma:rounding}
- For any feasible $\lambda\in[0,1]^n$, there exists a feasible
-    $\bar{\lambda}\in[0,1]^n$ such that at most one of its components is
-    fractional, that is, lies in $(0,1)$, and:
- \begin{displaymath}
- F_{\mathcal{N}}(\lambda)\leq F_{\mathcal{N}}(\bar{\lambda})
- \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-    We give a rounding procedure which, given a feasible $\lambda$ with at least
-    two fractional components, returns a feasible $\lambda'$ with one fewer
-    fractional component such that:
- \begin{displaymath}
- F_\mathcal{N}(\lambda) \leq F_\mathcal{N}(\lambda')
- \end{displaymath}
- Applying this procedure recursively yields the lemma's result.
-
- Let us consider such a feasible $\lambda$. Let $i$ and $j$ be two
- fractional components of $\lambda$ and let us define the following
- function:
- \begin{displaymath}
- F_\lambda(\varepsilon) = F(\lambda_\varepsilon)
- \quad\textrm{where} \quad
- \lambda_\varepsilon = \lambda + \varepsilon\left(e_i-\frac{c_i}{c_j}e_j\right)
- \end{displaymath}
-
- It is easy to see that if $\lambda$ is feasible, then:
- \begin{multline}\label{eq:convex-interval}
- \forall\varepsilon\in\Big[\max\Big(-\lambda_i,(\lambda_j-1)\frac{c_j}{c_i}\Big), \min\Big(1-\lambda_i, \lambda_j
- \frac{c_j}{c_i}\Big)\Big],\;\\
- \lambda_\varepsilon\;\;\textrm{is feasible}
- \end{multline}
-
- Furthermore, the function $F_\lambda$ is convex, indeed:
- \begin{align*}
- F_\lambda(\varepsilon)
- & = \mathbb{E}_{S'\sim P_{\mathcal{N}\setminus\{i,j\}}^\lambda(S')}\Big[
- (\lambda_i+\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i,j\})\\
- & + (\lambda_i+\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i\})\\
- & + (1-\lambda_i-\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{j\})\\
- & + (1-\lambda_i-\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S')\Big]\\
- \end{align*}
-    Thus, $F_\lambda$ is a degree-2 polynomial whose leading coefficient is:
- \begin{multline*}
- \frac{c_i}{c_j}\mathbb{E}_{S'\sim
- P_{\mathcal{N}\setminus\{i,j\}}^\lambda(S')}\Big[
-	V(S'\cup\{i\})+V(S'\cup\{j\})\\
- -V(S'\cup\{i,j\})-V(S')\Big]
- \end{multline*}
-    which is nonnegative by submodularity of $V$. Hence, the maximum of
-    $F_\lambda$ over the interval given in \eqref{eq:convex-interval} is
-    attained at one of its endpoints, at which either the $i$-th or $j$-th component of
- $\lambda_\varepsilon$ becomes integral.
-\end{proof}
-
-\begin{lemma}\label{lemma:relaxation-ratio}
- The following inequality holds:
- \begin{displaymath}
- \forall\lambda\in[0,1]^n,\;
- \frac{\log\big(1+\mu\big)}{2\mu}
- \,L_\mathcal{N}(\lambda)\leq
- F_\mathcal{N}(\lambda)\leq L_{\mathcal{N}}(\lambda)
- \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-
- We will prove that:
- \begin{displaymath}
- \frac{\log\big(1+\mu\big)}{2\mu}
- \end{displaymath}
- is a lower bound of the ratio $\partial_i F_\mathcal{N}(\lambda)/\partial_i
- L_\mathcal{N}(\lambda)$.
-
- This will be enough to conclude, by observing that:
- \begin{displaymath}
- \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)}
- \sim_{\lambda\rightarrow 0}
- \frac{\sum_{i\in \mathcal{N}}\lambda_i\partial_i F_\mathcal{N}(0)}
- {\sum_{i\in\mathcal{N}}\lambda_i\partial_i L_\mathcal{N}(0)}
- \end{displaymath}
- and that an interior critical point of the ratio
- $F_\mathcal{N}(\lambda)/L_\mathcal{N}(\lambda)$ is defined by:
- \begin{displaymath}
- \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)}
- = \frac{\partial_i F_\mathcal{N}(\lambda)}{\partial_i
- L_\mathcal{N}(\lambda)}
- \end{displaymath}
-
- Let us start by computing the derivatives of $F_\mathcal{N}$ and
- $L_\mathcal{N}$ with respect to
- the $i$-th component.
-
- For $F$, it suffices to look at the derivative of
- $P_\mathcal{N}^\lambda(S)$:
- \begin{displaymath}
- \partial_i P_\mathcal{N}^\lambda(S) = \left\{
- \begin{aligned}
- & P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})\;\textrm{if}\; i\in S \\
- & - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\;\textrm{if}\;
- i\in \mathcal{N}\setminus S \\
- \end{aligned}\right.
- \end{displaymath}
-
- Hence:
- \begin{multline*}
- \partial_i F_\mathcal{N} =
- \sum_{\substack{S\subset\mathcal{N}\\ i\in S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})V(S)\\
- - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\
- \end{multline*}
-
- Now, using that every $S$ such that $i\in S$ can be uniquely written as
- $S'\cup\{i\}$, we can write:
- \begin{multline*}
- \partial_i F_\mathcal{N} =
- \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S\cup\{i\})\\
- - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\
- \end{multline*}
-
- Finally, by using the expression for the marginal contribution of $i$ to
- $S$:
- \begin{displaymath}
- \partial_i F_\mathcal{N}(\lambda) =
- \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
- \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)
- \end{displaymath}
-
- The computation of the derivative of $L_\mathcal{N}$ uses standard matrix
- calculus and gives:
- \begin{displaymath}
- \partial_i L_\mathcal{N}(\lambda)
- = \mu x_i^* \tilde{A}(\lambda)^{-1}x_i
- \end{displaymath}
-
- Using the following inequalities:
- \begin{gather*}
- \forall S\subset\mathcal{N}\setminus\{i\},\quad
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\geq
-	P_{\mathcal{N}}^\lambda(S\cup\{i\})\\
-	\forall S\subset\mathcal{N}\setminus\{i\},\quad P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
- \geq P_\mathcal{N}^\lambda(S)\\
- \forall S\subset\mathcal{N},\quad A(S)^{-1} \geq A(S\cup\{i\})^{-1}\\
- \end{gather*}
- we get:
- \begin{align*}
- \partial_i F_\mathcal{N}(\lambda)
- & \geq \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_\mathcal{N}^\lambda(S)
- \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\
- & \geq \frac{1}{2}
- \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_\mathcal{N}^\lambda(S)
- \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\
- &\hspace{-3.5em}+\frac{1}{2}
- \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_\mathcal{N}^\lambda(S\cup\{i\})
- \log\Big(1 + \mu x_i^*A(S\cup\{i\})^{-1}x_i\Big)\\
- &\geq \frac{1}{2}
- \sum_{S\subset\mathcal{N}}
- P_\mathcal{N}^\lambda(S)
- \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\
- \end{align*}
-
- Using that $A(S)\geq I_d$ we get that:
- \begin{displaymath}
- \mu x_i^*A(S)^{-1}x_i \leq \mu
- \end{displaymath}
-
- Moreover:
- \begin{displaymath}
-	\forall x\in[0,\mu],\; \log(1+x)\geq
- \frac{\log\big(1+\mu\big)}{\mu} x
- \end{displaymath}
-
- Hence:
- \begin{displaymath}
- \partial_i F_\mathcal{N}(\lambda) \geq
-	\frac{\log\big(1+\mu\big)}{2}
- x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)^{-1}\bigg)x_i
- \end{displaymath}
-
- Finally, using that the inverse is a matrix convex function over symmetric
- positive definite matrices:
- \begin{align*}
- \partial_i F_\mathcal{N}(\lambda) &\geq
-	\frac{\log\big(1+\mu\big)}{2}
-	x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)\bigg)^{-1}x_i\\
-	& = \frac{\log\big(1+\mu\big)}{2\mu}
- \partial_i L_\mathcal{N}(\lambda)
- \end{align*}
-\end{proof}
-
-\begin{lemma}
-    Let us denote by $C_\mu$ the constant which appears in the bound of the
- previous lemma: $C_\mu = \frac{\log(1+\mu)}{2\mu}$, then:
- \begin{displaymath}
- OPT(L_\mathcal{N}, B) \leq \frac{1}{C_\mu}\big(2 OPT(V,\mathcal{N},B)
- + \max_{i\in\mathcal{N}}V(i)\big)
- \end{displaymath}
-\end{lemma}
-
-\begin{proof}
- Let us consider a feasible point $\lambda^*\in[0,1]^n$ such that $L_\mathcal{N}(\lambda^*)
- = OPT(L_\mathcal{N}, B)$. By applying lemma~\ref{lemma:relaxation-ratio}
- and lemma~\ref{lemma:rounding} we get a feasible point $\bar{\lambda}$ with at most
- one fractional component such that:
- \begin{equation}\label{eq:e1}
- L_\mathcal{N}(\lambda^*) \leq \frac{1}{C_\mu}
- F_\mathcal{N}(\bar{\lambda})
- \end{equation}
-
- Let $\lambda_i$ denote the fractional component of $\bar{\lambda}$ and $S$
- denote the set whose indicator vector is $\bar{\lambda} - \lambda_i e_i$.
- Using the fact that $F_\mathcal{N}$ is linear with respect to the $i$-th
- component and is a relaxation of the value function, we get:
- \begin{displaymath}
-	F_\mathcal{N}(\bar{\lambda}) = V(S) + \lambda_i \big(V(S\cup\{i\}) - V(S)\big)
- \end{displaymath}
-
- Using the submodularity of $V$:
- \begin{displaymath}
- F_\mathcal{N}(\bar{\lambda}) \leq 2 V(S) + V(i)
- \end{displaymath}
-
- Note that since $\bar{\lambda}$ is feasible, $S$ is also feasible and
- $V(S)\leq OPT(V,\mathcal{N}, B)$. Hence:
- \begin{equation}\label{eq:e2}
- F_\mathcal{N}(\bar{\lambda}) \leq 2 OPT(V,\mathcal{N}, B)
- + \max_{i\in\mathcal{N}} V(i)
- \end{equation}
-
-    Putting \eqref{eq:e1} and \eqref{eq:e2} together gives the result.
-\end{proof}
-
-\subsection{The mechanism}
-
-\begin{algorithm}\label{mechanism}
-    \caption{Budget feasible mechanism for ridge regression}
-    \begin{algorithmic}[1]
-	\State $i^* \gets \argmax_{j\in\mathcal{N}}V(j)$
-	    \Comment{most valuable single user}
-	\State $x^* \gets \argmax_{x\in[0,1]^n} \{L_{\mathcal{N}\setminus\{i^*\}}(x)
-	\,|\, c(x)\leq B\}$
-	    \Comment{optimum of the concave relaxation, excluding $i^*$}
-	\Statex
-	\If{$L(x^*) < CV(i^*)$}
-	\State \textbf{return} $\{i^*\}$
-	    \Comment{$i^*$ alone is worth a constant fraction of the optimum}
-	\Else
-	\State $i \gets \argmax_{1\leq j\leq n}\frac{V(j)}{c_j}$
-	\State $S \gets \emptyset$
-	\While{$c_i\leq \frac{B}{2}\frac{V(S\cup\{i\})-V(S)}{V(S\cup\{i\})}$}
-	    \Comment{proportional-share stopping rule}
-	\State $S \gets S\cup\{i\}$
-	\State $i \gets \argmax_{j\in\mathcal{N}\setminus S}
-	\frac{V(S\cup\{j\})-V(S)}{c_j}$
-	    \Comment{greedy choice by marginal value per unit cost}
-	\EndWhile
-	\State \textbf{return} $S$
-	\EndIf
-    \end{algorithmic}
-\end{algorithm}
-
-The constant $C$ which appears in the mechanism is left unspecified for now; the
-analysis can be carried out regardless of its value. The optimal choice of $C$
-will be specified in the theorem below.
-
-\begin{lemma}
-The mechanism is monotone.
-\end{lemma}
-
-\begin{proof}
- We assume by contradiction that there exists a user $i$ that has been
- selected by the mechanism and that would not be selected had he reported
- a cost $c_i'\leq c_i$ (all the other costs staying the same).
-
- If $i\neq i^*$ and $i$ has been selected, then we are in the case where
- $L(x^*) \geq C V(i^*)$ and $i$ was included in the result set by the greedy
- part of the mechanism. By reporting a cost $c_i'\leq c_i$, using the
- submodularity of $V$, we see that $i$ will satisfy the greedy selection
- rule:
- \begin{displaymath}
- i = \argmax_{j\in\mathcal{N}\setminus S} \frac{V(S\cup\{j\})
- - V(S)}{c_j}
- \end{displaymath}
-    at the same or an earlier iteration of the greedy heuristic. Let us denote by $S_i$
- (resp. $S_i'$) the set to which $i$ is added when reporting cost $c_i$
- (resp. $c_i'$). We have $S_i'\subset S_i$. Moreover:
- \begin{align*}
- c_i' & \leq c_i \leq
- \frac{B}{2}\frac{V(S_i\cup\{i\})-V(S_i)}{V(S_i\cup\{i\})}\\
- & \leq \frac{B}{2}\frac{V(S_i'\cup\{i\})-V(S_i')}{V(S_i'\cup\{i\})}
- \end{align*}
- Hence $i$ will still be included in the result set.
-
-    If $i = i^*$, $i$ is included iff $L(x^*) < C V(i^*)$. Reporting $c_i'$
-    instead of $c_i$ changes neither the value $V(i^*)$ nor $L(x^*)$ (which is
- computed over $\mathcal{N}\setminus\{i^*\}$). Thus $i$ is still included by
- reporting a different cost.
-\end{proof}
-
-\begin{lemma}
-The mechanism is budget feasible.
-\end{lemma}
-
-\begin{lemma}
- Let us denote by $S_M$ the set returned by the mechanism. Let us also
- write:
- \begin{displaymath}
- C_{\textrm{max}} = \max\left(1+C,\frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot
- C_\mu(e-1) -5e +1}\right)\right)
- \end{displaymath}
-
- Then:
- \begin{displaymath}
- OPT(V, \mathcal{N}, B) \leq
- C_\text{max}\cdot V(S_M)
- \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-
- If the condition on line 3 of the algorithm holds, then:
- \begin{displaymath}
- V(i^*) \geq \frac{1}{C}L(x^*) \geq
-	\frac{1}{C}OPT(V,\mathcal{N}\setminus\{i^*\}, B)
- \end{displaymath}
-
- But:
- \begin{displaymath}
-	OPT(V,\mathcal{N},B) \leq OPT(V,\mathcal{N}\setminus\{i^*\}, B) + V(i^*)
- \end{displaymath}
-
- Hence:
- \begin{displaymath}
- V(i^*) \geq \frac{1}{C+1} OPT(V,\mathcal{N}, B)
- \end{displaymath}
-
- If the condition of the algorithm does not hold:
- \begin{align*}
- V(i^*) & \leq \frac{1}{C}L(x^*) \leq \frac{1}{C\cdot C_\mu}
- \big(2 OPT(V,\mathcal{N}, B) + V(i^*)\big)\\
- & \leq \frac{1}{C\cdot C_\mu}\left(\frac{2e}{e-1}\big(3 V(S_M)
- + 2 V(i^*)\big)
- + V(i^*)\right)
- \end{align*}
-
- Thus:
- \begin{align*}
- V(i^*) \leq \frac{6e}{C\cdot C_\mu(e-1)- 5e + 1} V(S_M)
- \end{align*}
-
- Finally, using again that:
- \begin{displaymath}
- OPT(V,\mathcal{N},B) \leq \frac{e}{e-1}\big(3 V(S_M) + 2 V(i^*)\big)
- \end{displaymath}
-
- We get:
- \begin{displaymath}
- OPT(V, \mathcal{N}, B) \leq \frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot
- C_\mu(e-1) -5e +1}\right) V(S_M)
- \end{displaymath}
-\end{proof}
-
-The optimal value for $C$ is $C^* = \arg\min_C C_{\textrm{max}}(C)$. Since the
-term $1+C$ is increasing in $C$ while the second term defining
-$C_{\textrm{max}}$ is decreasing in $C$ (on the range where it is positive),
-the minimum is attained when the two terms are equal:
-\begin{displaymath}
-    1 + C = \frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot
-    C_\mu(e-1) -5e +1}\right)
-\end{displaymath}
-
-This equation has two solutions. Only one of them is such that:
-\begin{displaymath}
-    C\cdot C_\mu(e-1) -5e +1 \geq 0
-\end{displaymath}
-which is needed in the proof of the previous lemma. Computing this solution,
-we can state the main result of this section.
-
-\begin{theorem}
- The mechanism is individually rational, truthful and has an approximation
- ratio of:
- \begin{multline*}
- 1 + C^* = \frac{5e-1 + C_\mu(4e-1)}{2C_\mu(e-1)}\\
- + \frac{\sqrt{C_\mu^2(1+2e)^2
- + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)}
- \end{multline*}
-\end{theorem}
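-
-As a rough numerical illustration (added; obtained simply by evaluating the
-formula above): as $\mu\rightarrow 0$ we have
-$C_\mu = \log(1+\mu)/(2\mu)\rightarrow 1/2$, and the ratio $1+C^*$ evaluates
-to roughly $20$; as $\mu$ grows, $C_\mu\rightarrow 0$ and the guarantee
-degrades.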
-\end{document}
diff --git a/stoc_paper.pdf b/stoc_paper.pdf
new file mode 100644
index 0000000..9ae80aa
--- /dev/null
+++ b/stoc_paper.pdf
Binary files differ
diff --git a/stoc_paper.tex b/stoc_paper.tex
new file mode 100644
index 0000000..cbb13af
--- /dev/null
+++ b/stoc_paper.tex
@@ -0,0 +1,49 @@
+\documentclass{acm_proc_article-sp}
+\usepackage[utf8]{inputenc}
+\usepackage{amsmath,amsfonts}
+\usepackage{algorithm}
+\usepackage{algpseudocode,bbm,color,verbatim}
+\input{definitions}
+\title{Budget Feasible Mechanisms for Experimental Design}
+
+%Remove permission block empty space
+\makeatletter
+\let\@copyrightspace\relax
+\makeatother
+
+\numberofauthors{3}
+\author{
+\alignauthor Thibaut Horel\\ %\titlenote{\texttt{thibaut.horel@ens.fr}}\\
+ \affaddr{Technicolor}\\
+ \email{thibaut.horel@ens.fr}
+\alignauthor Stratis Ioannidis\\ % \titlenote{\texttt{stratis.ioannidis@technicolor.com}}\\
+ \affaddr{Technicolor}\\
+ \email{stratis.ioannidis\\@technicolor.com}
+\alignauthor S. Muthukrishnan\\% \titlenote{\texttt{muthu@cs.rutgers.edu}}\\
+ \affaddr{Rutgers University, Microsoft Research}\\
+ \email{muthu@cs.rutgers.edu}
+}
+\begin{document}
+
+\maketitle
+\begin{abstract}
+\input{abstract}
+\end{abstract}
+\section{Introduction}
+\input{intro}
+
+\section{Preliminaries}\label{sec:peel}
+\input{problem}
+\section{Mechanism for \EDP{}}
+\input{main}
+\section{Extension to Other Problems}\label{sec:ext}
+\input{general}
+%\section{Conclusion}
+%\input{conclusion}
+\begin{small}
+\bibliographystyle{abbrv}
+\bibliography{notes}
+\end{small}
+\appendix
+\input{appendix}
+\end{document}