 notes.tex                                                                                                             | 496
 paper.tex                                                                                                             |  23
 papers/budget_feasible_mechanisms/BudgetFeasibleMechanisms.pdf (renamed from papers/BudgetFeasibleMechanisms.pdf)     | bin 231480 -> 231480 bytes
 papers/budget_feasible_mechanisms/HowToWinFriendsAndInfluencePeople.pdf (renamed from papers/HowToWinFriendsAndInfluencePeople.pdf) | bin 261742 -> 261742 bytes
 papers/budget_feasible_mechanisms/SODA11_054_chenn.pdf (renamed from papers/SODA11_054_chenn.pdf)                     | bin 977285 -> 977285 bytes
 proof.tex                                                                                                             | 571
 stoc_paper.pdf                                                                                                        | bin 0 -> 328249 bytes
 stoc_paper.tex                                                                                                        |  49
 8 files changed, 56 insertions(+), 1083 deletions(-)
diff --git a/notes.tex b/notes.tex
deleted file mode 100644
index 1ce4cc4..0000000
--- a/notes.tex
+++ /dev/null
@@ -1,496 +0,0 @@
-\documentclass{IEEEtran}
-%\usepackage{mathptmx}
-\usepackage[utf8]{inputenc}
-\usepackage{amsmath,amsthm,amsfonts}
-\newtheorem{lemma}{Lemma}
-\newtheorem{fact}{Fact}
-\newtheorem{example}{Example}
-\newtheorem{prop}{Proposition}
-\newcommand{\var}{\mathop{\mathrm{Var}}}
-\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]}
-\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]}
-\newcommand{\norm}[1]{\lVert#1\rVert}
-\newcommand{\tr}[1]{#1^*}
-\newcommand{\ip}[2]{\langle #1, #2 \rangle}
-\newcommand{\mse}{\mathop{\mathrm{MSE}}}
-\DeclareMathOperator{\trace}{tr}
-\DeclareMathOperator*{\argmax}{arg\,max}
-\title{Value of data}
-\author{Stratis Ioannidis \and Thibaut Horel}
-\begin{document}
-\maketitle
-
-\section{Introduction}
-
-The goal of this work is to propose a framework to study the value of user
-data. Although it is clear that there is some notion of value attached to
-user data (for example, user data can be used to generate revenue through
-online advertising, recommender systems, etc.), it is not clear which
-definition of value should be used in formal work on this notion. After
-proposing a definition of value, we study how generic economic problems
-behave with regard to this definition. Finally, we study the computational
-feasibility of these problems.
-
-\section{Data model}
-\label{sec:data-model}
-
-There is a set of users and an experimenter. Each user has a vector of public
-features (e.g. age, height, binary features, labels, etc.) and a private
-piece of information: an undisclosed variable.
-
-The experimenter wants to learn the relationship between the public features
-and the private variable. Before conducting the experiment, he has prior
-knowledge of this relationship, called his \emph{hypothesis}, and the
-experiment consists in selecting a set of users and asking them to reveal
-their private variables. Based on the observed data, the experimenter updates
-his hypothesis.
-
-For the experimenter, there is a notion of value attached to a group of
-users: how much their data teaches him about the hypothesis, i.e., how much
-it reduces his uncertainty about the hypothesis. For the users, there is
-a cost attached to revealing their data. The experimenter also has a budget
-constraint on the amount of money he can spend.
-
-The problems arising in this setup are natural: the experimenter wants to
-maximize his utility, namely the value of the set of users he selects, but he
-needs to compensate the users while respecting their costs and his budget
-constraint. The users' costs can either be public, which directly leads to
-combinatorial optimization problems, or private, in which case a notion of
-strategy enters into how the experimenter compensates the users, which calls
-for an auction approach.
-
-Formally, there is a set of users indexed by a set $\mathcal{I}$. The public
-feature vector of user $i\in\mathcal{I}$ is an element $x_i$ of a feature set
-$E$, his undisclosed variable is denoted by $y_i$ and belongs to some space
-$A$. The cost of user $i$ for revealing his data is a positive real number
-$c_i\in\mathbf{R}_+$. The budget of the experimenter is denoted by $B$.
-
-The prior knowledge of the experimenter takes the form of a random variable
-$H$ over $A^E$, or over a subset of $A^E$ called the \emph{hypothesis set}.
-This random variable expresses his uncertainty about the true hypothesis $h$.
-The true hypothesis gives the relationship between the feature vector of
-user $i$ and his private variable through the equation:
-\begin{equation}\label{eq:hypothesis-model}
-    y_i = h(x_i) + \varepsilon_i
-\end{equation}
-where $\varepsilon_i$ is a random variable over $A$.
-
-\emph{TODO: explain why this model is not restrictive: $y$ can always be
-written as a deterministic function of $x$ plus something independent of
-$x$.}
-
-\emph{Examples.}
-\begin{enumerate}
-    \item if the hypothesis set is finite (the experimenter has a few
-        deterministic models he wants to choose from), observing data allows
-        him to rule out parts of the hypothesis set. In this case, the
-        uncertainty of the experimenter could simply be measured by the size
-        of the hypothesis set.
-    \item if $A=\{0,1\}$ and the hypothesis set is the set of all binary
-        functions on $E$, the learning task of the experimenter is a binary
-        classification problem.
-    \item if $A=\mathbf{R}$, $E=\mathbf{R}^d$ and the hypothesis set is the
-        set of linear functions from $E$ to $A$, $\mathcal{L}(E,A)$, the
-        learning task is a linear regression. The prior knowledge of the
-        experimenter is a prior distribution on the parameters of the linear
-        function, which is equivalent to regularized linear regression (e.g.
-        ridge regression).
-\end{enumerate}
-
-\section{Economics of data}
-\label{sec:economics}
-
-The goal of this section is to further discuss the optimization problems
-mentioned in the previous section.
-
-The value function (the utility function of the experimenter) will be denoted
-by $V$; it is simply a function mapping a set of users $S\subset \mathcal{I}$
-to $V(S)\in\mathbf{R}_+$. The choice of a specific value function will be
-discussed in Section~\ref{sec:data-value}.
-
-\subsection{Optimal observation selection}
-
-In this problem, the costs $(c_i)_{i\in \mathcal{I}}$ of the users are
-public. When selecting the set $S\subset\mathcal{I}$ of users, the
-experimenter has to pay exactly $\sum_{i\in S} c_i$. Hence, the optimization
-problem consists in selecting $S^*$ defined by:
-\begin{equation}\label{eq:observation-selection}
-    S^* = \argmax_{S\subset\mathcal{I}}\Big\{V(S)\;\Big|\;
-    \sum_{i\in S} c_i \leq B\Big\}
-\end{equation}
-
-This is a set function maximization problem under a knapsack constraint. If
-$V$ can be any function, this problem is obviously NP-hard. A common
-assumption is that $V$ is submodular (see Section~\ref{sec:submodularity}),
-i.e., that it satisfies a diminishing returns property. However, maximizing
-a submodular function under a knapsack constraint is still NP-hard.
-
-Sviridenko (2004) gave a polynomial-time $(1-1/e)$-approximation algorithm
-for this problem when the value function is non-decreasing and submodular.
-
-Note that this problem covers the case where all users have the same cost
-$c$. In this case, letting $B' = \left\lfloor B/c\right\rfloor$, the problem
-becomes a maximization problem under a cardinality constraint:
-\begin{displaymath}
-    S^* = \argmax_{S\subset\mathcal{I}}\left\{V(S)\;|\; |S| \leq B'\right\}
-\end{displaymath}
-for which Nemhauser et al. (1978) gave a greedy polynomial-time
-$(1-1/e)$-approximation algorithm, optimal in this setting, in the case of
-a non-decreasing submodular function; a sketch of this greedy rule is given
-below.
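For concreteness, here is a minimal sketch (in Python, against an abstract value oracle `V`; the helper name `greedy_cardinality` is ours and purely illustrative) of the greedy rule achieving the (1-1/e) guarantee under a cardinality constraint:

    def greedy_cardinality(V, ground, k):
        """Greedy rule of Nemhauser et al.: repeatedly add the element with
        the largest marginal value V(S + [x]) - V(S). This achieves a
        (1 - 1/e)-approximation when V is non-decreasing and submodular."""
        S = []
        for _ in range(k):
            candidates = [x for x in ground if x not in S]
            if not candidates:
                break
            best = max(candidates, key=lambda x: V(S + [x]) - V(S))
            S.append(best)
        return S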
-
-\subsection{Budget feasible auctions}
-
-Here, the costs of the users are private. Before the beginning of the
-experiment, they report a cost $c_i'$ which is not necessarily equal to their
-true cost: a user may decide to lie to receive more money; however, by doing
-so, there is a risk of not being included in the experiment.
-
-This notion of strategy involved in the way the users reveal their costs
-roots this problem in auction and mechanism design theory. Formally, the
-experimenter wants to design an allocation function $f:
-\mathbf{R}_+^{\mathcal{I}} \rightarrow 2^{\mathcal{I}}$ which, given the
-reported costs of the users, selects the set of users to be included in the
-experiment, and a payment function $p: \mathbf{R}_+^{\mathcal{I}} \rightarrow
-\mathbf{R}_+^{\mathcal{I}}$ which, given the reported costs, returns the
-vector of payments to the users.
-
-For notational convenience, we will assume the costs $\{c_i,\,
-i\in\mathcal{I}\}$ of the users are given, and we will denote by $\{s_i,\,
-i\in\mathcal{I}\}$ the characteristic function of $f(\{c_i,\,
-i\in\mathcal{I}\})$, that is, $s_i = 1$ iff $i\in f(\{c_i,\,
-i\in\mathcal{I}\})$. The payment received by user $i$ will be denoted by
-$p_i$.
-
-The mechanism should satisfy the following conditions:
-\begin{itemize}
-    \item \textbf{Normalized:} if $s_i = 0$ then $p_i = 0$.
-    \item \textbf{Individually rational:} $p_i \geq s_ic_i$.
-    \item \textbf{Truthful:} $p_i - s_ic_i \geq p_i' - s_i'c_i$, where $p_i'$
-        and $s_i'$ are the payment and allocation of user $i$ had he reported
-        a cost $c_i'$ different from his true cost $c_i$ (keeping the costs
-        reported by the other users the same).
-    \item \textbf{Budget feasible:} the payments should be within budget:
-        \begin{displaymath}
-            \sum_{i\in \mathcal{I}} s_ip_i \leq B
-        \end{displaymath}
-\end{itemize}
-
-Singer (2010) proved a lower bound of 2 on the approximation ratio of any
-polynomial-time budget feasible mechanism for this problem. He also gave
-a randomized mechanism for general submodular value functions with an
-approximation ratio of 117.7, although for specific problems (coverage,
-knapsack, etc.) better ratios can be attained.
-
-State of the art: Chen, Gravin and Lu give a lower bound of $1+\sqrt{2}$ and
-an upper bound of 8.34, assuming the fractional non-strategic optimization
-problem can be solved in polynomial time.
-
-\section{Value of data}
-\label{sec:data-value}
-
-Here, we discuss a choice for the value function appearing in the problems
-discussed in Section~\ref{sec:economics}. Such a value function should be at
-least normalized, positive and non-decreasing. It should furthermore capture
-a notion of \emph{uncertainty reduction} related to the learning task of the
-experimenter.
-
-The prior knowledge of the experimenter, i.e., the distribution of the random
-variable $H$ over the hypothesis set, conveys his uncertainty about the true
-hypothesis. A common measure of uncertainty is the entropy. If we denote by
-$P_H$ the probability distribution of $H$, its entropy is defined
-by\footnote{Here we choose to write the entropy of a discrete distribution,
-but our results do not rely on this assumption: if the distribution is
-continuous, one simply needs to replace the entropy by the differential
-entropy.}:
-\begin{displaymath}
-    \mathbb{H}(H) = -\sum_{h\in A^E}P_H(h)\log\big(P_H(h)\big)
-\end{displaymath}
-
-We will denote by $Y_i$, $i\in\mathcal{I}$, the answer of user $i$ according
-to the experimenter's knowledge:
-\begin{equation}\label{eq:data-model}
-    Y_i = H(x_i) + \varepsilon_i
-\end{equation}
-and $Y_S$ will denote the set of answers from the users in a subset $S$ of
-$\mathcal{I}$.
-
-We can now define the value of a group $S$ of users as the decrease of
-entropy induced by observing $S$:
-\begin{equation}\label{eq:value}
-    \forall S\subset \mathcal{I},\; V(S)
-    = \mathbb{H}(H) - \mathbb{H}(H\,|\,Y_S)
-\end{equation}
-where $\mathbb{H}(H\,|\,Y_S)$ is the conditional entropy of $H$ given $Y_S$.
-One can also note that the value defined in \eqref{eq:value} is simply the
-mutual information $I(H;Y_S)$ between $H$ and $Y_S$. Submodularity is
-preserved by left composition with a non-decreasing concave function (see
-Section~\ref{sec:submodularity}). Hence, the definition of the value of $S$
-can be extended to any $f\big(V(S)\big)$ where $f$ is a non-decreasing
-concave function.
-
-This notion of value, also known as the \emph{value of information} (TODO:
-ref), can be motivated by the information theoretic interpretation of the
-entropy: suppose the experimenter has access to an oracle to whom he can ask
-yes/no questions. Then the entropy of the distribution is exactly the
-expected number of questions he needs to ask to fully identify the
-hypothesis. If he needs to pay for each question asked to the oracle, our
-definition of value directly relates to the cost decrease implied by
-observing a set of users.
-
-Using the \emph{information never hurts} principle, it is easy to see that
-the value function defined by \eqref{eq:value} is positive and non-decreasing
-(with regard to inclusion).
-
-Furthermore, if we add the natural assumption that
-$(\varepsilon_i)_{i\in\mathcal{I}}$ are jointly independent, which is
-equivalent to saying that, conditioned on the hypothesis, the
-$(Y_i)_{i\in\mathcal{I}}$ are independent, we get the following fact.
-
-\begin{fact}\label{value-submodularity}
-    Assuming that the $(Y_i)_{i\in\mathcal{I}}$ defined in
-    \eqref{eq:data-model} are independent conditioned on $H$, the function
-    $V$ defined in \eqref{eq:value} is submodular.
-\end{fact}
-
-\begin{proof}
-    Using the chain rule, one can rewrite the value of $S$ as:
-    \begin{displaymath}
-        V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\, H)
-    \end{displaymath}
-    We can now use the conditional independence of the $(Y_i)_{i\in S}$ to
-    write the conditional entropy as a sum:
-    \begin{displaymath}
-        V(S) = \mathbb{H}(Y_S) - \sum_{s\in S}\mathbb{H}(Y_s\,|\, H)
-    \end{displaymath}
-
-    It is well known that the joint entropy of a set of random variables is
-    submodular. Thus the last equation expresses $V$ as the sum of
-    a submodular function and a modular (additive) function. As
-    a consequence, $V$ is submodular.
-\end{proof}
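The following small numerical sketch (Python with NumPy; the instance, with three hypotheses and binary answers flipped with probability `q`, is ours and purely illustrative) computes the value of \eqref{eq:value} by enumeration on a finite hypothesis set and checks the monotonicity and diminishing returns properties stated above:

    import numpy as np
    from itertools import product

    hypotheses = [lambda x: 0, lambda x: x % 2, lambda x: 1]
    prior = np.array([0.5, 0.3, 0.2])   # distribution of H
    features = [0, 1, 2, 3]             # public features x_i
    q = 0.1                             # noise: P(Y_i != H(x_i))

    def info_value(S):
        """V(S) = H(Y_S) - H(Y_S | H), i.e. the mutual information I(H; Y_S)."""
        h_y, h_y_given_h = 0.0, 0.0
        for y in product([0, 1], repeat=len(S)):
            # P(y | h) factorizes: the noise is independent across users
            p_y_h = [np.prod([q if y[k] != h(features[i]) else 1 - q
                              for k, i in enumerate(S)]) for h in hypotheses]
            p_y = float(np.dot(prior, p_y_h))
            h_y -= p_y * np.log2(p_y)
            h_y_given_h -= sum(p * l * np.log2(l) for p, l in zip(prior, p_y_h))
        return h_y - h_y_given_h

    # "information never hurts": V is non-negative and non-decreasing
    assert info_value([0, 1]) >= info_value([0]) >= 0
    # diminishing returns: the marginal value of user 2 shrinks as S grows
    assert (info_value([0, 2]) - info_value([0])
            >= info_value([0, 1, 2]) - info_value([0, 1]))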
-
-\section{Value of data in the linear regression setup}
-
-In this section we will assume a linear model: the feature vectors belong to
-$\mathbf{R}^d$ and the private variables belong to $\mathbf{R}$. The private
-variable is a linear combination of the features plus noise:
-\begin{displaymath}
-    y = \beta^*x + \varepsilon
-\end{displaymath}
-The noise $\varepsilon$ is normally distributed, zero-mean, of variance
-$\sigma^2$, and independent across users.
-
-The hypothesis set of the experimenter is the set of all linear forms from
-$\mathbf{R}^d$ to $\mathbf{R}$. Because a linear form can be uniquely
-represented as the inner product with a vector, it is equivalent to say that
-the hypothesis set is $\mathbf{R}^d$ and that the experimenter's hypothesis
-is a random variable $\beta$ over $\mathbf{R}^d$.
-
-A common assumption made in linear regression is that $\beta$ is normally
-distributed, zero-mean, with covariance matrix $\Sigma$. This prior
-distribution conveys the idea that $\beta$ should have a small $L^2$ norm:
-the coefficients are shrunk towards zero. Indeed, it is easy to prove that
-choosing the $\hat\beta$ which maximizes the \emph{a posteriori} distribution
-given the observations, under a normal prior with $\Sigma = \kappa I_d$, is
-equivalent to solving the following optimization problem:
-\begin{displaymath}
-    \hat\beta = \arg\min_{\beta\in\mathbf{R}^d}\|Y-X\beta\|^2 + \lambda
-    \|\beta\|^2
-\end{displaymath}
-where $\lambda = \sigma^2/\kappa$ (a general $\Sigma$ yields the penalty
-$\sigma^2\beta^*\Sigma^{-1}\beta$ instead). This optimization problem is
-known as ridge regression: a least-squares fit of the data with a penalty on
-the vectors $\beta$ of large $L^2$ norm, which is consistent with the prior
-distribution; a numerical check of this equivalence is sketched below.
-
-\begin{fact}
-    Under the linear regression model, with a multivariate normal prior, the
-    value of a set $S$ of users is given by:
-    \begin{equation}\label{eq:linear-regression-value}
-        V(S) = \frac{1}{2}\log\det\left(I_d
-        + \frac{\Sigma}{\sigma^2}X_S^*X_S\right)
-    \end{equation}
-    where $X_S$ is the matrix whose rows are the row vectors $x_s^*$ for $s$
-    in $S$.
-\end{fact}
-
-\begin{proof}
-Let us recall that the entropy of a multivariate normal variable $B$ in
-dimension $d$ with covariance matrix $\Sigma$ (the mean is not relevant) is
-given by:
-\begin{equation}\label{eq:multivariate-entropy}
-    \mathbb{H}(B) = \frac{1}{2}\log\big((2\pi e)^d \det \Sigma\big)
-\end{equation}
-
-Using the chain rule as in the proof of Fact~\ref{value-submodularity} we
-get that:
-\begin{displaymath}
-    V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\,\beta)
-\end{displaymath}
-
-Conditioned on $\beta$, $Y_S$ follows a multivariate normal distribution of
-mean $X_S\beta$ and covariance matrix $\sigma^2 I_n$, where $n = |S|$. Hence:
-\begin{equation}\label{eq:h1}
-    \mathbb{H}(Y_S\,|\,\beta)
-    = \frac{1}{2}\log\left((2\pi e)^n \det(\sigma^2I_n)\right)
-\end{equation}
-
-$Y_S$ also follows a multivariate normal distribution of mean zero. Let us
-compute its covariance matrix $\Sigma_Y$, denoting by $\varepsilon$ the noise
-vector:
-\begin{align*}
-    \Sigma_Y & = \expt{Y_SY_S^*} = \expt{(X_S\beta + \varepsilon)(X_S\beta
-    + \varepsilon)^*}\\
-    & = X_S\Sigma X_S^* + \sigma^2I_n
-\end{align*}
-Thus, we get that:
-\begin{equation}\label{eq:h2}
-    \mathbb{H}(Y_S)
-    = \frac{1}{2}\log\left((2\pi e)^n \det(X_S\Sigma X_S^*
-    + \sigma^2 I_n)\right)
-\end{equation}
-
-Combining \eqref{eq:h1} and \eqref{eq:h2} we get:
-\begin{displaymath}
-    V(S) = \frac{1}{2}\log\det\left(I_n+\frac{1}{\sigma^2}X_S\Sigma
-    X_S^*\right)
-\end{displaymath}
-
-Finally, Sylvester's determinant formula, $\det(I_n+MN)=\det(I_d+NM)$, gives
-the result.
-\end{proof}
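As a sanity check of the MAP/ridge equivalence above, the following sketch (Python, assuming NumPy; the isotropic prior $\Sigma = \kappa I_d$ and all constants are illustrative) verifies that the posterior mean under the Gaussian prior coincides with the ridge estimator with $\lambda = \sigma^2/\kappa$:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, kappa, sigma2 = 20, 4, 2.0, 0.25
    X = rng.normal(size=(n, d))
    beta = rng.normal(scale=np.sqrt(kappa), size=d)
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

    # ridge estimator with lambda = sigma^2 / kappa
    lam = sigma2 / kappa
    ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    # posterior mean of beta given y (Gaussian conjugacy)
    post = np.linalg.solve(X.T @ X / sigma2 + np.eye(d) / kappa,
                           X.T @ y / sigma2)
    assert np.allclose(ridge, post)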
-
-\emph{Remarks.}
-\begin{enumerate}
-    \item it is known that for a set of symmetric positive definite matrices,
-        defining the value of a set to be the $\log\det$ of the sum of the
-        matrices yields a non-decreasing, submodular value function. Noting
-        that:
-        \begin{displaymath}
-            X_S^*X_S = \sum_{s\in S}x_sx_s^*
-        \end{displaymath}
-        it is clear that our value function is non-decreasing and submodular.
-        The positivity follows from a direct application of the spectral
-        theorem.
-    \item the matrix which appears in the value function:
-        \begin{displaymath}
-            I_d + \frac{\Sigma}{\sigma^2}X_S^*X_S
-        \end{displaymath}
-        is also the inverse of the covariance matrix of the ridge regression
-        estimator. In optimal experiment design, it is common to use the
-        determinant of the inverse of the estimator's covariance matrix as
-        a measure of the quality of the prediction (the $D$-optimality
-        criterion). Indeed, this directly relates to the inverse of the
-        volume of the confidence ellipsoid.
-    \item this value function can be computed up to a fixed decimal precision
-        in polynomial time.
-\end{enumerate}
-\subsubsection*{Marginal contribution}
-
-Here, we want to compute the marginal contribution of a point $x$ to a set
-$S$ of users:
-\begin{displaymath}
-    \Delta_xV(S) = V(S\cup \{x\}) - V(S)
-\end{displaymath}
-
-Using that:
-\begin{displaymath}
-    X_{S\cup\{x\}}^*X_{S\cup\{x\}} = X_S^*X_S + xx^*
-\end{displaymath}
-we get:
-\begin{align*}
-    \Delta_x V(S) & = \frac{1}{2}\log\det\left(I_d
-    + \Sigma\frac{X_S^*X_S}{\sigma^2} + \Sigma\frac{xx^*}{\sigma^2}\right)\\
-    & - \frac{1}{2}\log\det\left(I_d + \Sigma\frac{X_S^*X_S}{\sigma^2}\right)\\
-    & = \frac{1}{2}\log\det\left(I_d + xx^*\left(\sigma^2\Sigma^{-1}
-    + X_S^*X_S\right)^{-1}\right)\\
-    & = \frac{1}{2}\log\left(1 + x^*\left(\sigma^2\Sigma^{-1}
-    + X_S^*X_S\right)^{-1}x\right)
-\end{align*}
-where the last two equalities use $\det(I+MN)=\det(I+NM)$ and Sylvester's
-determinant formula.
-
-\emph{Remark.} This formula shows that, given a set $S$, users do not all
-bring the same contribution. The contribution of $x$ depends on its norm
-under the bilinear form defined by the matrix $(\sigma^2\Sigma^{-1}
-+ X_S^*X_S)^{-1}$, which reflects how well the new point $x$ \emph{aligns}
-with the already observed points.
-
-\section*{Appendix: Submodularity}
-\label{sec:submodularity}
-
-In this section, we will consider that we are given a \emph{universe} set
-$U$. A set function $f$ is a function defined on the power set of $U$,
-$\mathfrak{P}(U)$.
-
-A set function $f$ defined on $\mathfrak{P}(U)$ is said to be
-\emph{non-decreasing} if it is non-decreasing with regard to inclusion, that
-is:
-\begin{displaymath}
-    \forall\,S\subseteq T,\quad f(S)\leq f(T)
-\end{displaymath}
-A \emph{non-increasing} function on $\mathfrak{P}(U)$ is defined similarly.
-
-A set function $f$ defined on $\mathfrak{P}(U)$ is said to be
-\emph{submodular} if it satisfies the diminishing returns property, that is,
-if the marginal increment from adding a point to a set is a non-increasing
-set function. More formally, for any point $x$ in $U$, the marginal increment
-of $f$ at $x$ is the set function defined as:
-\begin{displaymath}
-    \Delta_x f(S) = f(S\cup\{x\}) - f(S)
-\end{displaymath}
-Then, $f$ is \emph{submodular} iff for all $x$ in $U$, $\Delta_x f$ is
-a non-increasing set function. Similarly, a \emph{supermodular} function is
-one whose marginal increments are non-decreasing set functions.
-
-\begin{prop}
-    Let $R:\mathbf{R}\rightarrow \mathbf{R}$ be a non-decreasing concave
-    function and $f:\mathfrak{P}(U)\rightarrow\mathbf{R}$ be a non-decreasing
-    submodular function. Then the composed function $R\circ f$ is
-    non-decreasing and submodular.
-\end{prop}
-
-\begin{proof}
-    The monotonicity of $R\circ f$ follows immediately from the monotonicity
-    of $R$ and $f$.
-
-    For the submodularity, let $S$ and $T$ be two sets such that $S\subseteq
-    T$, let $V$ be any set, and write $\delta_S = f(S\cup V) - f(S)$ and
-    $\delta_T = f(T\cup V) - f(T)$. By monotonicity of $f$, $f(S)\leq f(T)$;
-    by submodularity of $f$, $\delta_S \geq \delta_T \geq 0$.
-
-    Since $R$ is concave, its increments over intervals of equal length are
-    larger further to the left:
-    \begin{displaymath}
-        R\big(f(T)+\delta_T\big)-R\big(f(T)\big)
-        \leq R\big(f(S)+\delta_T\big)-R\big(f(S)\big)
-    \end{displaymath}
-    and since $R$ is non-decreasing and $\delta_T\leq\delta_S$:
-    \begin{displaymath}
-        R\big(f(S)+\delta_T\big)-R\big(f(S)\big)
-        \leq R\big(f(S)+\delta_S\big)-R\big(f(S)\big)
-    \end{displaymath}
-    Combining the two inequalities yields:
-    \begin{displaymath}
-        R\big(f(T\cup V)\big)-R\big(f(T)\big)
-        \leq R\big(f(S\cup V)\big)-R\big(f(S)\big)
-    \end{displaymath}
-    which is exactly the submodularity of $R\circ f$.
-\end{proof}
-
-\end{document}
diff --git a/paper.tex b/paper.tex
@@ -1,28 +1,19 @@
-\documentclass{acm_proc_article-sp}
+\documentclass{article}
 \usepackage[utf8]{inputenc}
-\usepackage{amsmath,amsfonts}
+\usepackage{amsmath,amsfonts,amsthm}
 \usepackage{algorithm}
 \usepackage{algpseudocode,bbm,color,verbatim}
 \input{definitions}
 \title{Budget Feasible Mechanisms for Experimental Design}
-
+\author{
+Thibaut Horel\\Technicolor\\\texttt{thibaut.horel@ens.fr} \and
+Stratis Ioannidis\\Technicolor\\\texttt{stratis.ioannidis@technicolor.com}\and
+S. Muthukrishnan\\Rutgers University, Microsoft Research\\\texttt{muthu@cs.rutgers.edu}}
 %Remove permission block empty space
 \makeatletter
 \let\@copyrightspace\relax
 \makeatother
-\numberofauthors{3}
-\author{
-\alignauthor Thibaut Horel\\ %\titlenote{\texttt{thibaut.horel@ens.fr}}\\
-    \affaddr{Technicolor}\\
-    \email{thibaut.horel@ens.fr}
-\alignauthor Stratis Ioannidis\\ % \titlenote{\texttt{stratis.ioannidis@technicolor.com}}\\
-    \affaddr{Technicolor}\\
-    \email{stratis.ioannidis\\@technicolor.com}
-\alignauthor S. Muthukrishnan\\% \titlenote{\texttt{muthu@cs.rutgers.edu}}\\
-    \affaddr{Rutgers University, Microsoft Research}\\
-    \email{muthu@cs.rutgers.edu}
-}
 \begin{document}
 
 \maketitle
@@ -44,6 +35,6 @@
 \bibliographystyle{abbrv}
 \bibliography{notes}
 \end{small}
-\appendix
+\section*{Appendix}
 \input{appendix}
 \end{document}
diff --git a/papers/BudgetFeasibleMechanisms.pdf b/papers/budget_feasible_mechanisms/BudgetFeasibleMechanisms.pdf
index 76f3902..76f3902 100644
--- a/papers/BudgetFeasibleMechanisms.pdf
+++ b/papers/budget_feasible_mechanisms/BudgetFeasibleMechanisms.pdf
Binary files differ
diff --git a/papers/HowToWinFriendsAndInfluencePeople.pdf b/papers/budget_feasible_mechanisms/HowToWinFriendsAndInfluencePeople.pdf
index 2d35e81..2d35e81 100644
--- a/papers/HowToWinFriendsAndInfluencePeople.pdf
+++ b/papers/budget_feasible_mechanisms/HowToWinFriendsAndInfluencePeople.pdf
Binary files differ
diff --git a/papers/SODA11_054_chenn.pdf b/papers/budget_feasible_mechanisms/SODA11_054_chenn.pdf
index 5946a60..5946a60 100644
--- a/papers/SODA11_054_chenn.pdf
+++ b/papers/budget_feasible_mechanisms/SODA11_054_chenn.pdf
Binary files differ
diff --git a/proof.tex b/proof.tex
deleted file mode 100644
index bc57ca0..0000000
--- a/proof.tex
+++ /dev/null
@@ -1,571 +0,0 @@
-\documentclass{acm_proc_article-sp}
-\usepackage[utf8]{inputenc}
-\usepackage{amsmath,amsfonts}
-\usepackage{algorithm}
-\usepackage{algpseudocode}
-\newtheorem{lemma}{Lemma}
-\newtheorem{fact}{Fact}
-\newtheorem{example}{Example}
-\newtheorem{prop}{Proposition}
-\newtheorem{theorem}{Theorem}
-\newcommand*{\defeq}{\stackrel{\text{def}}{=}}
-\newcommand{\var}{\mathop{\mathrm{Var}}}
-\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]}
-\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]}
-\newcommand{\norm}[1]{\lVert#1\rVert}
-\newcommand{\tr}[1]{#1^*}
-\newcommand{\ip}[2]{\langle #1, #2 \rangle}
-\newcommand{\mse}{\mathop{\mathrm{MSE}}}
-\DeclareMathOperator{\trace}{tr}
-\DeclareMathOperator*{\argmax}{arg\,max}
-\DeclareMathOperator*{\argmin}{arg\,min}
-\begin{document}
-
-\section{Budget feasible mechanism}
-
-\subsection{Notations}
-
-\begin{itemize}
-    \item If $x\in\mathbf{R}^d$, $x^*$ will denote the transpose of vector
-        $x$. If $(x, y)\in\mathbf{R}^d\times\mathbf{R}^d$, $x^*y$ is the
-        standard inner product of $\mathbf{R}^d$.
-    \item We will often use the following order over symmetric matrices: if
-        $A$ and $B$ are two $d\times d$ real symmetric matrices, we write
-        $A\leq B$ iff:
-        \begin{displaymath}
-            \forall x\in\mathbf{R}^d,\quad x^*Ax \leq x^*Bx
-        \end{displaymath}
-        That is, iff $B-A$ is symmetric positive semi-definite.
-    \item Let us recall the following property of the order defined above:
-        if $A$ and $B$ are two symmetric positive definite matrices such
-        that $A\leq B$, then $B^{-1}\leq A^{-1}$ (the inverse reverses the
-        order).
-\end{itemize}
-
-\subsection{Data model}
-\begin{itemize}
-    \item set of $n$ users $\mathcal{N} = \{1,\ldots, n\}$
-    \item each user has a public vector of features $x_i\in\mathbf{R}^d$ and
-        some undisclosed variable $y_i\in\mathbf{R}$
-    \item we assume that the data has been normalized: $\forall
-        i\in\mathcal{N}$, $\norm{x_i}\leq 1$
-    \item \textbf{Ridge regression model:}
-    \begin{itemize}
-        \item $y_i = \beta^*x_i + \varepsilon_i$
-        \item $\varepsilon_i \sim \mathcal{N}(0,\sigma^2)$,
-            $(\varepsilon_i)_{i\in \mathcal{N}}$ are mutually independent.
-        \item prior knowledge of $\beta$: $\beta\sim\mathcal{N}(0,\kappa I_d)$
-        \item $\mu = \frac{\kappa}{\sigma^2}$ is the regularization factor.
-        \item after the variables $y_i$ are disclosed, the experimenter
-            performs \emph{maximum a posteriori} estimation which, under
-            a Gaussian prior, is equivalent to solving the ridge regression
-            minimization problem:
-            \begin{displaymath}
-                \hat\beta = \argmin_{\beta\in\mathbf{R}^d}
-                \sum_i (y_i - \beta^*x_i)^2
-                + \frac{1}{\mu}\norm{\beta}_2^2
-            \end{displaymath}
-    \end{itemize}
-\end{itemize}
-
-\subsection{Economics}
-
-Value function: following experimental design theory, the value of data is
-the decrease of uncertainty about the model. A common measure of uncertainty
-is the entropy. Thus, the value of a set $S$ of users is:
-\begin{displaymath}
-    V(S) = \mathbb{H}(\beta) - \mathbb{H}(\beta|Y_S)
-\end{displaymath}
-where $Y_S = \{y_i,\,i\in S\}$ is the set of observations.
-
-In our case (under the Gaussian prior):
-\begin{align*}
-    \forall S\subset\mathcal{N},\; V(S)
-    & = \frac{1}{2}\log\det\left(I_d
-    + \mu\sum_{i\in S} x_ix_i^*\right)\\
-    & \defeq \frac{1}{2}\log\det A(S)
-\end{align*}
-
-\begin{lemma}[Marginal contribution]
-    \begin{displaymath}
-        \Delta_i V(S)\defeq V(S\cup\{i\}) - V(S)
-        = \frac{1}{2}\log\left(1 + \mu x_i^*A(S)^{-1}x_i\right)
-    \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-    We have:
-    \begin{align*}
-        V(S\cup\{i\}) & = \frac{1}{2}\log\det A(S\cup\{i\})\\
-        & = \frac{1}{2}\log\det\left(A(S) + \mu x_i x_i^*\right)\\
-        & = V(S) + \frac{1}{2}\log\det\left(I_d + \mu A(S)^{-1}x_i
-        x_i^*\right)\\
-        & = V(S) + \frac{1}{2}\log\left(1 + \mu x_i^* A(S)^{-1}x_i\right)
-    \end{align*}
-    where the last equality comes from Sylvester's determinant formula.
-\end{proof}
-
-\begin{itemize}
-    \item each user $i$ has a cost $c_i$
-    \item the auctioneer has a budget constraint $B$
-    \item optimization problem:
-        \begin{displaymath}
-            OPT(V,\mathcal{N}, B) = \max_{S\subset\mathcal{N}} \left\{
-            V(S)\,|\, \sum_{i\in S}c_i\leq B\right\}
-        \end{displaymath}
-\end{itemize}
-
-\subsection{Relaxations of the value function}
-
-We say that $R_\mathcal{N}:[0,1]^n\rightarrow\mathbf{R}$ is a relaxation of
-the value function $V$ over $\mathcal{N}$ if it coincides with $V$ at binary
-points. Formally, for any $S\subset\mathcal{N}$, let $\mathbf{1}_S$ denote
-the indicator vector of $S$. $R_\mathcal{N}$ is a relaxation of $V$ over
-$\mathcal{N}$ iff:
-\begin{displaymath}
-    \forall S\subset\mathcal{N},\; R_\mathcal{N}(\mathbf{1}_S) = V(S)
-\end{displaymath}
-
-We can extend the optimization problem defined above to a relaxation by
-extending the cost function:
-\begin{displaymath}
-    \forall \lambda\in[0,1]^n,\; c(\lambda)
-    = \sum_{i\in\mathcal{N}}\lambda_ic_i
-\end{displaymath}
-The optimization problem becomes:
-\begin{displaymath}
-    OPT(R_\mathcal{N}, B) =
-    \max_{\lambda\in[0,1]^n}\left\{R_\mathcal{N}(\lambda)\,|\,
-    c(\lambda)\leq B\right\}
-\end{displaymath}
-The relaxations we will consider here rely on defining a probability
-distribution over subsets of $\mathcal{N}$.
-
-Let $\lambda\in[0,1]^n$ and define:
-\begin{displaymath}
-    P_\mathcal{N}^\lambda(S) = \prod_{i\in S}\lambda_i
-    \prod_{i\in\mathcal{N}\setminus S}(1-\lambda_i)
-\end{displaymath}
-$P_\mathcal{N}^\lambda(S)$ is the probability of picking the set $S$ when we
-select a random subset of $\mathcal{N}$ by deciding independently, for each
-point $i$, to include it with probability $\lambda_i$ (and to exclude it
-with probability $1-\lambda_i$).
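The closed forms above are easy to check numerically. The sketch below (Python, assuming NumPy; the random instance is illustrative) implements $V$, verifies the marginal contribution lemma, and samples a set from $P_\mathcal{N}^\lambda$; the helper `value` is reused by the sketches that follow:

    import numpy as np

    def value(S, X, mu):
        """V(S) = 1/2 log det(I_d + mu * sum_{i in S} x_i x_i^*)."""
        d = X.shape[1]
        A = np.eye(d) + mu * X[list(S)].T @ X[list(S)]
        return 0.5 * np.linalg.slogdet(A)[1]

    def marginal(i, S, X, mu):
        """Closed form of the lemma: 1/2 log(1 + mu x_i^* A(S)^{-1} x_i)."""
        d = X.shape[1]
        A = np.eye(d) + mu * X[list(S)].T @ X[list(S)]
        return 0.5 * np.log1p(mu * X[i] @ np.linalg.solve(A, X[i]))

    rng = np.random.default_rng(0)
    n, d, mu = 6, 3, 2.0
    X = rng.normal(size=(n, d))
    X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # ||x_i|| <= 1

    S, i = [0, 2, 3], 4
    assert np.isclose(value(S + [i], X, mu) - value(S, X, mu),
                      marginal(i, S, X, mu))

    # S ~ P^lambda: include each i independently with probability lambda_i
    lam = rng.uniform(size=n)
    sample = [j for j in range(n) if rng.uniform() < lam[j]]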
-
-We will consider two relaxations of the value function $V$ over
-$\mathcal{N}$:
-\begin{itemize}
-    \item the \emph{multi-linear extension} of $V$:
-        \begin{align*}
-            F_\mathcal{N}(\lambda)
-            & = \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[V(S)\big]
-            = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) V(S)\\
-            & = \frac{1}{2}\sum_{S\subset\mathcal{N}}
-            P_\mathcal{N}^\lambda(S) \log\det A(S)
-        \end{align*}
-    \item the \emph{concave relaxation} of $V$:
-        \begin{align*}
-            L_{\mathcal{N}}(\lambda)
-            & = \frac{1}{2}\log\det \mathbb{E}_{S\sim
-            P_\mathcal{N}^\lambda}\big[A(S)\big]\\
-            & = \frac{1}{2}\log\det\left(I_d + \mu\sum_{i\in\mathcal{N}}
-            \lambda_ix_ix_i^*\right)
-            \defeq \frac{1}{2}\log\det \tilde{A}(\lambda)
-        \end{align*}
-\end{itemize}
-Both coincide with $V$ at binary points, hence both are indeed relaxations
-of $V$.
-
-\begin{lemma}
-    The \emph{concave relaxation} $L_\mathcal{N}$ is concave\footnote{Hence
-    this relaxation is well-named!}.
-\end{lemma}
-
-\begin{proof}
-    This follows from the concavity of the $\log\det$ function over
-    symmetric positive definite matrices. More precisely, if $A$ and $B$ are
-    two symmetric positive definite matrices, then:
-    \begin{multline*}
-        \forall\alpha\in [0, 1],\; \log\det\big(\alpha A + (1-\alpha) B\big)\\
-        \geq \alpha\log\det A + (1-\alpha)\log\det B
-    \end{multline*}
-\end{proof}
-
-\begin{lemma}[Rounding]\label{lemma:rounding}
-    For any feasible $\lambda\in[0,1]^n$, there exists a feasible
-    $\bar{\lambda}\in[0,1]^n$ such that at most one of its components is
-    fractional, that is, lies in $(0,1)$, and:
-    \begin{displaymath}
-        F_{\mathcal{N}}(\lambda)\leq F_{\mathcal{N}}(\bar{\lambda})
-    \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-    We give a rounding procedure which, given a feasible $\lambda$ with at
-    least two fractional components, returns a feasible $\lambda'$ with one
-    less fractional component such that:
-    \begin{displaymath}
-        F_\mathcal{N}(\lambda) \leq F_\mathcal{N}(\lambda')
-    \end{displaymath}
-    Applying this procedure recursively yields the lemma's result.
-
-    Let us consider such a feasible $\lambda$. Let $i$ and $j$ be two
-    fractional components of $\lambda$ and let us define the following
-    function:
-    \begin{displaymath}
-        F_\lambda(\varepsilon) = F(\lambda_\varepsilon)
-        \quad\textrm{where} \quad
-        \lambda_\varepsilon = \lambda
-        + \varepsilon\left(e_i-\frac{c_i}{c_j}e_j\right)
-    \end{displaymath}
-    Note that $c(\lambda_\varepsilon) = c(\lambda)$: the cost is preserved
-    along this direction. It is then easy to see that if $\lambda$ is
-    feasible:
-    \begin{multline}\label{eq:convex-interval}
-        \forall\varepsilon\in\Big[\max\Big(-\lambda_i,
-        (\lambda_j-1)\frac{c_j}{c_i}\Big), \min\Big(1-\lambda_i,
-        \lambda_j\frac{c_j}{c_i}\Big)\Big],\;\\
-        \lambda_\varepsilon\;\;\textrm{is feasible}
-    \end{multline}
-
-    Furthermore, the function $F_\lambda$ is convex. Indeed:
-    \begin{align*}
-        F_\lambda(\varepsilon)
-        & = \mathbb{E}_{S'\sim P_{\mathcal{N}\setminus\{i,j\}}^\lambda}\Big[
-        (\lambda_i+\varepsilon)\Big(\lambda_j
-        -\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i,j\})\\
-        & + (\lambda_i+\varepsilon)\Big(1-\lambda_j
-        +\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i\})\\
-        & + (1-\lambda_i-\varepsilon)\Big(\lambda_j
-        -\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{j\})\\
-        & + (1-\lambda_i-\varepsilon)\Big(1-\lambda_j
-        +\varepsilon\frac{c_i}{c_j}\Big)V(S')\Big]
-    \end{align*}
-    Thus, $F_\lambda$ is a degree 2 polynomial in $\varepsilon$ whose
-    leading coefficient is:
-    \begin{multline*}
-        \frac{c_i}{c_j}\mathbb{E}_{S'\sim
-        P_{\mathcal{N}\setminus\{i,j\}}^\lambda}\Big[
-        V(S'\cup\{i\})+V(S'\cup\{j\})\\
-        -V(S'\cup\{i,j\})-V(S')\Big]
-    \end{multline*}
-    which is non-negative by submodularity of $V$. Hence, the maximum of
-    $F_\lambda$ over the interval given in \eqref{eq:convex-interval} is
-    attained at one of its endpoints, at which either the $i$-th or the
-    $j$-th component of $\lambda_\varepsilon$ becomes integral.
-\end{proof}
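The rounding procedure in this proof is constructive. A direct sketch (exhaustive enumeration of $F$, so tiny $n$ only; reusing `value` and `np` from the previous sketch):

    from itertools import combinations

    def multilinear_F(lam, X, mu):
        """Multilinear extension by exhaustive enumeration (tiny n only)."""
        n = len(lam)
        total = 0.0
        for r in range(n + 1):
            for S in combinations(range(n), r):
                p = 1.0
                for i in range(n):
                    p *= lam[i] if i in S else 1.0 - lam[i]
                total += p * value(S, X, mu)
        return total

    def round_once(lam, c, X, mu):
        """One rounding step: move two fractional coordinates along
        e_i - (c_i/c_j) e_j (cost-preserving) to the better endpoint;
        convexity of F along this line guarantees no loss."""
        frac = [k for k in range(len(lam)) if 0.0 < lam[k] < 1.0]
        i, j = frac[0], frac[1]
        lo = max(-lam[i], (lam[j] - 1.0) * c[j] / c[i])
        hi = min(1.0 - lam[i], lam[j] * c[j] / c[i])
        endpoints = []
        for eps in (lo, hi):
            cand = np.array(lam, dtype=float)
            cand[i] += eps
            cand[j] -= eps * c[i] / c[j]
            endpoints.append(cand)
        return max(endpoints, key=lambda v: multilinear_F(v, X, mu))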
-
-\begin{lemma}\label{lemma:relaxation-ratio}
-    The following inequality holds:
-    \begin{displaymath}
-        \forall\lambda\in[0,1]^n,\;
-        \frac{\log\big(1+\mu\big)}{2\mu}
-        \,L_\mathcal{N}(\lambda)\leq
-        F_\mathcal{N}(\lambda)\leq L_{\mathcal{N}}(\lambda)
-    \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-    The upper bound follows from Jensen's inequality and the concavity of
-    $\log\det$: $F_\mathcal{N}(\lambda)
-    = \mathbb{E}\big[\tfrac{1}{2}\log\det A(S)\big]
-    \leq \tfrac{1}{2}\log\det\mathbb{E}\big[A(S)\big]
-    = L_\mathcal{N}(\lambda)$.
-
-    For the lower bound, we will prove that:
-    \begin{displaymath}
-        \frac{\log\big(1+\mu\big)}{2\mu}
-    \end{displaymath}
-    is a lower bound on the ratio $\partial_i
-    F_\mathcal{N}(\lambda)/\partial_i L_\mathcal{N}(\lambda)$. This is
-    enough to conclude, by observing that:
-    \begin{displaymath}
-        \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)}
-        \sim_{\lambda\rightarrow 0}
-        \frac{\sum_{i\in \mathcal{N}}\lambda_i\partial_i F_\mathcal{N}(0)}
-        {\sum_{i\in\mathcal{N}}\lambda_i\partial_i L_\mathcal{N}(0)}
-    \end{displaymath}
-    and that an interior critical point of the ratio
-    $F_\mathcal{N}(\lambda)/L_\mathcal{N}(\lambda)$ satisfies:
-    \begin{displaymath}
-        \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)}
-        = \frac{\partial_i F_\mathcal{N}(\lambda)}{\partial_i
-        L_\mathcal{N}(\lambda)}
-    \end{displaymath}
-
-    Let us start by computing the derivatives of $F_\mathcal{N}$ and
-    $L_\mathcal{N}$ with respect to the $i$-th component.
-
-    For $F_\mathcal{N}$, it suffices to differentiate
-    $P_\mathcal{N}^\lambda(S)$:
-    \begin{displaymath}
-        \partial_i P_\mathcal{N}^\lambda(S) = \left\{
-        \begin{aligned}
-            & P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})
-            && \textrm{if}\; i\in S \\
-            & - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
-            && \textrm{if}\; i\in \mathcal{N}\setminus S
-        \end{aligned}\right.
-    \end{displaymath}
-
-    Hence:
-    \begin{multline*}
-        \partial_i F_\mathcal{N} =
-        \sum_{\substack{S\subset\mathcal{N}\\ i\in S}}
-        P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})V(S)
-        - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}}
-        P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)
-    \end{multline*}
-
-    Now, using that every $S$ such that $i\in S$ can be uniquely written as
-    $S'\cup\{i\}$ with $i\notin S'$, we can write:
-    \begin{displaymath}
-        \partial_i F_\mathcal{N} =
-        \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
-        P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
-        \big(V(S\cup\{i\}) - V(S)\big)
-    \end{displaymath}
-
-    Finally, by using the expression for the marginal contribution of $i$
-    to $S$:
-    \begin{displaymath}
-        \partial_i F_\mathcal{N}(\lambda) = \frac{1}{2}
-        \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
-        P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
-        \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)
-    \end{displaymath}
-
-    The computation of the derivative of $L_\mathcal{N}$ uses standard
-    matrix calculus and gives:
-    \begin{displaymath}
-        \partial_i L_\mathcal{N}(\lambda)
-        = \frac{\mu}{2} x_i^* \tilde{A}(\lambda)^{-1}x_i
-    \end{displaymath}
-
-    Using the following inequalities, valid for all
-    $S\subset\mathcal{N}\setminus\{i\}$:
-    \begin{gather*}
-        P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\geq
-        P_\mathcal{N}^\lambda(S),\qquad
-        P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
-        \geq P_\mathcal{N}^\lambda(S\cup\{i\}),\\
-        A(S)^{-1} \geq A(S\cup\{i\})^{-1}
-    \end{gather*}
-    we get:
-    \begin{align*}
-        \partial_i F_\mathcal{N}(\lambda)
-        & \geq \frac{1}{4}
-        \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
-        P_\mathcal{N}^\lambda(S)
-        \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\
-        &\hspace{-2.5em}+\frac{1}{4}
-        \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
-        P_\mathcal{N}^\lambda(S\cup\{i\})
-        \log\Big(1 + \mu x_i^*A(S\cup\{i\})^{-1}x_i\Big)\\
-        &= \frac{1}{4}
-        \sum_{S\subset\mathcal{N}}
-        P_\mathcal{N}^\lambda(S)
-        \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)
-    \end{align*}
-
-    Using that $A(S)\geq I_d$ and $\norm{x_i}\leq 1$, we get that:
-    \begin{displaymath}
-        \mu x_i^*A(S)^{-1}x_i \leq \mu
-    \end{displaymath}
-    Moreover, by concavity of the logarithm:
-    \begin{displaymath}
-        \forall t\in[0,\mu],\; \log(1+t)\geq
-        \frac{\log\big(1+\mu\big)}{\mu} t
-    \end{displaymath}
-    Hence:
-    \begin{displaymath}
-        \partial_i F_\mathcal{N}(\lambda) \geq
-        \frac{\log\big(1+\mu\big)}{4\mu}\,\mu\,
-        x_i^*\bigg(\sum_{S\subset\mathcal{N}}
-        P_\mathcal{N}^\lambda(S)A(S)^{-1}\bigg)x_i
-    \end{displaymath}
-
-    Finally, using that the matrix inverse is matrix convex over symmetric
-    positive definite matrices:
-    \begin{align*}
-        \partial_i F_\mathcal{N}(\lambda) &\geq
-        \frac{\log\big(1+\mu\big)}{4\mu}\,\mu\,
-        x_i^*\bigg(\sum_{S\subset\mathcal{N}}
-        P_\mathcal{N}^\lambda(S)A(S)\bigg)^{-1}x_i\\
-        & = \frac{\log\big(1+\mu\big)}{2\mu}\,
-        \partial_i L_\mathcal{N}(\lambda)
-    \end{align*}
-\end{proof}
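Numerically, both bounds of this lemma can be checked on the running example (reusing `value`, `multilinear_F`, `X`, `mu` and `rng` from the sketches above):

    def concave_L(lam, X, mu):
        """L(lambda) = 1/2 log det(I_d + mu * sum_i lambda_i x_i x_i^*)."""
        d = X.shape[1]
        A_tilde = np.eye(d) + mu * (X.T * lam) @ X
        return 0.5 * np.linalg.slogdet(A_tilde)[1]

    lam = rng.uniform(size=X.shape[0])
    C_mu = np.log(1.0 + mu) / (2.0 * mu)
    F_val, L_val = multilinear_F(lam, X, mu), concave_L(lam, X, mu)
    assert C_mu * L_val <= F_val <= L_val + 1e-12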
-
-\begin{lemma}
-    Let us denote by $C_\mu$ the constant which appears in the bound of the
-    previous lemma, $C_\mu = \frac{\log(1+\mu)}{2\mu}$. Then:
-    \begin{displaymath}
-        OPT(L_\mathcal{N}, B) \leq \frac{1}{C_\mu}\big(2 OPT(V,\mathcal{N},B)
-        + \max_{i\in\mathcal{N}}V(i)\big)
-    \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-    Let us consider a feasible point $\lambda^*\in[0,1]^n$ such that
-    $L_\mathcal{N}(\lambda^*) = OPT(L_\mathcal{N}, B)$. By applying
-    Lemma~\ref{lemma:relaxation-ratio} and Lemma~\ref{lemma:rounding} we get
-    a feasible point $\bar{\lambda}$ with at most one fractional component
-    such that:
-    \begin{equation}\label{eq:e1}
-        L_\mathcal{N}(\lambda^*) \leq \frac{1}{C_\mu}
-        F_\mathcal{N}(\bar{\lambda})
-    \end{equation}
-
-    Let $\lambda_i$ denote the fractional component of $\bar{\lambda}$ and
-    $S$ denote the set whose indicator vector is $\bar{\lambda} - \lambda_i
-    e_i$. Using the fact that $F_\mathcal{N}$ is linear with respect to the
-    $i$-th component and is a relaxation of the value function, we get:
-    \begin{displaymath}
-        F_\mathcal{N}(\bar{\lambda}) = (1-\lambda_i)V(S)
-        + \lambda_i V(S\cup\{i\})
-    \end{displaymath}
-
-    Using the submodularity of $V$, which gives $V(S\cup\{i\})\leq V(S)
-    + V(i)$:
-    \begin{displaymath}
-        F_\mathcal{N}(\bar{\lambda}) \leq 2 V(S) + V(i)
-    \end{displaymath}
-
-    Note that since $\bar{\lambda}$ is feasible, $S$ is also feasible and
-    $V(S)\leq OPT(V,\mathcal{N}, B)$. Hence:
-    \begin{equation}\label{eq:e2}
-        F_\mathcal{N}(\bar{\lambda}) \leq 2 OPT(V,\mathcal{N}, B)
-        + \max_{i\in\mathcal{N}} V(i)
-    \end{equation}
-
-    Putting \eqref{eq:e1} and \eqref{eq:e2} together gives the result.
-\end{proof}
-
-\subsection{The mechanism}
-
-\begin{algorithm}\label{mechanism}
-    \caption{Budget feasible mechanism for ridge regression}
-    \begin{algorithmic}[1]
-        \State $i^* \gets \argmax_{j\in\mathcal{N}}V(j)$
-        \State $x^* \gets \argmax_{x\in[0,1]^n}
-        \{L_{\mathcal{N}\setminus\{i^*\}}(x) \,|\, c(x)\leq B\}$
-        \Statex
-        \If{$L(x^*) < CV(i^*)$}
-            \State \textbf{return} $\{i^*\}$
-        \Else
-            \State $i \gets \argmax_{1\leq j\leq n}\frac{V(j)}{c_j}$
-            \State $S \gets \emptyset$
-            \While{$c_i\leq
-            \frac{B}{2}\frac{V(S\cup\{i\})-V(S)}{V(S\cup\{i\})}$}
-                \State $S \gets S\cup\{i\}$
-                \State $i \gets \argmax_{j\in\mathcal{N}\setminus S}
-                \frac{V(S\cup\{j\})-V(S)}{c_j}$
-            \EndWhile
-            \State \textbf{return} $S$
-        \EndIf
-    \end{algorithmic}
-\end{algorithm}
-
-In the mechanism and in the analysis below, $L$ stands for
-$L_{\mathcal{N}\setminus\{i^*\}}$. The constant $C$ which appears in the
-mechanism is left unspecified for now: the analysis holds regardless of its
-value, and the optimal choice will be specified in the theorem.
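A sketch of the greedy branch of the mechanism, again reusing the `value` helper from above (the other branch requires maximizing the concave function $L_{\mathcal{N}\setminus\{i^*\}}$, e.g. with an off-the-shelf convex solver, and is omitted here):

    def greedy_branch(X, c, B, mu):
        """Greedy selection with the proportional stopping rule
        c_i <= (B/2) * (V(S + i) - V(S)) / V(S + i)."""
        S, remaining = [], list(range(X.shape[0]))
        i = max(remaining, key=lambda j: value([j], X, mu) / c[j])
        while True:
            with_i = value(S + [i], X, mu)
            if with_i <= 0 or c[i] > (B / 2) * (with_i - value(S, X, mu)) / with_i:
                return S
            S.append(i)
            remaining.remove(i)
            if not remaining:
                return S
            i = max(remaining,
                    key=lambda j: (value(S + [j], X, mu) - value(S, X, mu)) / c[j])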
-
-\begin{lemma}
-The mechanism is monotone.
-\end{lemma}
-
-\begin{proof}
-    Assume by contradiction that there exists a user $i$ that has been
-    selected by the mechanism and that would not be selected had he reported
-    a cost $c_i'\leq c_i$ (all the other costs staying the same).
-
-    If $i\neq i^*$ and $i$ has been selected, then we are in the case where
-    $L(x^*) \geq C V(i^*)$ and $i$ was included in the result set by the
-    greedy part of the mechanism. By reporting a cost $c_i'\leq c_i$, using
-    the submodularity of $V$, we see that $i$ will satisfy the greedy
-    selection rule:
-    \begin{displaymath}
-        i = \argmax_{j\in\mathcal{N}\setminus S} \frac{V(S\cup\{j\})
-        - V(S)}{c_j}
-    \end{displaymath}
-    in an earlier (or the same) iteration of the greedy heuristic. Let us
-    denote by $S_i$ (resp. $S_i'$) the set to which $i$ is added when
-    reporting cost $c_i$ (resp. $c_i'$). We have $S_i'\subset S_i$. Moreover:
-    \begin{align*}
-        c_i' & \leq c_i \leq
-        \frac{B}{2}\frac{V(S_i\cup\{i\})-V(S_i)}{V(S_i\cup\{i\})}\\
-        & \leq \frac{B}{2}\frac{V(S_i'\cup\{i\})-V(S_i')}{V(S_i'\cup\{i\})}
-    \end{align*}
-    Hence $i$ will still be included in the result set.
-
-    If $i = i^*$, then $i$ is included iff $L(x^*) < C V(i^*)$. Reporting
-    $c_i'$ instead of $c_i$ changes neither $V(i^*)$ nor $L(x^*)$ (which is
-    computed over $\mathcal{N}\setminus\{i^*\}$). Thus $i$ is still included
-    when reporting a different cost.
-\end{proof}
-
-\begin{lemma}
-The mechanism is budget feasible.
-\end{lemma}
-
-\begin{lemma}
-    Let us denote by $S_M$ the set returned by the mechanism. Let us also
-    write:
-    \begin{displaymath}
-        C_{\textrm{max}} = \max\left(1+C,\frac{e}{e-1}\left( 3
-        + \frac{12e}{C\cdot C_\mu(e-1) -5e +1}\right)\right)
-    \end{displaymath}
-    Then:
-    \begin{displaymath}
-        OPT(V, \mathcal{N}, B) \leq C_\text{max}\cdot V(S_M)
-    \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-    If the condition on line 3 of the algorithm holds, then:
-    \begin{displaymath}
-        V(i^*) \geq \frac{1}{C}L(x^*) \geq
-        \frac{1}{C}OPT(V,\mathcal{N}\setminus\{i^*\}, B)
-    \end{displaymath}
-    But, by submodularity:
-    \begin{displaymath}
-        OPT(V,\mathcal{N},B) \leq OPT(V,\mathcal{N}\setminus\{i^*\}, B)
-        + V(i^*)
-    \end{displaymath}
-    Hence:
-    \begin{displaymath}
-        V(i^*) \geq \frac{1}{C+1} OPT(V,\mathcal{N}, B)
-    \end{displaymath}
-
-    If the condition of the algorithm does not hold:
-    \begin{align*}
-        V(i^*) & \leq \frac{1}{C}L(x^*) \leq \frac{1}{C\cdot C_\mu}
-        \big(2 OPT(V,\mathcal{N}, B) + V(i^*)\big)\\
-        & \leq \frac{1}{C\cdot C_\mu}\left(\frac{2e}{e-1}\big(3 V(S_M)
-        + 2 V(i^*)\big) + V(i^*)\right)
-    \end{align*}
-    Thus:
-    \begin{displaymath}
-        V(i^*) \leq \frac{6e}{C\cdot C_\mu(e-1)- 5e + 1} V(S_M)
-    \end{displaymath}
-
-    Finally, using again that:
-    \begin{displaymath}
-        OPT(V,\mathcal{N},B) \leq \frac{e}{e-1}\big(3 V(S_M) + 2 V(i^*)\big)
-    \end{displaymath}
-    we get:
-    \begin{displaymath}
-        OPT(V, \mathcal{N}, B) \leq \frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot
-        C_\mu(e-1) -5e +1}\right) V(S_M)
-    \end{displaymath}
-\end{proof}
-
-The optimal value for $C$ is $C^* = \arg\min_C C_{\textrm{max}}$, obtained
-by equating the two terms in the maximum. The resulting quadratic equation
-has two solutions; only one of them satisfies:
-\begin{displaymath}
-    C\cdot C_\mu(e-1) -5e +1 \geq 0
-\end{displaymath}
-which is needed in the proof of the previous lemma. Computing this solution,
-we can state the main result of this section.
-
-\begin{theorem}
-    The mechanism is individually rational, truthful and has an
-    approximation ratio of:
-    \begin{multline*}
-        1 + C^* = \frac{5e-1 + C_\mu(4e-1)}{2C_\mu(e-1)}\\
-        + \frac{\sqrt{C_\mu^2(1+2e)^2
-        + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)}
-    \end{multline*}
-\end{theorem}
-\end{document}
diff --git a/stoc_paper.pdf b/stoc_paper.pdf
new file mode 100644
index 0000000..9ae80aa
--- /dev/null
+++ b/stoc_paper.pdf
Binary files differ
diff --git a/stoc_paper.tex b/stoc_paper.tex
new file mode 100644
index 0000000..cbb13af
--- /dev/null
+++ b/stoc_paper.tex
@@ -0,0 +1,49 @@
+\documentclass{acm_proc_article-sp}
+\usepackage[utf8]{inputenc}
+\usepackage{amsmath,amsfonts}
+\usepackage{algorithm}
+\usepackage{algpseudocode,bbm,color,verbatim}
+\input{definitions}
+\title{Budget Feasible Mechanisms for Experimental Design}
+
+%Remove permission block empty space
+\makeatletter
+\let\@copyrightspace\relax
+\makeatother
+
+\numberofauthors{3}
+\author{
+\alignauthor Thibaut Horel\\ %\titlenote{\texttt{thibaut.horel@ens.fr}}\\
+    \affaddr{Technicolor}\\
+    \email{thibaut.horel@ens.fr}
+\alignauthor Stratis Ioannidis\\ % \titlenote{\texttt{stratis.ioannidis@technicolor.com}}\\
+    \affaddr{Technicolor}\\
+    \email{stratis.ioannidis\\@technicolor.com}
+\alignauthor S. Muthukrishnan\\% \titlenote{\texttt{muthu@cs.rutgers.edu}}\\
+    \affaddr{Rutgers University, Microsoft Research}\\
+    \email{muthu@cs.rutgers.edu}
+}
+\begin{document}
+
+\maketitle
+\begin{abstract}
+\input{abstract}
+\end{abstract}
+\section{Introduction}
+\input{intro}
+
+\section{Preliminaries}\label{sec:peel}
+\input{problem}
+\section{Mechanism for \EDP{}}
+\input{main}
+\section{Extension to Other Problems}\label{sec:ext}
+\input{general}
+%\section{Conclusion}
+%\input{conclusion}
+\begin{small}
+\bibliographystyle{abbrv}
+\bibliography{notes}
+\end{small}
+\appendix
+\input{appendix}
+\end{document}