| author | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-10-28 15:16:26 -0700 |
|---|---|---|
| committer | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-10-28 15:16:26 -0700 |
| commit | 3fdd3c6674acfecffa3ec734f0183f1df6a239c1 (patch) | |
| tree | b595e3a821e6e2ae679dfd725f106c89964cc752 | |
| parent | cc28557b334d2bc53e9c8243409550ea1c9f368f (diff) | |
| download | recommendation-3fdd3c6674acfecffa3ec734f0183f1df6a239c1.tar.gz | |
file splitting
| -rw-r--r-- | conclusion.tex | 0 |
| -rw-r--r-- | definitions.tex | 16 |
| -rw-r--r-- | general.tex | 0 |
| -rw-r--r-- | intro.tex | 17 |
| -rw-r--r-- | main.tex | 550 |
| -rw-r--r-- | paper.tex | 780 |
| -rw-r--r-- | problem.tex | 193 |
7 files changed, 782 insertions, 774 deletions
diff --git a/conclusion.tex b/conclusion.tex new file mode 100644 index 0000000..e69de29 --- /dev/null +++ b/conclusion.tex diff --git a/definitions.tex b/definitions.tex new file mode 100644 index 0000000..ca6b3ba --- /dev/null +++ b/definitions.tex @@ -0,0 +1,16 @@ +\newtheorem{lemma}{Lemma} +\newtheorem{fact}{Fact} +\newtheorem{example}{Example} +\newtheorem{prop}{Proposition} +\newtheorem{theorem}{Theorem} +\newcommand*{\defeq}{\stackrel{\text{def}}{=}} +\newcommand{\var}{\mathop{\mathrm{Var}}} +\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]} +\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]} +\newcommand{\norm}[1]{\lVert#1\rVert} +\newcommand{\tr}[1]{#1^*} +\newcommand{\ip}[2]{\langle #1, #2 \rangle} +\newcommand{\mse}{\mathop{\mathrm{MSE}}} +\DeclareMathOperator{\trace}{tr} +\DeclareMathOperator*{\argmax}{arg\,max} + diff --git a/general.tex b/general.tex new file mode 100644 index 0000000..e69de29 --- /dev/null +++ b/general.tex diff --git a/intro.tex b/intro.tex new file mode 100644 index 0000000..649e7de --- /dev/null +++ b/intro.tex @@ -0,0 +1,17 @@ + +\begin{itemize} + \item already existing field of experiment design: survey-like setup, what + are the best points to include in your experiment? Measure of the + usefulness of the data: variance-reduction or entropy-reduction. + \item nowadays, there is also a big focus on purchasing data: paid surveys, + mechanical turk, etc. that add economic aspects to the problem of + experiment design + \item recent advances (Singer, Chen) in the field of budgeted mechanisms + \item we study ridge regression, very widely used in statistical learning, + and treat it as a problem of budgeted experiment design + \item we make the following contributions: ... + \item extension to a more general setup which includes a wider class of + machine learning problems +\end{itemize} + + diff --git a/main.tex b/main.tex new file mode 100644 index 0000000..189156c --- /dev/null +++ b/main.tex @@ -0,0 +1,550 @@ + +All the budget feasible mechanisms studied recently (TODO ref Singer Chen) +rely on using a greedy heuristic extending the idea of the greedy heuristic for +the knapsack problem. In knapsack, objects are selected based on their +\emph{value-per-cost} ratio. The quantity which plays a similar role for +general submodular functions is the \emph{marginal-contribution-per-cost} +ratio: let us assume that you have already selected a set of points $S$, then +the \emph{marginal-contribution-per-cost} ratio per cost of a new point $i$ is +defined by: +\begin{displaymath} + \frac{V(S\cup\{i\}) - V(S)}{c_i} +\end{displaymath} + +The greedy heuristic then simply repeatedly selects the point whose +marginal-contribution-per-cost ratio is the highest until it reaches the budget +limit. Mechanism considerations aside, this is known to have an unbounded +approximation ratio. However, lemma TODO ref in Chen or Singer shows that the +maximum between the greedy heuristic and the point with maximum value (as +a singleton set) provides a $\frac{5e}{e-1}$ approximation ratio. + +Unfortunately, TODO Singer 2011 points out that taking the maximum between the +greedy heuristic and the most valuable point is not incentive compatible. +Singer and Chen tackle this issue similarly: instead of comparing the most +valuable point to the greedy solution, they compare it to a solution which is +close enough to keep a constant approximation ratio: +\begin{itemize} + \item Chen suggests using $OPT(V,\mathcal{N}\setminus\{i\}, B)$. 
+ Unfortunately, in the general case, this cannot be computed exactly in + polynomial time. + \item Singer uses using the optimal value of a relaxed objective function + which can be proven to be close to the optimal of the original + objective function. The function used is tailored to the specific + problem of coverage. +\end{itemize} + +Here, we use a relaxation of the objective function which is tailored to the +problem of ridge regression. We define: +\begin{displaymath} + \forall\lambda\in[0,1]^{|\mathcal{N}|}\,\quad L_{\mathcal{N}}(\lambda) \defeq + \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}} \lambda_i x_i x_i^*\right) +\end{displaymath} + +We can now present the mechanism we use, which has a same flavor to Chen's and +Singer's + +\begin{algorithm}\label{mechanism} + \caption{Mechanism for ridge regression} + \begin{algorithmic}[1] + \State $i^* \gets \argmax_{j\in\mathcal{N}}V(j)$ + \State $x^* \gets \argmax_{x\in[0,1]^n} \{L_{\mathcal{N}\setminus\{i^*\}}(x) + \,|\, c(x)\leq B\}$ + \Statex + \If{$L(x^*) < CV(i^*)$} + \State \textbf{return} $\{i^*\}$ + \Else + \State $i \gets \argmax_{1\leq j\leq n}\frac{V(j)}{c_j}$ + \State $S \gets \emptyset$ + \While{$c_i\leq \frac{B}{2}\frac{V(S\cup\{i\})-V(S)}{V(S\cup\{i\})}$} + \State $S \gets S\cup\{i\}$ + \State $i \gets \argmax_{j\in\mathcal{N}\setminus S} + \frac{V(S\cup\{j\})-V(S)}{c_j}$ + \EndWhile + \State \textbf{return} $S$ + \EndIf + \end{algorithmic} +\end{algorithm} + +Notice, that the stopping condition in the while loop is more sophisticated +than just ensuring that the sum of the costs does not exceed the budget. This +is because the selected users will be payed more than their costs, and this +stopping condition ensures budget feasibility when the users are paid their +threshold payment. + +We can now state the main result of this section: +\begin{theorem} + The mechanism in \ref{mechanism} is truthful, individually rational, budget + feasible. Furthermore, choosing: + \begin{multline*} + C = C^* = \frac{5e-1 + C_\mu(2e+1)}{2C_\mu(e-1)}\\ + + \frac{\sqrt{C_\mu^2(1+2e)^2 + + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)} + \end{multline*} + we get an approximation ratio of: + \begin{multline*} + 1 + C^* = \frac{5e-1 + C_\mu(4e-1)}{2C_\mu(e-1)}\\ + + \frac{\sqrt{C_\mu^2(1+2e)^2 + + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)} + \end{multline*} + where: + \begin{displaymath} + C_\mu = \frac{\log(1+\mu)}{2\mu} + \end{displaymath} +\end{theorem} + +The proof will consist of the claims of the theorem broken down into lemmas. + +Because the user strategy is parametrized by a single parameter, truthfulness +is equivalent to monotonicity (TODO ref). The proof here is the same as in TODO +and is given for the sake of completeness. + +\begin{lemma} +The mechanism is monotone. +\end{lemma} + +\begin{proof} + We assume by contradiction that there exists a user $i$ that has been + selected by the mechanism and that would not be selected had he reported + a cost $c_i'\leq c_i$ (all the other costs staying the same). + + If $i\neq i^*$ and $i$ has been selected, then we are in the case where + $L(x^*) \geq C V(i^*)$ and $i$ was included in the result set by the greedy + part of the mechanism. By reporting a cost $c_i'\leq c_i$, using the + submodularity of $V$, we see that $i$ will satisfy the greedy selection + rule: + \begin{displaymath} + i = \argmax_{j\in\mathcal{N}\setminus S} \frac{V(S\cup\{j\}) + - V(S)}{c_j} + \end{displaymath} + in an earlier iteration of the greedy heuristic. Let us denote by $S_i$ + (resp. 
$S_i'$) the set to which $i$ is added when reporting cost $c_i$ + (resp. $c_i'$). We have $S_i'\subset S_i$. Moreover: + \begin{align*} + c_i' & \leq c_i \leq + \frac{B}{2}\frac{V(S_i\cup\{i\})-V(S_i)}{V(S_i\cup\{i\})}\\ + & \leq \frac{B}{2}\frac{V(S_i'\cup\{i\})-V(S_i')}{V(S_i'\cup\{i\})} + \end{align*} + Hence $i$ will still be included in the result set. + + If $i = i^*$, $i$ is included iff $L(x^*) \leq C V(i^*)$. Reporting $c_i'$ + instead of $c_i$ does not change the value $V(i^*)$ nor $L(x^*)$ (which is + computed over $\mathcal{N}\setminus\{i^*\}$). Thus $i$ is still included by + reporting a different cost. +\end{proof} + +\begin{lemma} +The mechanism is budget feasible. +\end{lemma} + +The proof is the same as in Chen and is given here for the sake of +completeness. +\begin{proof} + +\end{proof} + +Using the characterization of the threshold payments from Singer TODO, we can +prove individual rationality similarly to Chen TODO. +\begin{lemma} +The mechanism is individually rational +\end{lemma} + +\begin{proof} + +\end{proof} + +The following lemma requires to have a careful look at the relaxation function +we chose in the mechanism. The next section will be dedicated to studying this +relaxation and will contain the proof of the following lemma: +\begin{lemma} + We have: + \begin{displaymath} + OPT(L_\mathcal{N}, B) \leq \frac{1}{C_\mu}\big(2 OPT(V,\mathcal{N},B) + + \max_{i\in\mathcal{N}}V(i)\big) + \end{displaymath} +\end{lemma} + +\begin{lemma} + Let us denote by $S_M$ the set returned by the mechanism. Let us also + write: + \begin{displaymath} + C_{\textrm{max}} = \max\left(1+C,\frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot + C_\mu(e-1) -5e +1}\right)\right) + \end{displaymath} + + Then: + \begin{displaymath} + OPT(V, \mathcal{N}, B) \leq + C_\text{max}\cdot V(S_M) + \end{displaymath} +\end{lemma} + +\begin{proof} + + If the condition on line 3 of the algorithm holds, then: + \begin{displaymath} + V(i^*) \geq \frac{1}{C}L(x^*) \geq + \frac{1}{C}OPT(V,\mathcal{N}\setminus\{i\}, B) + \end{displaymath} + + But: + \begin{displaymath} + OPT(V,\mathcal{N},B) \leq OPT(V,\mathcal{N}\setminus\{i\}, B) + V(i^*) + \end{displaymath} + + Hence: + \begin{displaymath} + V(i^*) \geq \frac{1}{C+1} OPT(V,\mathcal{N}, B) + \end{displaymath} + + If the condition of the algorithm does not hold: + \begin{align*} + V(i^*) & \leq \frac{1}{C}L(x^*) \leq \frac{1}{C\cdot C_\mu} + \big(2 OPT(V,\mathcal{N}, B) + V(i^*)\big)\\ + & \leq \frac{1}{C\cdot C_\mu}\left(\frac{2e}{e-1}\big(3 V(S_M) + + 2 V(i^*)\big) + + V(i^*)\right) + \end{align*} + + Thus: + \begin{align*} + V(i^*) \leq \frac{6e}{C\cdot C_\mu(e-1)- 5e + 1} V(S_M) + \end{align*} + + Finally, using again that: + \begin{displaymath} + OPT(V,\mathcal{N},B) \leq \frac{e}{e-1}\big(3 V(S_M) + 2 V(i^*)\big) + \end{displaymath} + + We get: + \begin{displaymath} + OPT(V, \mathcal{N}, B) \leq \frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot + C_\mu(e-1) -5e +1}\right) V(S_M) + \end{displaymath} +\end{proof} + +The optimal value for $C$ is: +\begin{displaymath} + C^* = \arg\min C_{\textrm{max}} +\end{displaymath} + +This equation has two solutions. Only one of those is such that: +\begin{displaymath} + C\cdot C_\mu(e-1) -5e +1 \geq 0 +\end{displaymath} +which is needed in the proof of the previous lemma. Computing this solution, +we can state the main result of this section. + +\subsection{Relaxations of the value function} + +To prove lemma TODO, we will use a general method called pipage rounding, +introduced in TODO. 
Two consecutive relaxations are used: the one that we are +interested in, whose optimization can be computed efficiently, and the +multilinear extension which presents a \emph{cross-convexity} like behavior +which allows for rounding of fractional solution without decreasing the value +of the objective function and thus ensures a constant approximation of the +value function. The difficulty resides in showing that the ratio of the two +relaxations is bounded. + +We say that $R_\mathcal{N}:[0,1]^n\rightarrow\mathbf{R}$ is a relaxation of the +value function $V$ over $\mathcal{N}$ if it coincides with $V$ at binary +points. Formally, for any $S\subset\mathcal{N}$, let $\mathbf{1}_S$ denote the +indicator vector of $S$. $R_\mathcal{N}$ is a relaxation of $V$ over +$\mathcal{N}$ iff: +\begin{displaymath} + \forall S\subset\mathcal{N},\; R_\mathcal{N}(\mathbf{1}_S) = V(S) +\end{displaymath} + +We can extend the optimisation problem defined above to a relaxation by +extending the cost function: +\begin{displaymath} + \forall \lambda\in[0,1]^n,\; c(\lambda) + = \sum_{i\in\mathcal{N}}\lambda_ic_i +\end{displaymath} +The optimisation problem becomes: +\begin{displaymath} + OPT(R_\mathcal{N}, B) = + \max_{\lambda\in[0,1]^n}\left\{R_\mathcal{N}(\lambda)\,|\, c(\lambda)\leq B\right\} +\end{displaymath} +The relaxations we will consider here rely on defining a probability +distribution over subsets of $\mathcal{N}$. + +Let $\lambda\in[0,1]^n$, let us define: +\begin{displaymath} + P_\mathcal{N}^\lambda(S) = \prod_{i\in S}\lambda_i + \prod_{i\in\mathcal{N}\setminus S}(1-\lambda_i) +\end{displaymath} +$P_\mathcal{N}^\lambda(S)$ is the probability of picking the set $S$ if we select +a subset of $\mathcal{N}$ at random by deciding independently for each point to +include it in the set with probability $\lambda_i$ (and to exclude it with +probability $1-\lambda_i$). + +We will consider two relaxations of the value function $V$ over $\mathcal{N}$: +\begin{itemize} + \item the \emph{multi-linear extension} of $V$: + \begin{align*} + F_\mathcal{N}(\lambda) + & = \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[\log\det A(S)\big]\\ + & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) V(S)\\ + & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) \log\det A(S)\\ + \end{align*} + \item the \emph{concave relaxation} of $V$: + \begin{align*} + L_{\mathcal{N}}(\lambda) + & = \log\det \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[A(S)\big]\\ + & = \log\det\left(\sum_{S\subset N} + P_\mathcal{N}^\lambda(S)A(S)\right)\\ + & = \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}} + \lambda_ix_ix_i^*\right)\\ + & \defeq \log\det \tilde{A}(\lambda) + \end{align*} +\end{itemize} + +\begin{lemma} + The \emph{concave relaxation} $L_\mathcal{N}$ is concave\footnote{Hence + this relaxation is well-named!}. +\end{lemma} + +\begin{proof} + This follows from the concavity of the $\log\det$ function over symmetric + positive semi-definite matrices. 
More precisely, if $A$ and $B$ are two
+    symmetric positive semi-definite matrices, then:
+    \begin{multline*}
+        \forall\alpha\in [0, 1],\; \log\det\big(\alpha A + (1-\alpha) B\big)\\
+        \geq \alpha\log\det A + (1-\alpha)\log\det B
+    \end{multline*}
+\end{proof}
+
+\begin{lemma}[Rounding]\label{lemma:rounding}
+    For any feasible $\lambda\in[0,1]^n$, there exists a feasible
+    $\bar{\lambda}\in[0,1]^n$ such that at most one of its components is
+    fractional, that is, lies in $(0,1)$, and:
+    \begin{displaymath}
+        F_{\mathcal{N}}(\lambda)\leq F_{\mathcal{N}}(\bar{\lambda})
+    \end{displaymath}
+\end{lemma}
+
+\begin{proof}
+    We give a rounding procedure which, given a feasible $\lambda$ with at
+    least two fractional components, returns a feasible $\lambda'$ with one
+    fewer fractional component such that:
+    \begin{displaymath}
+        F_\mathcal{N}(\lambda) \leq F_\mathcal{N}(\lambda')
+    \end{displaymath}
+    Applying this procedure recursively yields the lemma's result.
+
+    Let us consider such a feasible $\lambda$. Let $i$ and $j$ be two
+    fractional components of $\lambda$ and let us define the following
+    function:
+    \begin{displaymath}
+        F_\lambda(\varepsilon) = F(\lambda_\varepsilon)
+        \quad\textrm{where}\quad
+        \lambda_\varepsilon = \lambda + \varepsilon\left(e_i-\frac{c_i}{c_j}e_j\right)
+    \end{displaymath}
+
+    It is easy to see that if $\lambda$ is feasible, then:
+    \begin{multline}\label{eq:convex-interval}
+        \forall\varepsilon\in\Big[\max\Big(-\lambda_i,(\lambda_j-1)\frac{c_j}{c_i}\Big),
+        \min\Big(1-\lambda_i, \lambda_j\frac{c_j}{c_i}\Big)\Big],\;\\
+        \lambda_\varepsilon\;\;\textrm{is feasible}
+    \end{multline}
+
+    Furthermore, the function $F_\lambda$ is convex; indeed:
+    \begin{align*}
+        F_\lambda(\varepsilon)
+        & = \mathbb{E}_{S'\sim P_{\mathcal{N}\setminus\{i,j\}}^\lambda}\Big[
+        (\lambda_i+\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i,j\})\\
+        & + (\lambda_i+\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i\})\\
+        & + (1-\lambda_i-\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{j\})\\
+        & + (1-\lambda_i-\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S')\Big]
+    \end{align*}
+    Thus, $F_\lambda$ is a degree-2 polynomial whose leading coefficient is:
+    \begin{multline*}
+        \frac{c_i}{c_j}\mathbb{E}_{S'\sim
+        P_{\mathcal{N}\setminus\{i,j\}}^\lambda}\Big[
+        V(S'\cup\{i\})+V(S'\cup\{j\})\\
+        -V(S'\cup\{i,j\})-V(S')\Big]
+    \end{multline*}
+    which is nonnegative by submodularity of $V$. Hence, the maximum of
+    $F_\lambda$ over the interval given in \eqref{eq:convex-interval} is
+    attained at one of its endpoints, at which either the $i$-th or the
+    $j$-th component of $\lambda_\varepsilon$ becomes integral.
+\end{proof}
+
+\begin{lemma}\label{lemma:relaxation-ratio}
+    The following inequality holds:
+    \begin{displaymath}
+        \forall\lambda\in[0,1]^n,\;
+        \frac{\log\big(1+\mu\big)}{2\mu}
+        \,L_\mathcal{N}(\lambda)\leq
+        F_\mathcal{N}(\lambda)\leq L_{\mathcal{N}}(\lambda)
+    \end{displaymath}
+\end{lemma}
+
+\begin{proof}
+    We will prove that:
+    \begin{displaymath}
+        \frac{\log\big(1+\mu\big)}{2\mu}
+    \end{displaymath}
+    is a lower bound of the ratio $\partial_i F_\mathcal{N}(\lambda)/\partial_i
+    L_\mathcal{N}(\lambda)$.
+ + This will be enough to conclude, by observing that: + \begin{displaymath} + \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)} + \sim_{\lambda\rightarrow 0} + \frac{\sum_{i\in \mathcal{N}}\lambda_i\partial_i F_\mathcal{N}(0)} + {\sum_{i\in\mathcal{N}}\lambda_i\partial_i L_\mathcal{N}(0)} + \end{displaymath} + and that an interior critical point of the ratio + $F_\mathcal{N}(\lambda)/L_\mathcal{N}(\lambda)$ is defined by: + \begin{displaymath} + \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)} + = \frac{\partial_i F_\mathcal{N}(\lambda)}{\partial_i + L_\mathcal{N}(\lambda)} + \end{displaymath} + + Let us start by computing the derivatives of $F_\mathcal{N}$ and + $L_\mathcal{N}$ with respect to + the $i$-th component. + + For $F$, it suffices to look at the derivative of + $P_\mathcal{N}^\lambda(S)$: + \begin{displaymath} + \partial_i P_\mathcal{N}^\lambda(S) = \left\{ + \begin{aligned} + & P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})\;\textrm{if}\; i\in S \\ + & - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\;\textrm{if}\; + i\in \mathcal{N}\setminus S \\ + \end{aligned}\right. + \end{displaymath} + + Hence: + \begin{multline*} + \partial_i F_\mathcal{N} = + \sum_{\substack{S\subset\mathcal{N}\\ i\in S}} + P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})V(S)\\ + - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}} + P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\ + \end{multline*} + + Now, using that every $S$ such that $i\in S$ can be uniquely written as + $S'\cup\{i\}$, we can write: + \begin{multline*} + \partial_i F_\mathcal{N} = + \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} + P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S\cup\{i\})\\ + - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}} + P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\ + \end{multline*} + + Finally, by using the expression for the marginal contribution of $i$ to + $S$: + \begin{displaymath} + \partial_i F_\mathcal{N}(\lambda) = + \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} + P_{\mathcal{N}\setminus\{i\}}^\lambda(S) + \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big) + \end{displaymath} + + The computation of the derivative of $L_\mathcal{N}$ uses standard matrix + calculus and gives: + \begin{displaymath} + \partial_i L_\mathcal{N}(\lambda) + = \mu x_i^* \tilde{A}(\lambda)^{-1}x_i + \end{displaymath} + + Using the following inequalities: + \begin{gather*} + \forall S\subset\mathcal{N}\setminus\{i\},\quad + P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\geq + P_{\mathcal{N}\setminus\{i\}}^\lambda(S\cup\{i\})\\ + \forall S\subset\mathcal{N},\quad P_{\mathcal{N}\setminus\{i\}}^\lambda(S) + \geq P_\mathcal{N}^\lambda(S)\\ + \forall S\subset\mathcal{N},\quad A(S)^{-1} \geq A(S\cup\{i\})^{-1}\\ + \end{gather*} + we get: + \begin{align*} + \partial_i F_\mathcal{N}(\lambda) + & \geq \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} + P_\mathcal{N}^\lambda(S) + \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\ + & \geq \frac{1}{2} + \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} + P_\mathcal{N}^\lambda(S) + \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\ + &\hspace{-3.5em}+\frac{1}{2} + \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} + P_\mathcal{N}^\lambda(S\cup\{i\}) + \log\Big(1 + \mu x_i^*A(S\cup\{i\})^{-1}x_i\Big)\\ + &\geq \frac{1}{2} + \sum_{S\subset\mathcal{N}} + P_\mathcal{N}^\lambda(S) + \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\ + \end{align*} + + Using that $A(S)\geq I_d$ we get that: + 
\begin{displaymath} + \mu x_i^*A(S)^{-1}x_i \leq \mu + \end{displaymath} + + Moreover: + \begin{displaymath} + \forall x\leq\mu,\; \log(1+x)\geq + \frac{\log\big(1+\mu\big)}{\mu} x + \end{displaymath} + + Hence: + \begin{displaymath} + \partial_i F_\mathcal{N}(\lambda) \geq + \frac{\log\big(1+\mu\big)}{2\mu} + x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)^{-1}\bigg)x_i + \end{displaymath} + + Finally, using that the inverse is a matrix convex function over symmetric + positive definite matrices: + \begin{align*} + \partial_i F_\mathcal{N}(\lambda) &\geq + \frac{\log\big(1+\mu\big)}{2\mu} + x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)\bigg)^{-1}x_i\\ + & \geq \frac{\log\big(1+\mu\big)}{2\mu} + \partial_i L_\mathcal{N}(\lambda) + \end{align*} +\end{proof} + +We can now prove lemma TODO from previous section. + +\begin{proof} + Let us consider a feasible point $\lambda^*\in[0,1]^n$ such that $L_\mathcal{N}(\lambda^*) + = OPT(L_\mathcal{N}, B)$. By applying lemma~\ref{lemma:relaxation-ratio} + and lemma~\ref{lemma:rounding} we get a feasible point $\bar{\lambda}$ with at most + one fractional component such that: + \begin{equation}\label{eq:e1} + L_\mathcal{N}(\lambda^*) \leq \frac{1}{C_\mu} + F_\mathcal{N}(\bar{\lambda}) + \end{equation} + + Let $\lambda_i$ denote the fractional component of $\bar{\lambda}$ and $S$ + denote the set whose indicator vector is $\bar{\lambda} - \lambda_i e_i$. + Using the fact that $F_\mathcal{N}$ is linear with respect to the $i$-th + component and is a relaxation of the value function, we get: + \begin{displaymath} + F_\mathcal{N}(\bar{\lambda}) = V(S) +\lambda_i V(S\cup\{i\}) + \end{displaymath} + + Using the submodularity of $V$: + \begin{displaymath} + F_\mathcal{N}(\bar{\lambda}) \leq 2 V(S) + V(i) + \end{displaymath} + + Note that since $\bar{\lambda}$ is feasible, $S$ is also feasible and + $V(S)\leq OPT(V,\mathcal{N}, B)$. Hence: + \begin{equation}\label{eq:e2} + F_\mathcal{N}(\bar{\lambda}) \leq 2 OPT(V,\mathcal{N}, B) + + \max_{i\in\mathcal{N}} V(i) + \end{equation} + + Putting \eqref{eq:e1} and \eqref{eq:e2} together gives the results. +\end{proof} + + @@ -3,789 +3,21 @@ \usepackage{amsmath,amsfonts} \usepackage{algorithm} \usepackage{algpseudocode} -\newtheorem{lemma}{Lemma} -\newtheorem{fact}{Fact} -\newtheorem{example}{Example} -\newtheorem{prop}{Proposition} -\newtheorem{theorem}{Theorem} -\newcommand*{\defeq}{\stackrel{\text{def}}{=}} -\newcommand{\var}{\mathop{\mathrm{Var}}} -\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]} -\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]} -\newcommand{\norm}[1]{\lVert#1\rVert} -\newcommand{\tr}[1]{#1^*} -\newcommand{\ip}[2]{\langle #1, #2 \rangle} -\newcommand{\mse}{\mathop{\mathrm{MSE}}} -\DeclareMathOperator{\trace}{tr} -\DeclareMathOperator*{\argmax}{arg\,max} +\input{definitions} \title{Budgeted Auctions for Experiment Design} \begin{document} \maketitle \section{Intro} - -\begin{itemize} - \item already existing field of experiment design: survey-like setup, what - are the best points to include in your experiment? Measure of the - usefulness of the data: variance-reduction or entropy-reduction. - \item nowadays, there is also a big focus on purchasing data: paid surveys, - mechanical turk, etc. 
that add economic aspects to the problem of - experiment design - \item recent advances (Singer, Chen) in the field of budgeted mechanisms - \item we study ridge regression, very widely used in statistical learning, - and treat it as a problem of budgeted experiment design - \item we make the following contributions: ... - \item extension to a more general setup which includes a wider class of - machine learning problems -\end{itemize} +\input{intro} \section{Problem formulation} - -\subsection{Notations} - -Throughout the paper, we will make use of the following notations: if $x$ is -a (column) vector in $\mathbf{R}^d$, $x^*$ denotes its transposed (line) -vector. Thus, the standard inner product between two vectors $x$ and $y$ is -simply $x^* y$. $\norm{x}_2 = x^*x$ will denote the $L_2$ norm of $x$. - -We will also often use the following order over symmetric matrices: if $A$ and -$B$ are two $d\times d$ and $B$ are two $d\times d$ real symmetric matrices, we -write that $A\leq B$ iff: -\begin{displaymath} - \forall x\in\mathbf{R}^d,\quad - x^*Ax \leq x^*Bx -\end{displaymath} -That is, iff $B-A$ is symmetric semi-definite positive. - -This order let us define the notion of a \emph{decreasing} or \emph{convex} -matrix function similarly to their real counterparts. In particular, let us -recall that the matrix inversion is decreasing and convex over symmetric -definite positive matrices. - -\subsection{Data model} - -There is a set of $n$ users, $\mathcal{N} = \{1,\ldots, n\}$. Each user -$i\in\mathcal{N}$ has a public vector of features $x_i\in\mathbf{R}^d$ and an -undisclosed piece of information $y_i\in\mathbf{R}$. We assume that the data -has already been normalized so that $\norm{x_i}_2\leq 1$ for all -$i\in\mathcal{N}$. - -The experimenter is going to select a set of users and ask them to reveal their -private piece of information. We are interested in a \emph{survey setup}: the -experimenter has not seen the data yet, but he wants to know which users he -should be selecting. His goal is to learn the model underlying the data. Here, -we assume a linear model: -\begin{displaymath} - \forall i\in\mathcal{N},\quad y_i = \beta^* x_i + \varepsilon_i -\end{displaymath} -where $\beta\in\mathbf{R}^d$ and $\varepsilon_i\in\mathbf{R}$ follows a normal -distribution of mean $0$ and variance $\sigma^2$. Furthermore, we assume the -error $\varepsilon$ to be independent of the user: -$(\varepsilon_i)_{i\in\mathcal{N}}$ are mutually independent. - -After observing the data, the experimenter could simply do linear regression to -learn the model parameter $\beta$. However, in a more general setup, the -experimenter has a prior knowledge about $\beta$, a distribution over -$\mathbf{R}^d$. After observing the data, the experimenter performs -\emph{maximum a posteriori estimation}: computing the point which maximizes the -posterior distribution of $\beta$ given the observations. - -Here, we will assume, as it is often done, that the prior distribution is -a multivariate normal distribution of mean zero and covariance matrix $\kappa -I_d$. Maximum a posteriori estimation leads to the following maximization -problem: -\begin{displaymath} - \beta_{\text{max}} = \argmax_{\beta\in\mathbf{R}^d} \sum_i (y_i - \beta^*x_i)^2 - + \frac{1}{\mu}\sum_i \norm{\beta}_2^2 -\end{displaymath} -which is the well-known \emph{ridge regression}. $\mu -= \frac{\kappa}{\sigma^2}$ is the regularization parameter. 
Ridge regression -can thus be seen as linear regression with a regularization term which -prevents $\beta$ from having a large $L_2$-norm. - -\subsection{Value of data} - -Because the user private variables $y_i$ have not been observed yet when the -experimenter has to decide which users to include in his experiment, we treat -$\beta$ as a random variable whose distribution is updated after observing the -data. - -Let us recall that if $\beta$ is random variable over $\mathbf{R}^d$ whose -probability distribution has a density function $f$ with respect to the -Lebesgue measure, its entropy is given by: -\begin{displaymath} - \mathbb{H}(\beta) \defeq - \int_{b\in\mathbf{R}^d} \log f(b) f(b)\text{d}b -\end{displaymath} - -A usual way to measure the decrease of uncertainty induced by the observation -of data is to use the entropy. This leads to the following definition of the -value of data called the \emph{value of information}: -\begin{displaymath} - \forall S\subset\mathcal{N},\quad V(S) = \mathbb{H}(\beta) - - \mathbb{H}(\beta\,|\, - Y_S) -\end{displaymath} -where $Y_S = \{y_i,\,i\in S\}$ is the set of observed data. - -\begin{theorem} - Under the ridge regression model explained in section TODO, the value of data - is equal to: - \begin{align*} - \forall S\subset\mathcal{N},\; V(S) - & = \frac{1}{2}\log\det\left(I_d - + \mu\sum_{i\in S} x_ix_i^*\right)\\ - & \defeq \frac{1}{2}\log\det A(S) - \end{align*} -\end{theorem} - -\begin{proof} - -Let us denote by $X_S$ the matrix whose rows are the vectors $(x_i^*)_{i\in -S}$. Observe that $A_S$ can simply be written as: -\begin{displaymath} - A_S = I_d + \mu X_S^* X_S -\end{displaymath} - -Let us recall that the entropy of a multivariate normal variable $B$ over -$\mathbf{R}^d$ of covariance $\Sigma I_d$ is given by: -\begin{equation}\label{eq:multivariate-entropy} - \mathbb{H}(B) = \frac{1}{2}\log\big((2\pi e)^d \det \Sigma I_d\big) -\end{equation} - -Using the chain rule for conditional entropy, we get that: -\begin{displaymath} - V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\,\beta) -\end{displaymath} - -Conditioned on $\beta$, $(Y_S)$ follows a multivariate normal -distribution of mean $X\beta$ and of covariance matrix $\sigma^2 I_n$. Hence: -\begin{equation}\label{eq:h1} - \mathbb{H}(Y_S\,|\,\beta) - = \frac{1}{2}\log\left((2\pi e)^n \det(\sigma^2I_n)\right) -\end{equation} - -$(Y_S)$ also follows a multivariate normal distribution of mean zero. Let us -compute its covariance matrix, $\Sigma_Y$: -\begin{align*} - \Sigma_Y & = \expt{YY^*} = \expt{(X_S\beta + E)(X_S\beta + E)^*}\\ - & = \kappa X_S X_S^* + \sigma^2I_n -\end{align*} -Thus, we get that: -\begin{equation}\label{eq:h2} - \mathbb{H}(Y_S) - = \frac{1}{2}\log\left((2\pi e)^n \det(\kappa X_S X_S^* + \sigma^2 I_n)\right) -\end{equation} - -Combining \eqref{eq:h1} and \eqref{eq:h2} we get: -\begin{displaymath} - V(S) = \frac{1}{2}\log\det\left(I_n+\frac{\kappa}{\sigma^2}X_S - X_S^*\right) -\end{displaymath} - -Finally, we can use Sylvester's determinant theorem to get the result. -\end{proof} - -It is also interesting to look at the marginal contribution of a user to a set: the -increase of value induced by adding a user to an already existing set of users. -We have the following lemma. 
- -\begin{lemma}[Marginal contribution] - \begin{displaymath} - \Delta_i V(S)\defeq V(S\cup\{i\}) - V(S) - = \frac{1}{2}\log\left(1 + \mu x_i^*A(S)^{-1}x_i\right) - \end{displaymath} -\end{lemma} - -\begin{proof} - We have: - \begin{align*} - V(S\cup\{i\}) & = \frac{1}{2}\log\det A(S\cup\{i\})\\ - & = \frac{1}{2}\log\det\left(A(S) + \mu x_i x_i^*\right)\\ - & = V(S) + \frac{1}{2}\log\det\left(I_d + \mu A(S)^{-1}x_i - x_i^*\right)\\ - & = V(S) + \frac{1}{2}\log\left(1 + \mu x_i^* A(S)^{-1}x_i\right) - \end{align*} - where the last equality comes from Sylvester's determinant formula. -\end{proof} - -Because $A(S)$ is symmetric definite positive, the marginal contribution is -positive, which proves that the value function is set increasing. Furthermore, -it is easy to see that if $S\subset S'$, then $A(S)\leq A(S')$. Using the fact -that matrix inversion is decreasing, we see that the marginal contribution of -a fixed user is a set decreasing function. This is the \emph{submodularity} of -the value function. - -TODO? Explain what are the points which are the most valuable : points which -are aligned along directions where the spread over the already existing points -is small. - -\subsection{Auction} - -TODO Explain the optimization problem, why it has to be formulated as an auction -problem. Explain the goals: -\begin{itemize} - \item truthful - \item individually rational - \item budget feasible - \item has a good approximation ratio - -TODO Explain what is already known: it is ok when the function is submodular. When -should we introduce the notion of submodularity? -\end{itemize} - +\input{problem} \section{Main result} - -All the budget feasible mechanisms studied recently (TODO ref Singer Chen) -rely on using a greedy heuristic extending the idea of the greedy heuristic for -the knapsack problem. In knapsack, objects are selected based on their -\emph{value-per-cost} ratio. The quantity which plays a similar role for -general submodular functions is the \emph{marginal-contribution-per-cost} -ratio: let us assume that you have already selected a set of points $S$, then -the \emph{marginal-contribution-per-cost} ratio per cost of a new point $i$ is -defined by: -\begin{displaymath} - \frac{V(S\cup\{i\}) - V(S)}{c_i} -\end{displaymath} - -The greedy heuristic then simply repeatedly selects the point whose -marginal-contribution-per-cost ratio is the highest until it reaches the budget -limit. Mechanism considerations aside, this is known to have an unbounded -approximation ratio. However, lemma TODO ref in Chen or Singer shows that the -maximum between the greedy heuristic and the point with maximum value (as -a singleton set) provides a $\frac{5e}{e-1}$ approximation ratio. - -Unfortunately, TODO Singer 2011 points out that taking the maximum between the -greedy heuristic and the most valuable point is not incentive compatible. -Singer and Chen tackle this issue similarly: instead of comparing the most -valuable point to the greedy solution, they compare it to a solution which is -close enough to keep a constant approximation ratio: -\begin{itemize} - \item Chen suggests using $OPT(V,\mathcal{N}\setminus\{i\}, B)$. - Unfortunately, in the general case, this cannot be computed exactly in - polynomial time. - \item Singer uses using the optimal value of a relaxed objective function - which can be proven to be close to the optimal of the original - objective function. The function used is tailored to the specific - problem of coverage. 
-\end{itemize} - -Here, we use a relaxation of the objective function which is tailored to the -problem of ridge regression. We define: -\begin{displaymath} - \forall\lambda\in[0,1]^{|\mathcal{N}|}\,\quad L_{\mathcal{N}}(\lambda) \defeq - \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}} \lambda_i x_i x_i^*\right) -\end{displaymath} - -We can now present the mechanism we use, which has a same flavor to Chen's and -Singer's - -\begin{algorithm}\label{mechanism} - \caption{Mechanism for ridge regression} - \begin{algorithmic}[1] - \State $i^* \gets \argmax_{j\in\mathcal{N}}V(j)$ - \State $x^* \gets \argmax_{x\in[0,1]^n} \{L_{\mathcal{N}\setminus\{i^*\}}(x) - \,|\, c(x)\leq B\}$ - \Statex - \If{$L(x^*) < CV(i^*)$} - \State \textbf{return} $\{i^*\}$ - \Else - \State $i \gets \argmax_{1\leq j\leq n}\frac{V(j)}{c_j}$ - \State $S \gets \emptyset$ - \While{$c_i\leq \frac{B}{2}\frac{V(S\cup\{i\})-V(S)}{V(S\cup\{i\})}$} - \State $S \gets S\cup\{i\}$ - \State $i \gets \argmax_{j\in\mathcal{N}\setminus S} - \frac{V(S\cup\{j\})-V(S)}{c_j}$ - \EndWhile - \State \textbf{return} $S$ - \EndIf - \end{algorithmic} -\end{algorithm} - -Notice, that the stopping condition in the while loop is more sophisticated -than just ensuring that the sum of the costs does not exceed the budget. This -is because the selected users will be payed more than their costs, and this -stopping condition ensures budget feasibility when the users are paid their -threshold payment. - -We can now state the main result of this section: -\begin{theorem} - The mechanism in \ref{mechanism} is truthful, individually rational, budget - feasible. Furthermore, choosing: - \begin{multline*} - C = C^* = \frac{5e-1 + C_\mu(2e+1)}{2C_\mu(e-1)}\\ - + \frac{\sqrt{C_\mu^2(1+2e)^2 - + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)} - \end{multline*} - we get an approximation ratio of: - \begin{multline*} - 1 + C^* = \frac{5e-1 + C_\mu(4e-1)}{2C_\mu(e-1)}\\ - + \frac{\sqrt{C_\mu^2(1+2e)^2 - + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)} - \end{multline*} - where: - \begin{displaymath} - C_\mu = \frac{\log(1+\mu)}{2\mu} - \end{displaymath} -\end{theorem} - -The proof will consist of the claims of the theorem broken down into lemmas. - -Because the user strategy is parametrized by a single parameter, truthfulness -is equivalent to monotonicity (TODO ref). The proof here is the same as in TODO -and is given for the sake of completeness. - -\begin{lemma} -The mechanism is monotone. -\end{lemma} - -\begin{proof} - We assume by contradiction that there exists a user $i$ that has been - selected by the mechanism and that would not be selected had he reported - a cost $c_i'\leq c_i$ (all the other costs staying the same). - - If $i\neq i^*$ and $i$ has been selected, then we are in the case where - $L(x^*) \geq C V(i^*)$ and $i$ was included in the result set by the greedy - part of the mechanism. By reporting a cost $c_i'\leq c_i$, using the - submodularity of $V$, we see that $i$ will satisfy the greedy selection - rule: - \begin{displaymath} - i = \argmax_{j\in\mathcal{N}\setminus S} \frac{V(S\cup\{j\}) - - V(S)}{c_j} - \end{displaymath} - in an earlier iteration of the greedy heuristic. Let us denote by $S_i$ - (resp. $S_i'$) the set to which $i$ is added when reporting cost $c_i$ - (resp. $c_i'$). We have $S_i'\subset S_i$. 
Moreover: - \begin{align*} - c_i' & \leq c_i \leq - \frac{B}{2}\frac{V(S_i\cup\{i\})-V(S_i)}{V(S_i\cup\{i\})}\\ - & \leq \frac{B}{2}\frac{V(S_i'\cup\{i\})-V(S_i')}{V(S_i'\cup\{i\})} - \end{align*} - Hence $i$ will still be included in the result set. - - If $i = i^*$, $i$ is included iff $L(x^*) \leq C V(i^*)$. Reporting $c_i'$ - instead of $c_i$ does not change the value $V(i^*)$ nor $L(x^*)$ (which is - computed over $\mathcal{N}\setminus\{i^*\}$). Thus $i$ is still included by - reporting a different cost. -\end{proof} - -\begin{lemma} -The mechanism is budget feasible. -\end{lemma} - -The proof is the same as in Chen and is given here for the sake of -completeness. -\begin{proof} - -\end{proof} - -Using the characterization of the threshold payments from Singer TODO, we can -prove individual rationality similarly to Chen TODO. -\begin{lemma} -The mechanism is individually rational -\end{lemma} - -\begin{proof} - -\end{proof} - -The following lemma requires to have a careful look at the relaxation function -we chose in the mechanism. The next section will be dedicated to studying this -relaxation and will contain the proof of the following lemma: -\begin{lemma} - We have: - \begin{displaymath} - OPT(L_\mathcal{N}, B) \leq \frac{1}{C_\mu}\big(2 OPT(V,\mathcal{N},B) - + \max_{i\in\mathcal{N}}V(i)\big) - \end{displaymath} -\end{lemma} - -\begin{lemma} - Let us denote by $S_M$ the set returned by the mechanism. Let us also - write: - \begin{displaymath} - C_{\textrm{max}} = \max\left(1+C,\frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot - C_\mu(e-1) -5e +1}\right)\right) - \end{displaymath} - - Then: - \begin{displaymath} - OPT(V, \mathcal{N}, B) \leq - C_\text{max}\cdot V(S_M) - \end{displaymath} -\end{lemma} - -\begin{proof} - - If the condition on line 3 of the algorithm holds, then: - \begin{displaymath} - V(i^*) \geq \frac{1}{C}L(x^*) \geq - \frac{1}{C}OPT(V,\mathcal{N}\setminus\{i\}, B) - \end{displaymath} - - But: - \begin{displaymath} - OPT(V,\mathcal{N},B) \leq OPT(V,\mathcal{N}\setminus\{i\}, B) + V(i^*) - \end{displaymath} - - Hence: - \begin{displaymath} - V(i^*) \geq \frac{1}{C+1} OPT(V,\mathcal{N}, B) - \end{displaymath} - - If the condition of the algorithm does not hold: - \begin{align*} - V(i^*) & \leq \frac{1}{C}L(x^*) \leq \frac{1}{C\cdot C_\mu} - \big(2 OPT(V,\mathcal{N}, B) + V(i^*)\big)\\ - & \leq \frac{1}{C\cdot C_\mu}\left(\frac{2e}{e-1}\big(3 V(S_M) - + 2 V(i^*)\big) - + V(i^*)\right) - \end{align*} - - Thus: - \begin{align*} - V(i^*) \leq \frac{6e}{C\cdot C_\mu(e-1)- 5e + 1} V(S_M) - \end{align*} - - Finally, using again that: - \begin{displaymath} - OPT(V,\mathcal{N},B) \leq \frac{e}{e-1}\big(3 V(S_M) + 2 V(i^*)\big) - \end{displaymath} - - We get: - \begin{displaymath} - OPT(V, \mathcal{N}, B) \leq \frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot - C_\mu(e-1) -5e +1}\right) V(S_M) - \end{displaymath} -\end{proof} - -The optimal value for $C$ is: -\begin{displaymath} - C^* = \arg\min C_{\textrm{max}} -\end{displaymath} - -This equation has two solutions. Only one of those is such that: -\begin{displaymath} - C\cdot C_\mu(e-1) -5e +1 \geq 0 -\end{displaymath} -which is needed in the proof of the previous lemma. Computing this solution, -we can state the main result of this section. - -\subsection{Relaxations of the value function} - -To prove lemma TODO, we will use a general method called pipage rounding, -introduced in TODO. 
Two consecutive relaxations are used: the one that we are -interested in, whose optimization can be computed efficiently, and the -multilinear extension which presents a \emph{cross-convexity} like behavior -which allows for rounding of fractional solution without decreasing the value -of the objective function and thus ensures a constant approximation of the -value function. The difficulty resides in showing that the ratio of the two -relaxations is bounded. - -We say that $R_\mathcal{N}:[0,1]^n\rightarrow\mathbf{R}$ is a relaxation of the -value function $V$ over $\mathcal{N}$ if it coincides with $V$ at binary -points. Formally, for any $S\subset\mathcal{N}$, let $\mathbf{1}_S$ denote the -indicator vector of $S$. $R_\mathcal{N}$ is a relaxation of $V$ over -$\mathcal{N}$ iff: -\begin{displaymath} - \forall S\subset\mathcal{N},\; R_\mathcal{N}(\mathbf{1}_S) = V(S) -\end{displaymath} - -We can extend the optimisation problem defined above to a relaxation by -extending the cost function: -\begin{displaymath} - \forall \lambda\in[0,1]^n,\; c(\lambda) - = \sum_{i\in\mathcal{N}}\lambda_ic_i -\end{displaymath} -The optimisation problem becomes: -\begin{displaymath} - OPT(R_\mathcal{N}, B) = - \max_{\lambda\in[0,1]^n}\left\{R_\mathcal{N}(\lambda)\,|\, c(\lambda)\leq B\right\} -\end{displaymath} -The relaxations we will consider here rely on defining a probability -distribution over subsets of $\mathcal{N}$. - -Let $\lambda\in[0,1]^n$, let us define: -\begin{displaymath} - P_\mathcal{N}^\lambda(S) = \prod_{i\in S}\lambda_i - \prod_{i\in\mathcal{N}\setminus S}(1-\lambda_i) -\end{displaymath} -$P_\mathcal{N}^\lambda(S)$ is the probability of picking the set $S$ if we select -a subset of $\mathcal{N}$ at random by deciding independently for each point to -include it in the set with probability $\lambda_i$ (and to exclude it with -probability $1-\lambda_i$). - -We will consider two relaxations of the value function $V$ over $\mathcal{N}$: -\begin{itemize} - \item the \emph{multi-linear extension} of $V$: - \begin{align*} - F_\mathcal{N}(\lambda) - & = \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[\log\det A(S)\big]\\ - & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) V(S)\\ - & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) \log\det A(S)\\ - \end{align*} - \item the \emph{concave relaxation} of $V$: - \begin{align*} - L_{\mathcal{N}}(\lambda) - & = \log\det \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[A(S)\big]\\ - & = \log\det\left(\sum_{S\subset N} - P_\mathcal{N}^\lambda(S)A(S)\right)\\ - & = \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}} - \lambda_ix_ix_i^*\right)\\ - & \defeq \log\det \tilde{A}(\lambda) - \end{align*} -\end{itemize} - -\begin{lemma} - The \emph{concave relaxation} $L_\mathcal{N}$ is concave\footnote{Hence - this relaxation is well-named!}. -\end{lemma} - -\begin{proof} - This follows from the concavity of the $\log\det$ function over symmetric - positive semi-definite matrices. 
More precisely, if $A$ and $B$ are two - symmetric positive semi-definite matrices, then: - \begin{multline*} - \forall\alpha\in [0, 1],\; \log\det\big(\alpha A + (1-\alpha) B\big)\\ - \geq \alpha\log\det A + (1-\alpha)\log\det B - \end{multline*} -\end{proof} - -\begin{lemma}[Rounding]\label{lemma:rounding} - For any feasible $\lambda\in[0,1]^n$, there exists a feasible - $\bar{\lambda}\in[0,1]^n$ such that at most one of its component is - fractional, that is, lies in $(0,1)$ and: - \begin{displaymath} - F_{\mathcal{N}}(\lambda)\leq F_{\mathcal{N}}(\bar{\lambda}) - \end{displaymath} -\end{lemma} - -\begin{proof} - We give a rounding procedure which given a feasible $\lambda$ with at least - two fractional components, returns some $\lambda'$ with one less fractional - component, feasible such that: - \begin{displaymath} - F_\mathcal{N}(\lambda) \leq F_\mathcal{N}(\lambda') - \end{displaymath} - Applying this procedure recursively yields the lemma's result. - - Let us consider such a feasible $\lambda$. Let $i$ and $j$ be two - fractional components of $\lambda$ and let us define the following - function: - \begin{displaymath} - F_\lambda(\varepsilon) = F(\lambda_\varepsilon) - \quad\textrm{where} \quad - \lambda_\varepsilon = \lambda + \varepsilon\left(e_i-\frac{c_i}{c_j}e_j\right) - \end{displaymath} - - It is easy to see that if $\lambda$ is feasible, then: - \begin{multline}\label{eq:convex-interval} - \forall\varepsilon\in\Big[\max\Big(-\lambda_i,(\lambda_j-1)\frac{c_j}{c_i}\Big), \min\Big(1-\lambda_i, \lambda_j - \frac{c_j}{c_i}\Big)\Big],\;\\ - \lambda_\varepsilon\;\;\textrm{is feasible} - \end{multline} - - Furthermore, the function $F_\lambda$ is convex, indeed: - \begin{align*} - F_\lambda(\varepsilon) - & = \mathbb{E}_{S'\sim P_{\mathcal{N}\setminus\{i,j\}}^\lambda(S')}\Big[ - (\lambda_i+\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i,j\})\\ - & + (\lambda_i+\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i\})\\ - & + (1-\lambda_i-\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{j\})\\ - & + (1-\lambda_i-\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S')\Big]\\ - \end{align*} - Thus, $F_\lambda$ is a degree 2 polynomial whose dominant coefficient is: - \begin{multline*} - \frac{c_i}{c_j}\mathbb{E}_{S'\sim - P_{\mathcal{N}\setminus\{i,j\}}^\lambda(S')}\Big[ - V(S'\cup\{i\})+V(S'\cup\{i\})\\ - -V(S'\cup\{i,j\})-V(S')\Big] - \end{multline*} - which is positive by submodularity of $V$. Hence, the maximum of - $F_\lambda$ over the interval given in \eqref{eq:convex-interval} is - attained at one of its limit, at which either the $i$-th or $j$-th component of - $\lambda_\varepsilon$ becomes integral. -\end{proof} - -\begin{lemma}\label{lemma:relaxation-ratio} - The following inequality holds: - \begin{displaymath} - \forall\lambda\in[0,1]^n,\; - \frac{\log\big(1+\mu\big)}{2\mu} - \,L_\mathcal{N}(\lambda)\leq - F_\mathcal{N}(\lambda)\leq L_{\mathcal{N}}(\lambda) - \end{displaymath} -\end{lemma} - -\begin{proof} - - We will prove that: - \begin{displaymath} - \frac{\log\big(1+\mu\big)}{2\mu} - \end{displaymath} - is a lower bound of the ratio $\partial_i F_\mathcal{N}(\lambda)/\partial_i - L_\mathcal{N}(\lambda)$. 
- - This will be enough to conclude, by observing that: - \begin{displaymath} - \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)} - \sim_{\lambda\rightarrow 0} - \frac{\sum_{i\in \mathcal{N}}\lambda_i\partial_i F_\mathcal{N}(0)} - {\sum_{i\in\mathcal{N}}\lambda_i\partial_i L_\mathcal{N}(0)} - \end{displaymath} - and that an interior critical point of the ratio - $F_\mathcal{N}(\lambda)/L_\mathcal{N}(\lambda)$ is defined by: - \begin{displaymath} - \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)} - = \frac{\partial_i F_\mathcal{N}(\lambda)}{\partial_i - L_\mathcal{N}(\lambda)} - \end{displaymath} - - Let us start by computing the derivatives of $F_\mathcal{N}$ and - $L_\mathcal{N}$ with respect to - the $i$-th component. - - For $F$, it suffices to look at the derivative of - $P_\mathcal{N}^\lambda(S)$: - \begin{displaymath} - \partial_i P_\mathcal{N}^\lambda(S) = \left\{ - \begin{aligned} - & P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})\;\textrm{if}\; i\in S \\ - & - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\;\textrm{if}\; - i\in \mathcal{N}\setminus S \\ - \end{aligned}\right. - \end{displaymath} - - Hence: - \begin{multline*} - \partial_i F_\mathcal{N} = - \sum_{\substack{S\subset\mathcal{N}\\ i\in S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})V(S)\\ - - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\ - \end{multline*} - - Now, using that every $S$ such that $i\in S$ can be uniquely written as - $S'\cup\{i\}$, we can write: - \begin{multline*} - \partial_i F_\mathcal{N} = - \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S\cup\{i\})\\ - - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\ - \end{multline*} - - Finally, by using the expression for the marginal contribution of $i$ to - $S$: - \begin{displaymath} - \partial_i F_\mathcal{N}(\lambda) = - \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_{\mathcal{N}\setminus\{i\}}^\lambda(S) - \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big) - \end{displaymath} - - The computation of the derivative of $L_\mathcal{N}$ uses standard matrix - calculus and gives: - \begin{displaymath} - \partial_i L_\mathcal{N}(\lambda) - = \mu x_i^* \tilde{A}(\lambda)^{-1}x_i - \end{displaymath} - - Using the following inequalities: - \begin{gather*} - \forall S\subset\mathcal{N}\setminus\{i\},\quad - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\geq - P_{\mathcal{N}\setminus\{i\}}^\lambda(S\cup\{i\})\\ - \forall S\subset\mathcal{N},\quad P_{\mathcal{N}\setminus\{i\}}^\lambda(S) - \geq P_\mathcal{N}^\lambda(S)\\ - \forall S\subset\mathcal{N},\quad A(S)^{-1} \geq A(S\cup\{i\})^{-1}\\ - \end{gather*} - we get: - \begin{align*} - \partial_i F_\mathcal{N}(\lambda) - & \geq \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_\mathcal{N}^\lambda(S) - \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\ - & \geq \frac{1}{2} - \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_\mathcal{N}^\lambda(S) - \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\ - &\hspace{-3.5em}+\frac{1}{2} - \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}} - P_\mathcal{N}^\lambda(S\cup\{i\}) - \log\Big(1 + \mu x_i^*A(S\cup\{i\})^{-1}x_i\Big)\\ - &\geq \frac{1}{2} - \sum_{S\subset\mathcal{N}} - P_\mathcal{N}^\lambda(S) - \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\ - \end{align*} - - Using that $A(S)\geq I_d$ we get that: - 
\begin{displaymath} - \mu x_i^*A(S)^{-1}x_i \leq \mu - \end{displaymath} - - Moreover: - \begin{displaymath} - \forall x\leq\mu,\; \log(1+x)\geq - \frac{\log\big(1+\mu\big)}{\mu} x - \end{displaymath} - - Hence: - \begin{displaymath} - \partial_i F_\mathcal{N}(\lambda) \geq - \frac{\log\big(1+\mu\big)}{2\mu} - x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)^{-1}\bigg)x_i - \end{displaymath} - - Finally, using that the inverse is a matrix convex function over symmetric - positive definite matrices: - \begin{align*} - \partial_i F_\mathcal{N}(\lambda) &\geq - \frac{\log\big(1+\mu\big)}{2\mu} - x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)\bigg)^{-1}x_i\\ - & \geq \frac{\log\big(1+\mu\big)}{2\mu} - \partial_i L_\mathcal{N}(\lambda) - \end{align*} -\end{proof} - -We can now prove lemma TODO from previous section. - -\begin{proof} - Let us consider a feasible point $\lambda^*\in[0,1]^n$ such that $L_\mathcal{N}(\lambda^*) - = OPT(L_\mathcal{N}, B)$. By applying lemma~\ref{lemma:relaxation-ratio} - and lemma~\ref{lemma:rounding} we get a feasible point $\bar{\lambda}$ with at most - one fractional component such that: - \begin{equation}\label{eq:e1} - L_\mathcal{N}(\lambda^*) \leq \frac{1}{C_\mu} - F_\mathcal{N}(\bar{\lambda}) - \end{equation} - - Let $\lambda_i$ denote the fractional component of $\bar{\lambda}$ and $S$ - denote the set whose indicator vector is $\bar{\lambda} - \lambda_i e_i$. - Using the fact that $F_\mathcal{N}$ is linear with respect to the $i$-th - component and is a relaxation of the value function, we get: - \begin{displaymath} - F_\mathcal{N}(\bar{\lambda}) = V(S) +\lambda_i V(S\cup\{i\}) - \end{displaymath} - - Using the submodularity of $V$: - \begin{displaymath} - F_\mathcal{N}(\bar{\lambda}) \leq 2 V(S) + V(i) - \end{displaymath} - - Note that since $\bar{\lambda}$ is feasible, $S$ is also feasible and - $V(S)\leq OPT(V,\mathcal{N}, B)$. Hence: - \begin{equation}\label{eq:e2} - F_\mathcal{N}(\bar{\lambda}) \leq 2 OPT(V,\mathcal{N}, B) - + \max_{i\in\mathcal{N}} V(i) - \end{equation} - - Putting \eqref{eq:e1} and \eqref{eq:e2} together gives the results. -\end{proof} - +\input{main} \section{General setup} - +\input{general} \section{Conclusion} - +\input{conclusion} \end{document} diff --git a/problem.tex b/problem.tex new file mode 100644 index 0000000..fb9f8e1 --- /dev/null +++ b/problem.tex @@ -0,0 +1,193 @@ +\subsection{Notations} + +Throughout the paper, we will make use of the following notations: if $x$ is +a (column) vector in $\mathbf{R}^d$, $x^*$ denotes its transposed (line) +vector. Thus, the standard inner product between two vectors $x$ and $y$ is +simply $x^* y$. $\norm{x}_2 = x^*x$ will denote the $L_2$ norm of $x$. + +We will also often use the following order over symmetric matrices: if $A$ and +$B$ are two $d\times d$ and $B$ are two $d\times d$ real symmetric matrices, we +write that $A\leq B$ iff: +\begin{displaymath} + \forall x\in\mathbf{R}^d,\quad + x^*Ax \leq x^*Bx +\end{displaymath} +That is, iff $B-A$ is symmetric semi-definite positive. + +This order let us define the notion of a \emph{decreasing} or \emph{convex} +matrix function similarly to their real counterparts. In particular, let us +recall that the matrix inversion is decreasing and convex over symmetric +definite positive matrices. + +\subsection{Data model} + +There is a set of $n$ users, $\mathcal{N} = \{1,\ldots, n\}$. 
Each user
+$i\in\mathcal{N}$ has a public vector of features $x_i\in\mathbf{R}^d$ and an
+undisclosed piece of information $y_i\in\mathbf{R}$. We assume that the data
+has already been normalized so that $\norm{x_i}_2\leq 1$ for all
+$i\in\mathcal{N}$.
+
+The experimenter is going to select a set of users and ask them to reveal their
+private piece of information. We are interested in a \emph{survey setup}: the
+experimenter has not seen the data yet but must decide which users to select.
+His goal is to learn the model underlying the data. Here, we assume a linear
+model:
+\begin{displaymath}
+    \forall i\in\mathcal{N},\quad y_i = \beta^* x_i + \varepsilon_i
+\end{displaymath}
+where $\beta\in\mathbf{R}^d$ and $\varepsilon_i\in\mathbf{R}$ follows a normal
+distribution of mean $0$ and variance $\sigma^2$. Furthermore, we assume the
+errors to be independent across users: the $(\varepsilon_i)_{i\in\mathcal{N}}$
+are mutually independent.
+
+After observing the data, the experimenter could simply do linear regression to
+learn the model parameter $\beta$. However, in a more general setup, the
+experimenter has prior knowledge about $\beta$, in the form of a distribution
+over $\mathbf{R}^d$. After observing the data, the experimenter performs
+\emph{maximum a posteriori estimation}: computing the point which maximizes the
+posterior distribution of $\beta$ given the observations.
+
+Here, we will assume, as is often done, that the prior distribution is
+a multivariate normal distribution of mean zero and covariance matrix $\kappa
+I_d$. Maximum a posteriori estimation then leads to the following minimization
+problem:
+\begin{displaymath}
+    \beta_{\text{MAP}} = \operatorname*{arg\,min}_{\beta\in\mathbf{R}^d}
+    \sum_i (y_i - \beta^*x_i)^2 + \frac{1}{\mu}\norm{\beta}_2^2
+\end{displaymath}
+which is the well-known \emph{ridge regression}; $\mu
+= \frac{\kappa}{\sigma^2}$ is the regularization parameter. Ridge regression
+can thus be seen as linear regression with a regularization term which
+prevents $\beta$ from having a large $L_2$-norm.
+
+\subsection{Value of data}
+
+Because the user private variables $y_i$ have not been observed yet when the
+experimenter has to decide which users to include in his experiment, we treat
+$\beta$ as a random variable whose distribution is updated after observing the
+data.
+
+Let us recall that if $\beta$ is a random variable over $\mathbf{R}^d$ whose
+probability distribution has a density function $f$ with respect to the
+Lebesgue measure, its entropy is given by:
+\begin{displaymath}
+    \mathbb{H}(\beta) \defeq - \int_{b\in\mathbf{R}^d} \log f(b) f(b)\text{d}b
+\end{displaymath}
+
+A standard way to measure the decrease in uncertainty induced by observing the
+data is the entropy. This leads to the following definition of the value of
+data, called the \emph{value of information}:
+\begin{displaymath}
+    \forall S\subset\mathcal{N},\quad V(S) = \mathbb{H}(\beta)
+    - \mathbb{H}(\beta\,|\, Y_S)
+\end{displaymath}
+where $Y_S = \{y_i,\,i\in S\}$ is the set of observed data.
+
+\begin{theorem}
+    Under the ridge regression model explained in section TODO, the value of
+    data is equal to:
+    \begin{align*}
+        \forall S\subset\mathcal{N},\; V(S)
+        & = \frac{1}{2}\log\det\left(I_d
+        + \mu\sum_{i\in S} x_ix_i^*\right)\\
+        & \defeq \frac{1}{2}\log\det A(S)
+    \end{align*}
+\end{theorem}
+
+\begin{proof}
+
+Let us denote by $X_S$ the matrix whose rows are the vectors
+$(x_i^*)_{i\in S}$.
Observe that $A_S$ can simply be written as: +\begin{displaymath} + A_S = I_d + \mu X_S^* X_S +\end{displaymath} + +Let us recall that the entropy of a multivariate normal variable $B$ over +$\mathbf{R}^d$ of covariance $\Sigma I_d$ is given by: +\begin{equation}\label{eq:multivariate-entropy} + \mathbb{H}(B) = \frac{1}{2}\log\big((2\pi e)^d \det \Sigma I_d\big) +\end{equation} + +Using the chain rule for conditional entropy, we get that: +\begin{displaymath} + V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\,\beta) +\end{displaymath} + +Conditioned on $\beta$, $(Y_S)$ follows a multivariate normal +distribution of mean $X\beta$ and of covariance matrix $\sigma^2 I_n$. Hence: +\begin{equation}\label{eq:h1} + \mathbb{H}(Y_S\,|\,\beta) + = \frac{1}{2}\log\left((2\pi e)^n \det(\sigma^2I_n)\right) +\end{equation} + +$(Y_S)$ also follows a multivariate normal distribution of mean zero. Let us +compute its covariance matrix, $\Sigma_Y$: +\begin{align*} + \Sigma_Y & = \expt{YY^*} = \expt{(X_S\beta + E)(X_S\beta + E)^*}\\ + & = \kappa X_S X_S^* + \sigma^2I_n +\end{align*} +Thus, we get that: +\begin{equation}\label{eq:h2} + \mathbb{H}(Y_S) + = \frac{1}{2}\log\left((2\pi e)^n \det(\kappa X_S X_S^* + \sigma^2 I_n)\right) +\end{equation} + +Combining \eqref{eq:h1} and \eqref{eq:h2} we get: +\begin{displaymath} + V(S) = \frac{1}{2}\log\det\left(I_n+\frac{\kappa}{\sigma^2}X_S + X_S^*\right) +\end{displaymath} + +Finally, we can use Sylvester's determinant theorem to get the result. +\end{proof} + +It is also interesting to look at the marginal contribution of a user to a set: the +increase of value induced by adding a user to an already existing set of users. +We have the following lemma. + +\begin{lemma}[Marginal contribution] + \begin{displaymath} + \Delta_i V(S)\defeq V(S\cup\{i\}) - V(S) + = \frac{1}{2}\log\left(1 + \mu x_i^*A(S)^{-1}x_i\right) + \end{displaymath} +\end{lemma} + +\begin{proof} + We have: + \begin{align*} + V(S\cup\{i\}) & = \frac{1}{2}\log\det A(S\cup\{i\})\\ + & = \frac{1}{2}\log\det\left(A(S) + \mu x_i x_i^*\right)\\ + & = V(S) + \frac{1}{2}\log\det\left(I_d + \mu A(S)^{-1}x_i + x_i^*\right)\\ + & = V(S) + \frac{1}{2}\log\left(1 + \mu x_i^* A(S)^{-1}x_i\right) + \end{align*} + where the last equality comes from Sylvester's determinant formula. +\end{proof} + +Because $A(S)$ is symmetric definite positive, the marginal contribution is +positive, which proves that the value function is set increasing. Furthermore, +it is easy to see that if $S\subset S'$, then $A(S)\leq A(S')$. Using the fact +that matrix inversion is decreasing, we see that the marginal contribution of +a fixed user is a set decreasing function. This is the \emph{submodularity} of +the value function. + +TODO? Explain what are the points which are the most valuable : points which +are aligned along directions where the spread over the already existing points +is small. + +\subsection{Auction} + +TODO Explain the optimization problem, why it has to be formulated as an auction +problem. Explain the goals: +\begin{itemize} + \item truthful + \item individually rational + \item budget feasible + \item has a good approximation ratio + +TODO Explain what is already known: it is ok when the function is submodular. When +should we introduce the notion of submodularity? +\end{itemize} + + |
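As a reading aid for the value function defined in problem.tex and the greedy branch of the mechanism in main.tex above, here is a small numerical sketch (not part of the commit). It assumes numpy; the helper names (`value`, `marginal_gain`, `greedy_select`) are illustrative only, and it omits the other branch of the mechanism (the comparison of the best singleton against the concave relaxation $L_{\mathcal{N}}$) as well as the threshold payments.

```python
import numpy as np

def value(X, S, mu):
    """V(S) = 1/2 * logdet(I_d + mu * sum_{i in S} x_i x_i^T), as in problem.tex."""
    idx = list(S)
    d = X.shape[1]
    A = np.eye(d) + mu * X[idx].T @ X[idx]
    return 0.5 * np.linalg.slogdet(A)[1]

def marginal_gain(X, S, i, mu):
    """Delta_i V(S) = 1/2 * log(1 + mu * x_i^T A(S)^{-1} x_i) (marginal contribution lemma)."""
    idx = list(S)
    d = X.shape[1]
    A = np.eye(d) + mu * X[idx].T @ X[idx]
    xi = X[i]
    return 0.5 * np.log1p(mu * xi @ np.linalg.solve(A, xi))

def greedy_select(X, costs, B, mu):
    """Greedy branch of the mechanism in main.tex: repeatedly add the point with
    the highest marginal-contribution-per-cost ratio while the proportional-share
    rule c_i <= (B/2) * (V(S+i) - V(S)) / V(S+i) holds."""
    n = len(costs)
    S, remaining = set(), set(range(n))
    while remaining:
        # Candidate with the best marginal-contribution-per-cost ratio.
        i = max(remaining, key=lambda j: marginal_gain(X, S, j, mu) / costs[j])
        v_si = value(X, S | {i}, mu)
        gain = v_si - value(X, S, mu)
        # Proportional-share stopping condition from the while loop of the mechanism.
        if v_si <= 0 or costs[i] > 0.5 * B * gain / v_si:
            break
        S.add(i)
        remaining.remove(i)
    return S

if __name__ == "__main__":
    # Toy data (hypothetical): 20 users, 3 public features each, norms <= 1.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
    costs = rng.uniform(0.1, 1.0, size=20)
    print(sorted(greedy_select(X, costs, B=2.0, mu=1.0)))
```

As main.tex notes, the proportional-share stopping rule (rather than a plain budget check on the sum of reported costs) is what keeps the mechanism budget feasible once selected users are paid their threshold payments.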