author     Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>  2012-10-28 15:16:26 -0700
committer  Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>  2012-10-28 15:16:26 -0700
commit     3fdd3c6674acfecffa3ec734f0183f1df6a239c1 (patch)
tree       b595e3a821e6e2ae679dfd725f106c89964cc752
parent     cc28557b334d2bc53e9c8243409550ea1c9f368f (diff)
download   recommendation-3fdd3c6674acfecffa3ec734f0183f1df6a239c1.tar.gz

file splitting

-rw-r--r--  conclusion.tex    0
-rw-r--r--  definitions.tex  16
-rw-r--r--  general.tex       0
-rw-r--r--  intro.tex        17
-rw-r--r--  main.tex        550
-rw-r--r--  paper.tex       780
-rw-r--r--  problem.tex     193
7 files changed, 782 insertions, 774 deletions
diff --git a/conclusion.tex b/conclusion.tex
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/conclusion.tex
diff --git a/definitions.tex b/definitions.tex
new file mode 100644
index 0000000..ca6b3ba
--- /dev/null
+++ b/definitions.tex
@@ -0,0 +1,16 @@
+\newtheorem{lemma}{Lemma}
+\newtheorem{fact}{Fact}
+\newtheorem{example}{Example}
+\newtheorem{prop}{Proposition}
+\newtheorem{theorem}{Theorem}
+\newcommand*{\defeq}{\stackrel{\text{def}}{=}}
+\newcommand{\var}{\mathop{\mathrm{Var}}}
+\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]}
+\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]}
+\newcommand{\norm}[1]{\lVert#1\rVert}
+\newcommand{\tr}[1]{#1^*}
+\newcommand{\ip}[2]{\langle #1, #2 \rangle}
+\newcommand{\mse}{\mathop{\mathrm{MSE}}}
+\DeclareMathOperator{\trace}{tr}
+\DeclareMathOperator*{\argmax}{arg\,max}
+
diff --git a/general.tex b/general.tex
new file mode 100644
index 0000000..e69de29
--- /dev/null
+++ b/general.tex
diff --git a/intro.tex b/intro.tex
new file mode 100644
index 0000000..649e7de
--- /dev/null
+++ b/intro.tex
@@ -0,0 +1,17 @@
+
+\begin{itemize}
+ \item the already existing field of experiment design: a survey-like setup,
+ asking which points are the best to include in an experiment; the
+ usefulness of the data is measured by variance reduction or
+ entropy reduction
+ \item nowadays, there is also a big focus on purchasing data (paid surveys,
+ Mechanical Turk, etc.), which adds economic aspects to the problem of
+ experiment design
+ \item recent advances (Singer, Chen) in the field of budgeted mechanisms
+ \item we study ridge regression, which is very widely used in statistical
+ learning, and treat it as a problem of budgeted experiment design
+ \item we make the following contributions: ...
+ \item an extension to a more general setup which covers a wider class of
+ machine learning problems
+\end{itemize}
+
+
diff --git a/main.tex b/main.tex
new file mode 100644
index 0000000..189156c
--- /dev/null
+++ b/main.tex
@@ -0,0 +1,550 @@
+
+All the budget feasible mechanisms studied recently (TODO ref Singer Chen)
+rely on a greedy heuristic that extends the greedy heuristic for
+the knapsack problem. In knapsack, objects are selected based on their
+\emph{value-per-cost} ratio. The quantity which plays a similar role for
+general submodular functions is the \emph{marginal-contribution-per-cost}
+ratio: assuming that a set of points $S$ has already been selected, the
+\emph{marginal-contribution-per-cost} ratio of a new point $i$ is
+defined by:
+\begin{displaymath}
+ \frac{V(S\cup\{i\}) - V(S)}{c_i}
+\end{displaymath}
+
+The greedy heuristic then simply repeatedly selects the point whose
+marginal-contribution-per-cost ratio is the highest, until the budget limit is
+reached. Mechanism considerations aside, this heuristic alone is known to have
+an unbounded approximation ratio. However, a lemma (TODO ref in Chen or
+Singer) shows that taking the maximum between the greedy solution and the most
+valuable single point provides a $\frac{5e}{e-1}$ approximation ratio.
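+
+For intuition, here is a standard knapsack-style instance (the costs and
+values are purely illustrative and not taken from the rest of the paper):
+suppose point $1$ has value $2\varepsilon$ and cost $\varepsilon$, while
+point $2$ has value $B$ and cost $B$, for some small $\varepsilon > 0$. The
+greedy heuristic first selects point $1$ (ratio $2$ versus $1$), after which
+point $2$ no longer fits within the budget, so the heuristic stops with value
+$2\varepsilon$ while the optimum is $B$:
+\begin{displaymath}
+ \frac{OPT}{\textrm{greedy}} = \frac{B}{2\varepsilon}
+ \xrightarrow[\varepsilon\to 0]{} \infty
+\end{displaymath}
+Taking the better of the greedy solution and the best singleton repairs this
+instance, which is precisely the role played by $i^*$ in the mechanism below.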
+
+Unfortunately, TODO Singer 2011 points out that taking the maximum between the
+greedy heuristic and the most valuable point is not incentive compatible.
+Singer and Chen tackle this issue similarly: instead of comparing the most
+valuable point to the greedy solution, they compare it to a solution which is
+close enough to keep a constant approximation ratio:
+\begin{itemize}
+ \item Chen suggests using $OPT(V,\mathcal{N}\setminus\{i\}, B)$.
+ Unfortunately, in the general case, this cannot be computed exactly in
+ polynomial time.
+ \item Singer uses the optimal value of a relaxed objective function
+ which can be proven to be close to the optimum of the original
+ objective function. The relaxation used is tailored to the specific
+ problem of coverage.
+\end{itemize}
+
+Here, we use a relaxation of the objective function which is tailored to the
+problem of ridge regression. We define:
+\begin{displaymath}
+ \forall\lambda\in[0,1]^n,\quad L_{\mathcal{N}}(\lambda) \defeq
+ \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}} \lambda_i x_i x_i^*\right)
+\end{displaymath}
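+
+A quick sanity check (not part of the original argument): when $\lambda$ is
+the indicator vector of a set $S\subset\mathcal{N}$, the relaxation reduces to
+the $\log\det$ term appearing in the value function,
+\begin{displaymath}
+ L_{\mathcal{N}}(\lambda)
+ = \log\det\Big(I_d + \mu\sum_{i\in S} x_i x_i^*\Big)
+ = \log\det A(S)
+\end{displaymath}
+and, as shown in the next subsection, $L_\mathcal{N}$ is concave, so the
+computation of $x^*$ in the mechanism below is a tractable concave
+maximization problem.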
+
+We can now present the mechanism we use, which is similar in flavor to Chen's
+and Singer's.
+
+\begin{algorithm}\label{mechanism}
+ \caption{Mechanism for ridge regression}
+ \begin{algorithmic}[1]
+ \State $i^* \gets \argmax_{j\in\mathcal{N}}V(j)$
+ \State $x^* \gets \argmax_{x\in[0,1]^n} \{L_{\mathcal{N}\setminus\{i^*\}}(x)
+ \,|\, c(x)\leq B\}$
+ \Statex
+ \If{$L(x^*) < CV(i^*)$}
+ \State \textbf{return} $\{i^*\}$
+ \Else
+ \State $i \gets \argmax_{1\leq j\leq n}\frac{V(j)}{c_j}$
+ \State $S \gets \emptyset$
+ \While{$c_i\leq \frac{B}{2}\frac{V(S\cup\{i\})-V(S)}{V(S\cup\{i\})}$}
+ \State $S \gets S\cup\{i\}$
+ \State $i \gets \argmax_{j\in\mathcal{N}\setminus S}
+ \frac{V(S\cup\{j\})-V(S)}{c_j}$
+ \EndWhile
+ \State \textbf{return} $S$
+ \EndIf
+ \end{algorithmic}
+\end{algorithm}
+
+Notice that the stopping condition in the while loop is more sophisticated
+than just ensuring that the sum of the costs does not exceed the budget. This
+is because the selected users will be paid more than their costs, and this
+stopping condition ensures budget feasibility when the users are paid their
+threshold payments.
+
+We can now state the main result of this section:
+\begin{theorem}
+ The mechanism described in Algorithm~\ref{mechanism} is truthful,
+ individually rational, and budget feasible. Furthermore, choosing:
+ \begin{multline*}
+ C = C^* = \frac{5e-1 + C_\mu(2e+1)}{2C_\mu(e-1)}\\
+ + \frac{\sqrt{C_\mu^2(1+2e)^2
+ + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)}
+ \end{multline*}
+ we get an approximation ratio of:
+ \begin{multline*}
+ 1 + C^* = \frac{5e-1 + C_\mu(4e-1)}{2C_\mu(e-1)}\\
+ + \frac{\sqrt{C_\mu^2(1+2e)^2
+ + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)}
+ \end{multline*}
+ where:
+ \begin{displaymath}
+ C_\mu = \frac{\log(1+\mu)}{2\mu}
+ \end{displaymath}
+\end{theorem}
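+
+As a rough numerical illustration (the values are rounded and given only for
+orientation): for $\mu = 1$ we have $C_\mu = \frac{\log 2}{2}\approx 0.347$,
+and plugging into the expressions above gives $C^*\approx 25.6$ and an
+approximation ratio $1 + C^*\approx 26.6$. The guarantee thus remains a
+constant, if large, factor; it degrades as $\mu$ grows since $C_\mu\to 0$.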
+
+The proof consists of the claims of the theorem broken down into a series of
+lemmas.
+
+Because each user's strategy is parametrized by a single parameter,
+truthfulness is equivalent to monotonicity (TODO ref). The proof here is the
+same as in TODO and is given for the sake of completeness.
+
+\begin{lemma}
+The mechanism is monotone.
+\end{lemma}
+
+\begin{proof}
+ We assume, for the sake of contradiction, that there exists a user $i$ who
+ is selected by the mechanism but would not be selected had he reported
+ a cost $c_i'\leq c_i$ (all the other costs staying the same).
+
+ If $i\neq i^*$ and $i$ has been selected, then we are in the case where
+ $L(x^*) \geq C V(i^*)$ and $i$ was included in the result set by the greedy
+ part of the mechanism. By reporting a cost $c_i'\leq c_i$, using the
+ submodularity of $V$, we see that $i$ will satisfy the greedy selection
+ rule:
+ \begin{displaymath}
+ i = \argmax_{j\in\mathcal{N}\setminus S} \frac{V(S\cup\{j\})
+ - V(S)}{c_j}
+ \end{displaymath}
+ in the same or an earlier iteration of the greedy heuristic. Let us denote
+ by $S_i$ (resp. $S_i'$) the set to which $i$ is added when reporting cost
+ $c_i$ (resp. $c_i'$). We have $S_i'\subseteq S_i$. Moreover:
+ \begin{align*}
+ c_i' & \leq c_i \leq
+ \frac{B}{2}\frac{V(S_i\cup\{i\})-V(S_i)}{V(S_i\cup\{i\})}\\
+ & \leq \frac{B}{2}\frac{V(S_i'\cup\{i\})-V(S_i')}{V(S_i'\cup\{i\})}
+ \end{align*}
+ Hence $i$ will still be included in the result set.
+
+ If $i = i^*$, $i$ is included iff $L(x^*) < C V(i^*)$. Reporting $c_i'$
+ instead of $c_i$ does not change the value $V(i^*)$ nor $L(x^*)$ (which is
+ computed over $\mathcal{N}\setminus\{i^*\}$). Thus $i$ is still included by
+ reporting a different cost.
+\end{proof}
+
+\begin{lemma}
+The mechanism is budget feasible.
+\end{lemma}
+
+The proof is the same as in Chen and is given here for the sake of
+completeness.
+\begin{proof}
+
+\end{proof}
+
+Using the characterization of the threshold payments from Singer TODO, we can
+prove individual rationality similarly to Chen TODO.
+\begin{lemma}
+The mechanism is individually rational.
+\end{lemma}
+
+\begin{proof}
+
+\end{proof}
+
+The following lemma requires a careful look at the relaxation function we
+chose in the mechanism. The next subsection is dedicated to studying this
+relaxation and contains the proof of the following lemma:
+\begin{lemma}
+ We have:
+ \begin{displaymath}
+ OPT(L_\mathcal{N}, B) \leq \frac{1}{C_\mu}\big(2 OPT(V,\mathcal{N},B)
+ + \max_{i\in\mathcal{N}}V(i)\big)
+ \end{displaymath}
+\end{lemma}
+
+\begin{lemma}
+ Let us denote by $S_M$ the set returned by the mechanism. Let us also
+ write:
+ \begin{displaymath}
+ C_{\textrm{max}} = \max\left(1+C,\frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot
+ C_\mu(e-1) -5e +1}\right)\right)
+ \end{displaymath}
+
+ Then:
+ \begin{displaymath}
+ OPT(V, \mathcal{N}, B) \leq
+ C_\text{max}\cdot V(S_M)
+ \end{displaymath}
+\end{lemma}
+
+\begin{proof}
+
+ If the condition on line 3 of the algorithm holds, then:
+ \begin{displaymath}
+ V(i^*) \geq \frac{1}{C}L(x^*) \geq
+ \frac{1}{C}OPT(V,\mathcal{N}\setminus\{i^*\}, B)
+ \end{displaymath}
+
+ But:
+ \begin{displaymath}
+ OPT(V,\mathcal{N},B) \leq OPT(V,\mathcal{N}\setminus\{i^*\}, B) + V(i^*)
+ \end{displaymath}
+
+ Hence:
+ \begin{displaymath}
+ V(i^*) \geq \frac{1}{C+1} OPT(V,\mathcal{N}, B)
+ \end{displaymath}
+
+ If the condition of the algorithm does not hold:
+ \begin{align*}
+ V(i^*) & \leq \frac{1}{C}L(x^*) \leq \frac{1}{C\cdot C_\mu}
+ \big(2 OPT(V,\mathcal{N}, B) + V(i^*)\big)\\
+ & \leq \frac{1}{C\cdot C_\mu}\left(\frac{2e}{e-1}\big(3 V(S_M)
+ + 2 V(i^*)\big)
+ + V(i^*)\right)
+ \end{align*}
+
+ Thus:
+ \begin{align*}
+ V(i^*) \leq \frac{6e}{C\cdot C_\mu(e-1)- 5e + 1} V(S_M)
+ \end{align*}
+
+ Finally, using again that:
+ \begin{displaymath}
+ OPT(V,\mathcal{N},B) \leq \frac{e}{e-1}\big(3 V(S_M) + 2 V(i^*)\big)
+ \end{displaymath}
+
+ We get:
+ \begin{displaymath}
+ OPT(V, \mathcal{N}, B) \leq \frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot
+ C_\mu(e-1) -5e +1}\right) V(S_M)
+ \end{displaymath}
+\end{proof}
+
+The optimal value for $C$ is the one minimizing the approximation factor:
+\begin{displaymath}
+ C^* = \arg\min_C C_{\textrm{max}}
+\end{displaymath}
+
+Since $1+C$ is increasing in $C$ while the second term of the maximum is
+decreasing in $C$, the minimum is attained where the two terms are equal,
+which yields a quadratic equation in $C$. This equation has two solutions.
+Only one of those is such that:
+\begin{displaymath}
+ C\cdot C_\mu(e-1) -5e +1 \geq 0
+\end{displaymath}
+which is needed in the proof of the previous lemma. Computing this solution
+gives the value of $C^*$ stated in the theorem above.
+
+\subsection{Relaxations of the value function}
+
+To prove lemma TODO, we will use a general method called pipage rounding,
+introduced in TODO. Two relaxations are used: the one that we are interested
+in, whose optimum can be computed efficiently, and the multilinear extension,
+which exhibits a \emph{cross-convexity} behavior that allows fractional
+solutions to be rounded without decreasing the value of the objective
+function, and thus ensures a constant-factor approximation of the value
+function. The difficulty resides in showing that the ratio of the two
+relaxations is bounded.
+
+We say that $R_\mathcal{N}:[0,1]^n\rightarrow\mathbf{R}$ is a relaxation of the
+value function $V$ over $\mathcal{N}$ if it coincides with $V$ at binary
+points. Formally, for any $S\subset\mathcal{N}$, let $\mathbf{1}_S$ denote the
+indicator vector of $S$. $R_\mathcal{N}$ is a relaxation of $V$ over
+$\mathcal{N}$ iff:
+\begin{displaymath}
+ \forall S\subset\mathcal{N},\; R_\mathcal{N}(\mathbf{1}_S) = V(S)
+\end{displaymath}
+
+We can extend the optimization problem defined above to a relaxation by
+extending the cost function:
+\begin{displaymath}
+ \forall \lambda\in[0,1]^n,\; c(\lambda)
+ = \sum_{i\in\mathcal{N}}\lambda_ic_i
+\end{displaymath}
+The optimization problem becomes:
+\begin{displaymath}
+ OPT(R_\mathcal{N}, B) =
+ \max_{\lambda\in[0,1]^n}\left\{R_\mathcal{N}(\lambda)\,|\, c(\lambda)\leq B\right\}
+\end{displaymath}
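+
+Note, as an immediate but useful consequence of the definitions: since every
+feasible set $S$ yields a feasible integral point $\mathbf{1}_S$, any
+relaxation $R_\mathcal{N}$ satisfies
+\begin{displaymath}
+ OPT(R_\mathcal{N}, B) \geq OPT(V, \mathcal{N}, B)
+\end{displaymath}
+which is what makes comparing against a relaxed optimum meaningful in the
+mechanism.
+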
+The relaxations we will consider here rely on defining a probability
+distribution over subsets of $\mathcal{N}$.
+
+For $\lambda\in[0,1]^n$, let us define:
+\begin{displaymath}
+ P_\mathcal{N}^\lambda(S) = \prod_{i\in S}\lambda_i
+ \prod_{i\in\mathcal{N}\setminus S}(1-\lambda_i)
+\end{displaymath}
+$P_\mathcal{N}^\lambda(S)$ is the probability of picking the set $S$ if we select
+a subset of $\mathcal{N}$ at random by deciding independently for each point to
+include it in the set with probability $\lambda_i$ (and to exclude it with
+probability $1-\lambda_i$).
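+
+As a quick check that this indeed defines a probability distribution over the
+subsets of $\mathcal{N}$, note that the weights sum to one:
+\begin{displaymath}
+ \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S)
+ = \prod_{i\in\mathcal{N}}\big(\lambda_i + (1-\lambda_i)\big) = 1
+\end{displaymath}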
+
+We will consider two relaxations of the value function $V$ over $\mathcal{N}$
+(a concrete single-point illustration follows the list):
+\begin{itemize}
+ \item the \emph{multilinear extension} of $V$:
+ \begin{align*}
+ F_\mathcal{N}(\lambda)
+ & = \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[\log\det A(S)\big]\\
+ & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) V(S)\\
+ & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) \log\det A(S)\\
+ \end{align*}
+ \item the \emph{concave relaxation} of $V$:
+ \begin{align*}
+ L_{\mathcal{N}}(\lambda)
+ & = \log\det \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[A(S)\big]\\
+ & = \log\det\left(\sum_{S\subset \mathcal{N}}
+ P_\mathcal{N}^\lambda(S)A(S)\right)\\
+ & = \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}}
+ \lambda_ix_ix_i^*\right)\\
+ & \defeq \log\det \tilde{A}(\lambda)
+ \end{align*}
+\end{itemize}
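+
+To make the two definitions concrete, consider, purely as an illustration, a
+single point, $\mathcal{N}=\{1\}$, and write $t = \mu\norm{x_1}_2^2$ (a
+shorthand used only in this example). Then
+\begin{displaymath}
+ F_{\{1\}}(\lambda_1) = \lambda_1\log(1 + t)
+ \qquad\textrm{and}\qquad
+ L_{\{1\}}(\lambda_1) = \log(1 + \lambda_1 t),
+\end{displaymath}
+so $F_{\{1\}}\leq L_{\{1\}}$ by concavity of the logarithm, the two coincide
+at $\lambda_1\in\{0,1\}$, and the gap between them for intermediate
+$\lambda_1$ is exactly what Lemma~\ref{lemma:relaxation-ratio} controls in
+general.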
+
+\begin{lemma}
+ The \emph{concave relaxation} $L_\mathcal{N}$ is concave\footnote{Hence
+ this relaxation is well-named!}.
+\end{lemma}
+
+\begin{proof}
+ This follows from the concavity of the $\log\det$ function over symmetric
+ positive definite matrices. More precisely, if $A$ and $B$ are two
+ symmetric positive definite matrices, then:
+ \begin{multline*}
+ \forall\alpha\in [0, 1],\; \log\det\big(\alpha A + (1-\alpha) B\big)\\
+ \geq \alpha\log\det A + (1-\alpha)\log\det B
+ \end{multline*}
+\end{proof}
+
+\begin{lemma}[Rounding]\label{lemma:rounding}
+ For any feasible $\lambda\in[0,1]^n$, there exists a feasible
+ $\bar{\lambda}\in[0,1]^n$ such that at most one of its components is
+ fractional (that is, lies in $(0,1)$) and:
+ \begin{displaymath}
+ F_{\mathcal{N}}(\lambda)\leq F_{\mathcal{N}}(\bar{\lambda})
+ \end{displaymath}
+\end{lemma}
+
+\begin{proof}
+ We give a rounding procedure which, given a feasible $\lambda$ with at
+ least two fractional components, returns a feasible $\lambda'$ with one
+ fewer fractional component such that:
+ \begin{displaymath}
+ F_\mathcal{N}(\lambda) \leq F_\mathcal{N}(\lambda')
+ \end{displaymath}
+ Applying this procedure repeatedly yields the lemma's result.
+
+ Let us consider such a feasible $\lambda$. Let $i$ and $j$ be two
+ fractional components of $\lambda$ and let us define the following
+ function:
+ \begin{displaymath}
+ F_\lambda(\varepsilon) = F_\mathcal{N}(\lambda_\varepsilon)
+ \quad\textrm{where} \quad
+ \lambda_\varepsilon = \lambda + \varepsilon\left(e_i-\frac{c_i}{c_j}e_j\right)
+ \end{displaymath}
+
+ Note that the cost is left unchanged, $c(\lambda_\varepsilon)
+ = c(\lambda) + \varepsilon\big(c_i - \frac{c_i}{c_j}c_j\big) = c(\lambda)$,
+ so it is easy to see that if $\lambda$ is feasible, then:
+ \begin{multline}\label{eq:convex-interval}
+ \forall\varepsilon\in\Big[\max\Big(-\lambda_i,(\lambda_j-1)\frac{c_j}{c_i}\Big), \min\Big(1-\lambda_i, \lambda_j
+ \frac{c_j}{c_i}\Big)\Big],\;\\
+ \lambda_\varepsilon\;\;\textrm{is feasible}
+ \end{multline}
+
+ Furthermore, the function $F_\lambda$ is convex; indeed:
+ \begin{align*}
+ F_\lambda(\varepsilon)
+ & = \mathbb{E}_{S'\sim P_{\mathcal{N}\setminus\{i,j\}}^\lambda}\Big[
+ (\lambda_i+\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i,j\})\\
+ & + (\lambda_i+\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i\})\\
+ & + (1-\lambda_i-\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{j\})\\
+ & + (1-\lambda_i-\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S')\Big]\\
+ \end{align*}
+ Thus, $F_\lambda$ is a degree-2 polynomial in $\varepsilon$ whose leading
+ coefficient is:
+ \begin{multline*}
+ \frac{c_i}{c_j}\mathbb{E}_{S'\sim
+ P_{\mathcal{N}\setminus\{i,j\}}^\lambda}\Big[
+ V(S'\cup\{i\})+V(S'\cup\{j\})\\
+ -V(S'\cup\{i,j\})-V(S')\Big]
+ \end{multline*}
+ which is nonnegative by submodularity of $V$, so that $F_\lambda$ is indeed
+ convex. Hence, the maximum of $F_\lambda$ over the interval given in
+ \eqref{eq:convex-interval} is attained at one of its endpoints, at which
+ either the $i$-th or the $j$-th component of $\lambda_\varepsilon$ becomes
+ integral.
+\end{proof}
+
+\begin{lemma}\label{lemma:relaxation-ratio}
+ The following inequality holds:
+ \begin{displaymath}
+ \forall\lambda\in[0,1]^n,\;
+ \frac{\log\big(1+\mu\big)}{2\mu}
+ \,L_\mathcal{N}(\lambda)\leq
+ F_\mathcal{N}(\lambda)\leq L_{\mathcal{N}}(\lambda)
+ \end{displaymath}
+\end{lemma}
+
+\begin{proof}
+
+ The upper bound, $F_\mathcal{N}(\lambda)\leq L_\mathcal{N}(\lambda)$,
+ follows directly from Jensen's inequality and the concavity of $\log\det$.
+ For the lower bound, we will prove that:
+ \begin{displaymath}
+ \frac{\log\big(1+\mu\big)}{2\mu}
+ \end{displaymath}
+ is a lower bound of the ratio $\partial_i F_\mathcal{N}(\lambda)/\partial_i
+ L_\mathcal{N}(\lambda)$.
+
+ This will be enough to conclude, by observing that:
+ \begin{displaymath}
+ \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)}
+ \sim_{\lambda\rightarrow 0}
+ \frac{\sum_{i\in \mathcal{N}}\lambda_i\partial_i F_\mathcal{N}(0)}
+ {\sum_{i\in\mathcal{N}}\lambda_i\partial_i L_\mathcal{N}(0)}
+ \end{displaymath}
+ and that an interior critical point of the ratio
+ $F_\mathcal{N}(\lambda)/L_\mathcal{N}(\lambda)$ is defined by:
+ \begin{displaymath}
+ \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)}
+ = \frac{\partial_i F_\mathcal{N}(\lambda)}{\partial_i
+ L_\mathcal{N}(\lambda)}
+ \end{displaymath}
+
+ Let us start by computing the derivatives of $F_\mathcal{N}$ and
+ $L_\mathcal{N}$ with respect to
+ the $i$-th component.
+
+ For $F$, it suffices to look at the derivative of
+ $P_\mathcal{N}^\lambda(S)$:
+ \begin{displaymath}
+ \partial_i P_\mathcal{N}^\lambda(S) = \left\{
+ \begin{aligned}
+ & P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})\;\textrm{if}\; i\in S \\
+ & - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\;\textrm{if}\;
+ i\in \mathcal{N}\setminus S \\
+ \end{aligned}\right.
+ \end{displaymath}
+
+ Hence:
+ \begin{multline*}
+ \partial_i F_\mathcal{N} =
+ \sum_{\substack{S\subset\mathcal{N}\\ i\in S}}
+ P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})V(S)\\
+ - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}}
+ P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\
+ \end{multline*}
+
+ Now, using that every $S$ such that $i\in S$ can be uniquely written as
+ $S'\cup\{i\}$, we can write:
+ \begin{multline*}
+ \partial_i F_\mathcal{N} =
+ \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
+ P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S\cup\{i\})\\
+ - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}}
+ P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\
+ \end{multline*}
+
+ Finally, by using the expression for the marginal contribution of $i$ to
+ $S$:
+ \begin{displaymath}
+ \partial_i F_\mathcal{N}(\lambda) =
+ \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
+ P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
+ \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)
+ \end{displaymath}
+
+ The computation of the derivative of $L_\mathcal{N}$ uses standard matrix
+ calculus and gives:
+ \begin{displaymath}
+ \partial_i L_\mathcal{N}(\lambda)
+ = \mu x_i^* \tilde{A}(\lambda)^{-1}x_i
+ \end{displaymath}
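+
+ For completeness, the ``standard matrix calculus'' here is Jacobi's
+ formula, $\partial\log\det M = \trace\big(M^{-1}\,\partial M\big)$, applied
+ to $M = \tilde{A}(\lambda)$:
+ \begin{displaymath}
+ \partial_i L_\mathcal{N}(\lambda)
+ = \trace\Big(\tilde{A}(\lambda)^{-1}\,\partial_i\tilde{A}(\lambda)\Big)
+ = \mu\trace\Big(\tilde{A}(\lambda)^{-1}x_ix_i^*\Big)
+ = \mu x_i^*\tilde{A}(\lambda)^{-1}x_i
+ \end{displaymath}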
+
+ Using the following inequalities, valid for all
+ $S\subset\mathcal{N}\setminus\{i\}$:
+ \begin{gather*}
+ P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\geq
+ P_\mathcal{N}^\lambda(S\cup\{i\}),\qquad
+ P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
+ \geq P_\mathcal{N}^\lambda(S),\\
+ A(S)^{-1} \geq A(S\cup\{i\})^{-1}
+ \end{gather*}
+ we get (splitting the sum in two halves and bounding each half separately):
+ \begin{align*}
+ \partial_i F_\mathcal{N}(\lambda)
+ & \geq \frac{1}{2}
+ \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
+ P_\mathcal{N}^\lambda(S)
+ \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\
+ &\hspace{-3.5em}+\frac{1}{2}
+ \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
+ P_\mathcal{N}^\lambda(S\cup\{i\})
+ \log\Big(1 + \mu x_i^*A(S\cup\{i\})^{-1}x_i\Big)\\
+ & = \frac{1}{2}
+ \sum_{S\subset\mathcal{N}}
+ P_\mathcal{N}^\lambda(S)
+ \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)
+ \end{align*}
+
+ Using that $A(S)\geq I_d$ and that $\norm{x_i}_2\leq 1$, we get that:
+ \begin{displaymath}
+ \mu x_i^*A(S)^{-1}x_i \leq \mu
+ \end{displaymath}
+
+ Moreover:
+ \begin{displaymath}
+ \forall x\in[0,\mu],\; \log(1+x)\geq
+ \frac{\log\big(1+\mu\big)}{\mu} x
+ \end{displaymath}
+
+ Hence:
+ \begin{displaymath}
+ \partial_i F_\mathcal{N}(\lambda) \geq
+ \frac{\log\big(1+\mu\big)}{2}
+ x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)^{-1}\bigg)x_i
+ \end{displaymath}
+
+ Finally, using that the inverse is a matrix convex function over symmetric
+ positive definite matrices, and recognizing $\tilde{A}(\lambda)
+ = \sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)$:
+ \begin{align*}
+ \partial_i F_\mathcal{N}(\lambda) &\geq
+ \frac{\log\big(1+\mu\big)}{2}\,
+ x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)\bigg)^{-1}x_i\\
+ & = \frac{\log\big(1+\mu\big)}{2\mu}\,
+ \mu\, x_i^*\tilde{A}(\lambda)^{-1}x_i
+ = \frac{\log\big(1+\mu\big)}{2\mu}\,
+ \partial_i L_\mathcal{N}(\lambda)
+ \end{align*}
+\end{proof}
+
+We can now prove the lemma (TODO) stated earlier in this section.
+
+\begin{proof}
+ Let us consider a feasible point $\lambda^*\in[0,1]^n$ such that $L_\mathcal{N}(\lambda^*)
+ = OPT(L_\mathcal{N}, B)$. By applying lemma~\ref{lemma:relaxation-ratio}
+ and lemma~\ref{lemma:rounding} we get a feasible point $\bar{\lambda}$ with at most
+ one fractional component such that:
+ \begin{equation}\label{eq:e1}
+ L_\mathcal{N}(\lambda^*) \leq \frac{1}{C_\mu}
+ F_\mathcal{N}(\bar{\lambda})
+ \end{equation}
+
+ Let $\lambda_i$ denote the fractional component of $\bar{\lambda}$ and $S$
+ denote the set whose indicator vector is $\bar{\lambda} - \lambda_i e_i$.
+ Using the fact that $F_\mathcal{N}$ is linear with respect to the $i$-th
+ component and is a relaxation of the value function, we get:
+ \begin{displaymath}
+ F_\mathcal{N}(\bar{\lambda}) = (1-\lambda_i) V(S) + \lambda_i V(S\cup\{i\})
+ \end{displaymath}
+
+ Using the submodularity of $V$:
+ \begin{displaymath}
+ F_\mathcal{N}(\bar{\lambda}) \leq 2 V(S) + V(i)
+ \end{displaymath}
+
+ Note that since $\bar{\lambda}$ is feasible, $S$ is also feasible and
+ $V(S)\leq OPT(V,\mathcal{N}, B)$. Hence:
+ \begin{equation}\label{eq:e2}
+ F_\mathcal{N}(\bar{\lambda}) \leq 2 OPT(V,\mathcal{N}, B)
+ + \max_{i\in\mathcal{N}} V(i)
+ \end{equation}
+
+ Putting \eqref{eq:e1} and \eqref{eq:e2} together gives the result.
+\end{proof}
+
+
diff --git a/paper.tex b/paper.tex
index 1484c3d..fd02b24 100644
--- a/paper.tex
+++ b/paper.tex
@@ -3,789 +3,21 @@
\usepackage{amsmath,amsfonts}
\usepackage{algorithm}
\usepackage{algpseudocode}
-\newtheorem{lemma}{Lemma}
-\newtheorem{fact}{Fact}
-\newtheorem{example}{Example}
-\newtheorem{prop}{Proposition}
-\newtheorem{theorem}{Theorem}
-\newcommand*{\defeq}{\stackrel{\text{def}}{=}}
-\newcommand{\var}{\mathop{\mathrm{Var}}}
-\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]}
-\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]}
-\newcommand{\norm}[1]{\lVert#1\rVert}
-\newcommand{\tr}[1]{#1^*}
-\newcommand{\ip}[2]{\langle #1, #2 \rangle}
-\newcommand{\mse}{\mathop{\mathrm{MSE}}}
-\DeclareMathOperator{\trace}{tr}
-\DeclareMathOperator*{\argmax}{arg\,max}
+\input{definitions}
\title{Budgeted Auctions for Experiment Design}
\begin{document}
\maketitle
\section{Intro}
-
-\begin{itemize}
- \item already existing field of experiment design: survey-like setup, what
- are the best points to include in your experiment? Measure of the
- usefulness of the data: variance-reduction or entropy-reduction.
- \item nowadays, there is also a big focus on purchasing data: paid surveys,
- mechanical turk, etc. that add economic aspects to the problem of
- experiment design
- \item recent advances (Singer, Chen) in the field of budgeted mechanisms
- \item we study ridge regression, very widely used in statistical learning,
- and treat it as a problem of budgeted experiment design
- \item we make the following contributions: ...
- \item extension to a more general setup which includes a wider class of
- machine learning problems
-\end{itemize}
+\input{intro}
\section{Problem formulation}
-
-\subsection{Notations}
-
-Throughout the paper, we will make use of the following notations: if $x$ is
-a (column) vector in $\mathbf{R}^d$, $x^*$ denotes its transposed (line)
-vector. Thus, the standard inner product between two vectors $x$ and $y$ is
-simply $x^* y$. $\norm{x}_2 = x^*x$ will denote the $L_2$ norm of $x$.
-
-We will also often use the following order over symmetric matrices: if $A$ and
-$B$ are two $d\times d$ and $B$ are two $d\times d$ real symmetric matrices, we
-write that $A\leq B$ iff:
-\begin{displaymath}
- \forall x\in\mathbf{R}^d,\quad
- x^*Ax \leq x^*Bx
-\end{displaymath}
-That is, iff $B-A$ is symmetric semi-definite positive.
-
-This order let us define the notion of a \emph{decreasing} or \emph{convex}
-matrix function similarly to their real counterparts. In particular, let us
-recall that the matrix inversion is decreasing and convex over symmetric
-definite positive matrices.
-
-\subsection{Data model}
-
-There is a set of $n$ users, $\mathcal{N} = \{1,\ldots, n\}$. Each user
-$i\in\mathcal{N}$ has a public vector of features $x_i\in\mathbf{R}^d$ and an
-undisclosed piece of information $y_i\in\mathbf{R}$. We assume that the data
-has already been normalized so that $\norm{x_i}_2\leq 1$ for all
-$i\in\mathcal{N}$.
-
-The experimenter is going to select a set of users and ask them to reveal their
-private piece of information. We are interested in a \emph{survey setup}: the
-experimenter has not seen the data yet, but he wants to know which users he
-should be selecting. His goal is to learn the model underlying the data. Here,
-we assume a linear model:
-\begin{displaymath}
- \forall i\in\mathcal{N},\quad y_i = \beta^* x_i + \varepsilon_i
-\end{displaymath}
-where $\beta\in\mathbf{R}^d$ and $\varepsilon_i\in\mathbf{R}$ follows a normal
-distribution of mean $0$ and variance $\sigma^2$. Furthermore, we assume the
-error $\varepsilon$ to be independent of the user:
-$(\varepsilon_i)_{i\in\mathcal{N}}$ are mutually independent.
-
-After observing the data, the experimenter could simply do linear regression to
-learn the model parameter $\beta$. However, in a more general setup, the
-experimenter has a prior knowledge about $\beta$, a distribution over
-$\mathbf{R}^d$. After observing the data, the experimenter performs
-\emph{maximum a posteriori estimation}: computing the point which maximizes the
-posterior distribution of $\beta$ given the observations.
-
-Here, we will assume, as it is often done, that the prior distribution is
-a multivariate normal distribution of mean zero and covariance matrix $\kappa
-I_d$. Maximum a posteriori estimation leads to the following maximization
-problem:
-\begin{displaymath}
- \beta_{\text{max}} = \argmax_{\beta\in\mathbf{R}^d} \sum_i (y_i - \beta^*x_i)^2
- + \frac{1}{\mu}\sum_i \norm{\beta}_2^2
-\end{displaymath}
-which is the well-known \emph{ridge regression}. $\mu
-= \frac{\kappa}{\sigma^2}$ is the regularization parameter. Ridge regression
-can thus be seen as linear regression with a regularization term which
-prevents $\beta$ from having a large $L_2$-norm.
-
-\subsection{Value of data}
-
-Because the user private variables $y_i$ have not been observed yet when the
-experimenter has to decide which users to include in his experiment, we treat
-$\beta$ as a random variable whose distribution is updated after observing the
-data.
-
-Let us recall that if $\beta$ is random variable over $\mathbf{R}^d$ whose
-probability distribution has a density function $f$ with respect to the
-Lebesgue measure, its entropy is given by:
-\begin{displaymath}
- \mathbb{H}(\beta) \defeq - \int_{b\in\mathbf{R}^d} \log f(b) f(b)\text{d}b
-\end{displaymath}
-
-A usual way to measure the decrease of uncertainty induced by the observation
-of data is to use the entropy. This leads to the following definition of the
-value of data called the \emph{value of information}:
-\begin{displaymath}
- \forall S\subset\mathcal{N},\quad V(S) = \mathbb{H}(\beta)
- - \mathbb{H}(\beta\,|\,
- Y_S)
-\end{displaymath}
-where $Y_S = \{y_i,\,i\in S\}$ is the set of observed data.
-
-\begin{theorem}
- Under the ridge regression model explained in section TODO, the value of data
- is equal to:
- \begin{align*}
- \forall S\subset\mathcal{N},\; V(S)
- & = \frac{1}{2}\log\det\left(I_d
- + \mu\sum_{i\in S} x_ix_i^*\right)\\
- & \defeq \frac{1}{2}\log\det A(S)
- \end{align*}
-\end{theorem}
-
-\begin{proof}
-
-Let us denote by $X_S$ the matrix whose rows are the vectors $(x_i^*)_{i\in
-S}$. Observe that $A_S$ can simply be written as:
-\begin{displaymath}
- A_S = I_d + \mu X_S^* X_S
-\end{displaymath}
-
-Let us recall that the entropy of a multivariate normal variable $B$ over
-$\mathbf{R}^d$ of covariance $\Sigma I_d$ is given by:
-\begin{equation}\label{eq:multivariate-entropy}
- \mathbb{H}(B) = \frac{1}{2}\log\big((2\pi e)^d \det \Sigma I_d\big)
-\end{equation}
-
-Using the chain rule for conditional entropy, we get that:
-\begin{displaymath}
- V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\,\beta)
-\end{displaymath}
-
-Conditioned on $\beta$, $(Y_S)$ follows a multivariate normal
-distribution of mean $X\beta$ and of covariance matrix $\sigma^2 I_n$. Hence:
-\begin{equation}\label{eq:h1}
- \mathbb{H}(Y_S\,|\,\beta)
- = \frac{1}{2}\log\left((2\pi e)^n \det(\sigma^2I_n)\right)
-\end{equation}
-
-$(Y_S)$ also follows a multivariate normal distribution of mean zero. Let us
-compute its covariance matrix, $\Sigma_Y$:
-\begin{align*}
- \Sigma_Y & = \expt{YY^*} = \expt{(X_S\beta + E)(X_S\beta + E)^*}\\
- & = \kappa X_S X_S^* + \sigma^2I_n
-\end{align*}
-Thus, we get that:
-\begin{equation}\label{eq:h2}
- \mathbb{H}(Y_S)
- = \frac{1}{2}\log\left((2\pi e)^n \det(\kappa X_S X_S^* + \sigma^2 I_n)\right)
-\end{equation}
-
-Combining \eqref{eq:h1} and \eqref{eq:h2} we get:
-\begin{displaymath}
- V(S) = \frac{1}{2}\log\det\left(I_n+\frac{\kappa}{\sigma^2}X_S
- X_S^*\right)
-\end{displaymath}
-
-Finally, we can use Sylvester's determinant theorem to get the result.
-\end{proof}
-
-It is also interesting to look at the marginal contribution of a user to a set: the
-increase of value induced by adding a user to an already existing set of users.
-We have the following lemma.
-
-\begin{lemma}[Marginal contribution]
- \begin{displaymath}
- \Delta_i V(S)\defeq V(S\cup\{i\}) - V(S)
- = \frac{1}{2}\log\left(1 + \mu x_i^*A(S)^{-1}x_i\right)
- \end{displaymath}
-\end{lemma}
-
-\begin{proof}
- We have:
- \begin{align*}
- V(S\cup\{i\}) & = \frac{1}{2}\log\det A(S\cup\{i\})\\
- & = \frac{1}{2}\log\det\left(A(S) + \mu x_i x_i^*\right)\\
- & = V(S) + \frac{1}{2}\log\det\left(I_d + \mu A(S)^{-1}x_i
- x_i^*\right)\\
- & = V(S) + \frac{1}{2}\log\left(1 + \mu x_i^* A(S)^{-1}x_i\right)
- \end{align*}
- where the last equality comes from Sylvester's determinant formula.
-\end{proof}
-
-Because $A(S)$ is symmetric definite positive, the marginal contribution is
-positive, which proves that the value function is set increasing. Furthermore,
-it is easy to see that if $S\subset S'$, then $A(S)\leq A(S')$. Using the fact
-that matrix inversion is decreasing, we see that the marginal contribution of
-a fixed user is a set decreasing function. This is the \emph{submodularity} of
-the value function.
-
-TODO? Explain what are the points which are the most valuable : points which
-are aligned along directions where the spread over the already existing points
-is small.
-
-\subsection{Auction}
-
-TODO Explain the optimization problem, why it has to be formulated as an auction
-problem. Explain the goals:
-\begin{itemize}
- \item truthful
- \item individually rational
- \item budget feasible
- \item has a good approximation ratio
-
-TODO Explain what is already known: it is ok when the function is submodular. When
-should we introduce the notion of submodularity?
-\end{itemize}
-
+\input{problem}
\section{Main result}
-
-All the budget feasible mechanisms studied recently (TODO ref Singer Chen)
-rely on using a greedy heuristic extending the idea of the greedy heuristic for
-the knapsack problem. In knapsack, objects are selected based on their
-\emph{value-per-cost} ratio. The quantity which plays a similar role for
-general submodular functions is the \emph{marginal-contribution-per-cost}
-ratio: let us assume that you have already selected a set of points $S$, then
-the \emph{marginal-contribution-per-cost} ratio per cost of a new point $i$ is
-defined by:
-\begin{displaymath}
- \frac{V(S\cup\{i\}) - V(S)}{c_i}
-\end{displaymath}
-
-The greedy heuristic then simply repeatedly selects the point whose
-marginal-contribution-per-cost ratio is the highest until it reaches the budget
-limit. Mechanism considerations aside, this is known to have an unbounded
-approximation ratio. However, lemma TODO ref in Chen or Singer shows that the
-maximum between the greedy heuristic and the point with maximum value (as
-a singleton set) provides a $\frac{5e}{e-1}$ approximation ratio.
-
-Unfortunately, TODO Singer 2011 points out that taking the maximum between the
-greedy heuristic and the most valuable point is not incentive compatible.
-Singer and Chen tackle this issue similarly: instead of comparing the most
-valuable point to the greedy solution, they compare it to a solution which is
-close enough to keep a constant approximation ratio:
-\begin{itemize}
- \item Chen suggests using $OPT(V,\mathcal{N}\setminus\{i\}, B)$.
- Unfortunately, in the general case, this cannot be computed exactly in
- polynomial time.
- \item Singer uses using the optimal value of a relaxed objective function
- which can be proven to be close to the optimal of the original
- objective function. The function used is tailored to the specific
- problem of coverage.
-\end{itemize}
-
-Here, we use a relaxation of the objective function which is tailored to the
-problem of ridge regression. We define:
-\begin{displaymath}
- \forall\lambda\in[0,1]^{|\mathcal{N}|}\,\quad L_{\mathcal{N}}(\lambda) \defeq
- \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}} \lambda_i x_i x_i^*\right)
-\end{displaymath}
-
-We can now present the mechanism we use, which has a same flavor to Chen's and
-Singer's
-
-\begin{algorithm}\label{mechanism}
- \caption{Mechanism for ridge regression}
- \begin{algorithmic}[1]
- \State $i^* \gets \argmax_{j\in\mathcal{N}}V(j)$
- \State $x^* \gets \argmax_{x\in[0,1]^n} \{L_{\mathcal{N}\setminus\{i^*\}}(x)
- \,|\, c(x)\leq B\}$
- \Statex
- \If{$L(x^*) < CV(i^*)$}
- \State \textbf{return} $\{i^*\}$
- \Else
- \State $i \gets \argmax_{1\leq j\leq n}\frac{V(j)}{c_j}$
- \State $S \gets \emptyset$
- \While{$c_i\leq \frac{B}{2}\frac{V(S\cup\{i\})-V(S)}{V(S\cup\{i\})}$}
- \State $S \gets S\cup\{i\}$
- \State $i \gets \argmax_{j\in\mathcal{N}\setminus S}
- \frac{V(S\cup\{j\})-V(S)}{c_j}$
- \EndWhile
- \State \textbf{return} $S$
- \EndIf
- \end{algorithmic}
-\end{algorithm}
-
-Notice, that the stopping condition in the while loop is more sophisticated
-than just ensuring that the sum of the costs does not exceed the budget. This
-is because the selected users will be payed more than their costs, and this
-stopping condition ensures budget feasibility when the users are paid their
-threshold payment.
-
-We can now state the main result of this section:
-\begin{theorem}
- The mechanism in \ref{mechanism} is truthful, individually rational, budget
- feasible. Furthermore, choosing:
- \begin{multline*}
- C = C^* = \frac{5e-1 + C_\mu(2e+1)}{2C_\mu(e-1)}\\
- + \frac{\sqrt{C_\mu^2(1+2e)^2
- + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)}
- \end{multline*}
- we get an approximation ratio of:
- \begin{multline*}
- 1 + C^* = \frac{5e-1 + C_\mu(4e-1)}{2C_\mu(e-1)}\\
- + \frac{\sqrt{C_\mu^2(1+2e)^2
- + 2C_\mu(14e^2+5e+1) + (1-5e)^2}}{2C_\mu(e-1)}
- \end{multline*}
- where:
- \begin{displaymath}
- C_\mu = \frac{\log(1+\mu)}{2\mu}
- \end{displaymath}
-\end{theorem}
-
-The proof will consist of the claims of the theorem broken down into lemmas.
-
-Because the user strategy is parametrized by a single parameter, truthfulness
-is equivalent to monotonicity (TODO ref). The proof here is the same as in TODO
-and is given for the sake of completeness.
-
-\begin{lemma}
-The mechanism is monotone.
-\end{lemma}
-
-\begin{proof}
- We assume by contradiction that there exists a user $i$ that has been
- selected by the mechanism and that would not be selected had he reported
- a cost $c_i'\leq c_i$ (all the other costs staying the same).
-
- If $i\neq i^*$ and $i$ has been selected, then we are in the case where
- $L(x^*) \geq C V(i^*)$ and $i$ was included in the result set by the greedy
- part of the mechanism. By reporting a cost $c_i'\leq c_i$, using the
- submodularity of $V$, we see that $i$ will satisfy the greedy selection
- rule:
- \begin{displaymath}
- i = \argmax_{j\in\mathcal{N}\setminus S} \frac{V(S\cup\{j\})
- - V(S)}{c_j}
- \end{displaymath}
- in an earlier iteration of the greedy heuristic. Let us denote by $S_i$
- (resp. $S_i'$) the set to which $i$ is added when reporting cost $c_i$
- (resp. $c_i'$). We have $S_i'\subset S_i$. Moreover:
- \begin{align*}
- c_i' & \leq c_i \leq
- \frac{B}{2}\frac{V(S_i\cup\{i\})-V(S_i)}{V(S_i\cup\{i\})}\\
- & \leq \frac{B}{2}\frac{V(S_i'\cup\{i\})-V(S_i')}{V(S_i'\cup\{i\})}
- \end{align*}
- Hence $i$ will still be included in the result set.
-
- If $i = i^*$, $i$ is included iff $L(x^*) \leq C V(i^*)$. Reporting $c_i'$
- instead of $c_i$ does not change the value $V(i^*)$ nor $L(x^*)$ (which is
- computed over $\mathcal{N}\setminus\{i^*\}$). Thus $i$ is still included by
- reporting a different cost.
-\end{proof}
-
-\begin{lemma}
-The mechanism is budget feasible.
-\end{lemma}
-
-The proof is the same as in Chen and is given here for the sake of
-completeness.
-\begin{proof}
-
-\end{proof}
-
-Using the characterization of the threshold payments from Singer TODO, we can
-prove individual rationality similarly to Chen TODO.
-\begin{lemma}
-The mechanism is individually rational
-\end{lemma}
-
-\begin{proof}
-
-\end{proof}
-
-The following lemma requires to have a careful look at the relaxation function
-we chose in the mechanism. The next section will be dedicated to studying this
-relaxation and will contain the proof of the following lemma:
-\begin{lemma}
- We have:
- \begin{displaymath}
- OPT(L_\mathcal{N}, B) \leq \frac{1}{C_\mu}\big(2 OPT(V,\mathcal{N},B)
- + \max_{i\in\mathcal{N}}V(i)\big)
- \end{displaymath}
-\end{lemma}
-
-\begin{lemma}
- Let us denote by $S_M$ the set returned by the mechanism. Let us also
- write:
- \begin{displaymath}
- C_{\textrm{max}} = \max\left(1+C,\frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot
- C_\mu(e-1) -5e +1}\right)\right)
- \end{displaymath}
-
- Then:
- \begin{displaymath}
- OPT(V, \mathcal{N}, B) \leq
- C_\text{max}\cdot V(S_M)
- \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-
- If the condition on line 3 of the algorithm holds, then:
- \begin{displaymath}
- V(i^*) \geq \frac{1}{C}L(x^*) \geq
- \frac{1}{C}OPT(V,\mathcal{N}\setminus\{i\}, B)
- \end{displaymath}
-
- But:
- \begin{displaymath}
- OPT(V,\mathcal{N},B) \leq OPT(V,\mathcal{N}\setminus\{i\}, B) + V(i^*)
- \end{displaymath}
-
- Hence:
- \begin{displaymath}
- V(i^*) \geq \frac{1}{C+1} OPT(V,\mathcal{N}, B)
- \end{displaymath}
-
- If the condition of the algorithm does not hold:
- \begin{align*}
- V(i^*) & \leq \frac{1}{C}L(x^*) \leq \frac{1}{C\cdot C_\mu}
- \big(2 OPT(V,\mathcal{N}, B) + V(i^*)\big)\\
- & \leq \frac{1}{C\cdot C_\mu}\left(\frac{2e}{e-1}\big(3 V(S_M)
- + 2 V(i^*)\big)
- + V(i^*)\right)
- \end{align*}
-
- Thus:
- \begin{align*}
- V(i^*) \leq \frac{6e}{C\cdot C_\mu(e-1)- 5e + 1} V(S_M)
- \end{align*}
-
- Finally, using again that:
- \begin{displaymath}
- OPT(V,\mathcal{N},B) \leq \frac{e}{e-1}\big(3 V(S_M) + 2 V(i^*)\big)
- \end{displaymath}
-
- We get:
- \begin{displaymath}
- OPT(V, \mathcal{N}, B) \leq \frac{e}{e-1}\left( 3 + \frac{12e}{C\cdot
- C_\mu(e-1) -5e +1}\right) V(S_M)
- \end{displaymath}
-\end{proof}
-
-The optimal value for $C$ is:
-\begin{displaymath}
- C^* = \arg\min C_{\textrm{max}}
-\end{displaymath}
-
-This equation has two solutions. Only one of those is such that:
-\begin{displaymath}
- C\cdot C_\mu(e-1) -5e +1 \geq 0
-\end{displaymath}
-which is needed in the proof of the previous lemma. Computing this solution,
-we can state the main result of this section.
-
-\subsection{Relaxations of the value function}
-
-To prove lemma TODO, we will use a general method called pipage rounding,
-introduced in TODO. Two consecutive relaxations are used: the one that we are
-interested in, whose optimization can be computed efficiently, and the
-multilinear extension which presents a \emph{cross-convexity} like behavior
-which allows for rounding of fractional solution without decreasing the value
-of the objective function and thus ensures a constant approximation of the
-value function. The difficulty resides in showing that the ratio of the two
-relaxations is bounded.
-
-We say that $R_\mathcal{N}:[0,1]^n\rightarrow\mathbf{R}$ is a relaxation of the
-value function $V$ over $\mathcal{N}$ if it coincides with $V$ at binary
-points. Formally, for any $S\subset\mathcal{N}$, let $\mathbf{1}_S$ denote the
-indicator vector of $S$. $R_\mathcal{N}$ is a relaxation of $V$ over
-$\mathcal{N}$ iff:
-\begin{displaymath}
- \forall S\subset\mathcal{N},\; R_\mathcal{N}(\mathbf{1}_S) = V(S)
-\end{displaymath}
-
-We can extend the optimisation problem defined above to a relaxation by
-extending the cost function:
-\begin{displaymath}
- \forall \lambda\in[0,1]^n,\; c(\lambda)
- = \sum_{i\in\mathcal{N}}\lambda_ic_i
-\end{displaymath}
-The optimisation problem becomes:
-\begin{displaymath}
- OPT(R_\mathcal{N}, B) =
- \max_{\lambda\in[0,1]^n}\left\{R_\mathcal{N}(\lambda)\,|\, c(\lambda)\leq B\right\}
-\end{displaymath}
-The relaxations we will consider here rely on defining a probability
-distribution over subsets of $\mathcal{N}$.
-
-Let $\lambda\in[0,1]^n$, let us define:
-\begin{displaymath}
- P_\mathcal{N}^\lambda(S) = \prod_{i\in S}\lambda_i
- \prod_{i\in\mathcal{N}\setminus S}(1-\lambda_i)
-\end{displaymath}
-$P_\mathcal{N}^\lambda(S)$ is the probability of picking the set $S$ if we select
-a subset of $\mathcal{N}$ at random by deciding independently for each point to
-include it in the set with probability $\lambda_i$ (and to exclude it with
-probability $1-\lambda_i$).
-
-We will consider two relaxations of the value function $V$ over $\mathcal{N}$:
-\begin{itemize}
- \item the \emph{multi-linear extension} of $V$:
- \begin{align*}
- F_\mathcal{N}(\lambda)
- & = \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[\log\det A(S)\big]\\
- & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) V(S)\\
- & = \sum_{S\subset\mathcal{N}} P_\mathcal{N}^\lambda(S) \log\det A(S)\\
- \end{align*}
- \item the \emph{concave relaxation} of $V$:
- \begin{align*}
- L_{\mathcal{N}}(\lambda)
- & = \log\det \mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\big[A(S)\big]\\
- & = \log\det\left(\sum_{S\subset N}
- P_\mathcal{N}^\lambda(S)A(S)\right)\\
- & = \log\det\left(I_d + \mu\sum_{i\in\mathcal{N}}
- \lambda_ix_ix_i^*\right)\\
- & \defeq \log\det \tilde{A}(\lambda)
- \end{align*}
-\end{itemize}
-
-\begin{lemma}
- The \emph{concave relaxation} $L_\mathcal{N}$ is concave\footnote{Hence
- this relaxation is well-named!}.
-\end{lemma}
-
-\begin{proof}
- This follows from the concavity of the $\log\det$ function over symmetric
- positive semi-definite matrices. More precisely, if $A$ and $B$ are two
- symmetric positive semi-definite matrices, then:
- \begin{multline*}
- \forall\alpha\in [0, 1],\; \log\det\big(\alpha A + (1-\alpha) B\big)\\
- \geq \alpha\log\det A + (1-\alpha)\log\det B
- \end{multline*}
-\end{proof}
-
-\begin{lemma}[Rounding]\label{lemma:rounding}
- For any feasible $\lambda\in[0,1]^n$, there exists a feasible
- $\bar{\lambda}\in[0,1]^n$ such that at most one of its component is
- fractional, that is, lies in $(0,1)$ and:
- \begin{displaymath}
- F_{\mathcal{N}}(\lambda)\leq F_{\mathcal{N}}(\bar{\lambda})
- \end{displaymath}
-\end{lemma}
-
-\begin{proof}
- We give a rounding procedure which given a feasible $\lambda$ with at least
- two fractional components, returns some $\lambda'$ with one less fractional
- component, feasible such that:
- \begin{displaymath}
- F_\mathcal{N}(\lambda) \leq F_\mathcal{N}(\lambda')
- \end{displaymath}
- Applying this procedure recursively yields the lemma's result.
-
- Let us consider such a feasible $\lambda$. Let $i$ and $j$ be two
- fractional components of $\lambda$ and let us define the following
- function:
- \begin{displaymath}
- F_\lambda(\varepsilon) = F(\lambda_\varepsilon)
- \quad\textrm{where} \quad
- \lambda_\varepsilon = \lambda + \varepsilon\left(e_i-\frac{c_i}{c_j}e_j\right)
- \end{displaymath}
-
- It is easy to see that if $\lambda$ is feasible, then:
- \begin{multline}\label{eq:convex-interval}
- \forall\varepsilon\in\Big[\max\Big(-\lambda_i,(\lambda_j-1)\frac{c_j}{c_i}\Big), \min\Big(1-\lambda_i, \lambda_j
- \frac{c_j}{c_i}\Big)\Big],\;\\
- \lambda_\varepsilon\;\;\textrm{is feasible}
- \end{multline}
-
- Furthermore, the function $F_\lambda$ is convex, indeed:
- \begin{align*}
- F_\lambda(\varepsilon)
- & = \mathbb{E}_{S'\sim P_{\mathcal{N}\setminus\{i,j\}}^\lambda(S')}\Big[
- (\lambda_i+\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i,j\})\\
- & + (\lambda_i+\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{i\})\\
- & + (1-\lambda_i-\varepsilon)\Big(\lambda_j-\varepsilon\frac{c_i}{c_j}\Big)V(S'\cup\{j\})\\
- & + (1-\lambda_i-\varepsilon)\Big(1-\lambda_j+\varepsilon\frac{c_i}{c_j}\Big)V(S')\Big]\\
- \end{align*}
- Thus, $F_\lambda$ is a degree 2 polynomial whose dominant coefficient is:
- \begin{multline*}
- \frac{c_i}{c_j}\mathbb{E}_{S'\sim
- P_{\mathcal{N}\setminus\{i,j\}}^\lambda(S')}\Big[
- V(S'\cup\{i\})+V(S'\cup\{i\})\\
- -V(S'\cup\{i,j\})-V(S')\Big]
- \end{multline*}
- which is positive by submodularity of $V$. Hence, the maximum of
- $F_\lambda$ over the interval given in \eqref{eq:convex-interval} is
- attained at one of its limit, at which either the $i$-th or $j$-th component of
- $\lambda_\varepsilon$ becomes integral.
-\end{proof}
-
-\begin{lemma}\label{lemma:relaxation-ratio}
- The following inequality holds:
- \begin{displaymath}
- \forall\lambda\in[0,1]^n,\;
- \frac{\log\big(1+\mu\big)}{2\mu}
- \,L_\mathcal{N}(\lambda)\leq
- F_\mathcal{N}(\lambda)\leq L_{\mathcal{N}}(\lambda)
- \end{displaymath}
-\end{lemma}
-
-\begin{proof}
-
- We will prove that:
- \begin{displaymath}
- \frac{\log\big(1+\mu\big)}{2\mu}
- \end{displaymath}
- is a lower bound of the ratio $\partial_i F_\mathcal{N}(\lambda)/\partial_i
- L_\mathcal{N}(\lambda)$.
-
- This will be enough to conclude, by observing that:
- \begin{displaymath}
- \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)}
- \sim_{\lambda\rightarrow 0}
- \frac{\sum_{i\in \mathcal{N}}\lambda_i\partial_i F_\mathcal{N}(0)}
- {\sum_{i\in\mathcal{N}}\lambda_i\partial_i L_\mathcal{N}(0)}
- \end{displaymath}
- and that an interior critical point of the ratio
- $F_\mathcal{N}(\lambda)/L_\mathcal{N}(\lambda)$ is defined by:
- \begin{displaymath}
- \frac{F_\mathcal{N}(\lambda)}{L_\mathcal{N}(\lambda)}
- = \frac{\partial_i F_\mathcal{N}(\lambda)}{\partial_i
- L_\mathcal{N}(\lambda)}
- \end{displaymath}
-
- Let us start by computing the derivatives of $F_\mathcal{N}$ and
- $L_\mathcal{N}$ with respect to
- the $i$-th component.
-
- For $F$, it suffices to look at the derivative of
- $P_\mathcal{N}^\lambda(S)$:
- \begin{displaymath}
- \partial_i P_\mathcal{N}^\lambda(S) = \left\{
- \begin{aligned}
- & P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})\;\textrm{if}\; i\in S \\
- & - P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\;\textrm{if}\;
- i\in \mathcal{N}\setminus S \\
- \end{aligned}\right.
- \end{displaymath}
-
- Hence:
- \begin{multline*}
- \partial_i F_\mathcal{N} =
- \sum_{\substack{S\subset\mathcal{N}\\ i\in S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S\setminus\{i\})V(S)\\
- - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\
- \end{multline*}
-
- Now, using that every $S$ such that $i\in S$ can be uniquely written as
- $S'\cup\{i\}$, we can write:
- \begin{multline*}
- \partial_i F_\mathcal{N} =
- \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S\cup\{i\})\\
- - \sum_{\substack{S\subset\mathcal{N}\\ i\in \mathcal{N}\setminus S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)V(S)\\
- \end{multline*}
-
- Finally, by using the expression for the marginal contribution of $i$ to
- $S$:
- \begin{displaymath}
- \partial_i F_\mathcal{N}(\lambda) =
- \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
- \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)
- \end{displaymath}
-
- The computation of the derivative of $L_\mathcal{N}$ uses standard matrix
- calculus and gives:
- \begin{displaymath}
- \partial_i L_\mathcal{N}(\lambda)
- = \mu x_i^* \tilde{A}(\lambda)^{-1}x_i
- \end{displaymath}
-
- Using the following inequalities:
- \begin{gather*}
- \forall S\subset\mathcal{N}\setminus\{i\},\quad
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S)\geq
- P_{\mathcal{N}\setminus\{i\}}^\lambda(S\cup\{i\})\\
- \forall S\subset\mathcal{N},\quad P_{\mathcal{N}\setminus\{i\}}^\lambda(S)
- \geq P_\mathcal{N}^\lambda(S)\\
- \forall S\subset\mathcal{N},\quad A(S)^{-1} \geq A(S\cup\{i\})^{-1}\\
- \end{gather*}
- we get:
- \begin{align*}
- \partial_i F_\mathcal{N}(\lambda)
- & \geq \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_\mathcal{N}^\lambda(S)
- \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\
- & \geq \frac{1}{2}
- \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_\mathcal{N}^\lambda(S)
- \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\
- &\hspace{-3.5em}+\frac{1}{2}
- \sum_{\substack{S\subset\mathcal{N}\\ i\in\mathcal{N}\setminus S}}
- P_\mathcal{N}^\lambda(S\cup\{i\})
- \log\Big(1 + \mu x_i^*A(S\cup\{i\})^{-1}x_i\Big)\\
- &\geq \frac{1}{2}
- \sum_{S\subset\mathcal{N}}
- P_\mathcal{N}^\lambda(S)
- \log\Big(1 + \mu x_i^*A(S)^{-1}x_i\Big)\\
- \end{align*}
-
- Using that $A(S)\geq I_d$ we get that:
- \begin{displaymath}
- \mu x_i^*A(S)^{-1}x_i \leq \mu
- \end{displaymath}
-
- Moreover:
- \begin{displaymath}
- \forall x\leq\mu,\; \log(1+x)\geq
- \frac{\log\big(1+\mu\big)}{\mu} x
- \end{displaymath}
-
- Hence:
- \begin{displaymath}
- \partial_i F_\mathcal{N}(\lambda) \geq
- \frac{\log\big(1+\mu\big)}{2\mu}
- x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)^{-1}\bigg)x_i
- \end{displaymath}
-
- Finally, using that the inverse is a matrix convex function over symmetric
- positive definite matrices:
- \begin{align*}
- \partial_i F_\mathcal{N}(\lambda) &\geq
- \frac{\log\big(1+\mu\big)}{2\mu}
- x_i^*\bigg(\sum_{S\subset\mathcal{N}}P_\mathcal{N}^\lambda(S)A(S)\bigg)^{-1}x_i\\
- & \geq \frac{\log\big(1+\mu\big)}{2\mu}
- \partial_i L_\mathcal{N}(\lambda)
- \end{align*}
-\end{proof}
-
-We can now prove lemma TODO from previous section.
-
-\begin{proof}
- Let us consider a feasible point $\lambda^*\in[0,1]^n$ such that $L_\mathcal{N}(\lambda^*)
- = OPT(L_\mathcal{N}, B)$. By applying lemma~\ref{lemma:relaxation-ratio}
- and lemma~\ref{lemma:rounding} we get a feasible point $\bar{\lambda}$ with at most
- one fractional component such that:
- \begin{equation}\label{eq:e1}
- L_\mathcal{N}(\lambda^*) \leq \frac{1}{C_\mu}
- F_\mathcal{N}(\bar{\lambda})
- \end{equation}
-
- Let $\lambda_i$ denote the fractional component of $\bar{\lambda}$ and $S$
- denote the set whose indicator vector is $\bar{\lambda} - \lambda_i e_i$.
- Using the fact that $F_\mathcal{N}$ is linear with respect to the $i$-th
- component and is a relaxation of the value function, we get:
- \begin{displaymath}
- F_\mathcal{N}(\bar{\lambda}) = V(S) +\lambda_i V(S\cup\{i\})
- \end{displaymath}
-
- Using the submodularity of $V$:
- \begin{displaymath}
- F_\mathcal{N}(\bar{\lambda}) \leq 2 V(S) + V(i)
- \end{displaymath}
-
- Note that since $\bar{\lambda}$ is feasible, $S$ is also feasible and
- $V(S)\leq OPT(V,\mathcal{N}, B)$. Hence:
- \begin{equation}\label{eq:e2}
- F_\mathcal{N}(\bar{\lambda}) \leq 2 OPT(V,\mathcal{N}, B)
- + \max_{i\in\mathcal{N}} V(i)
- \end{equation}
-
- Putting \eqref{eq:e1} and \eqref{eq:e2} together gives the results.
-\end{proof}
-
+\input{main}
\section{General setup}
-
+\input{general}
\section{Conclusion}
-
+\input{conclusion}
\end{document}
diff --git a/problem.tex b/problem.tex
new file mode 100644
index 0000000..fb9f8e1
--- /dev/null
+++ b/problem.tex
@@ -0,0 +1,193 @@
+\subsection{Notations}
+
+Throughout the paper, we will make use of the following notations: if $x$ is
+a (column) vector in $\mathbf{R}^d$, $x^*$ denotes its transpose (a row
+vector). Thus, the standard inner product between two vectors $x$ and $y$ is
+simply $x^* y$, and $\norm{x}_2 = \sqrt{x^*x}$ will denote the $L_2$ norm of
+$x$.
+
+We will also often use the following order over symmetric matrices: if $A$ and
+$B$ are two $d\times d$ real symmetric matrices, we
+write that $A\leq B$ iff:
+\begin{displaymath}
+ \forall x\in\mathbf{R}^d,\quad
+ x^*Ax \leq x^*Bx
+\end{displaymath}
+That is, iff $B-A$ is positive semi-definite.
+
+This order lets us define the notions of a \emph{decreasing} and of a
+\emph{convex} matrix function similarly to their real counterparts. In
+particular, let us recall that matrix inversion is decreasing and convex over
+symmetric positive definite matrices.
+
+\subsection{Data model}
+
+There is a set of $n$ users, $\mathcal{N} = \{1,\ldots, n\}$. Each user
+$i\in\mathcal{N}$ has a public vector of features $x_i\in\mathbf{R}^d$ and an
+undisclosed piece of information $y_i\in\mathbf{R}$. We assume that the data
+has already been normalized so that $\norm{x_i}_2\leq 1$ for all
+$i\in\mathcal{N}$.
+
+The experimenter is going to select a set of users and ask them to reveal their
+private piece of information. We are interested in a \emph{survey setup}: the
+experimenter has not seen the data yet, but must decide which users to select.
+His goal is to learn the model underlying the data. Here,
+we assume a linear model:
+\begin{displaymath}
+ \forall i\in\mathcal{N},\quad y_i = \beta^* x_i + \varepsilon_i
+\end{displaymath}
+where $\beta\in\mathbf{R}^d$ and each $\varepsilon_i\in\mathbf{R}$ follows a normal
+distribution of mean $0$ and variance $\sigma^2$. Furthermore, we assume the
+errors to be independent across users: the
+$(\varepsilon_i)_{i\in\mathcal{N}}$ are mutually independent.
+
+After observing the data, the experimenter could simply do linear regression to
+learn the model parameter $\beta$. However, in a more general setup, the
+experimenter has prior knowledge about $\beta$, in the form of a distribution
+over $\mathbf{R}^d$. He then performs \emph{maximum a posteriori estimation}:
+he computes the point which maximizes the posterior distribution of $\beta$
+given the observations.
+
+Here, we will assume, as is often done, that the prior distribution is
+a multivariate normal distribution of mean zero and covariance matrix $\kappa
+I_d$. Maximum a posteriori estimation then leads to the following minimization
+problem:
+\begin{displaymath}
+ \hat{\beta} = \arg\min_{\beta\in\mathbf{R}^d} \sum_i (y_i - \beta^*x_i)^2
+ + \frac{1}{\mu} \norm{\beta}_2^2
+\end{displaymath}
+which is the well-known \emph{ridge regression}, where $\mu
+= \frac{\kappa}{\sigma^2}$ is the regularization parameter. Ridge regression
+can thus be seen as linear regression with a regularization term which
+prevents $\beta$ from having a large $L_2$-norm.
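+
+For reference (this closed form is standard and is not used in the sequel),
+writing $X$ for the matrix whose rows are the $x_i^*$ of the selected users
+and $y$ for the corresponding vector of observations, the minimizer is
+\begin{displaymath}
+ \hat{\beta} = \left(X^*X + \frac{1}{\mu}I_d\right)^{-1}X^*y
+\end{displaymath}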
+
+\subsection{Value of data}
+
+Because the users' private variables $y_i$ have not yet been observed when the
+experimenter has to decide which users to include in his experiment, we treat
+$\beta$ as a random variable whose distribution is updated after observing the
+data.
+
+Let us recall that if $\beta$ is a random variable over $\mathbf{R}^d$ whose
+probability distribution has a density function $f$ with respect to the
+Lebesgue measure, its entropy is given by:
+\begin{displaymath}
+ \mathbb{H}(\beta) \defeq - \int_{b\in\mathbf{R}^d} \log f(b) f(b)\text{d}b
+\end{displaymath}
+
+A usual way to measure the decrease in uncertainty induced by the observation
+of data is the decrease in entropy. This leads to the following definition of
+the value of data, called the \emph{value of information}:
+\begin{displaymath}
+ \forall S\subset\mathcal{N},\quad V(S) = \mathbb{H}(\beta)
+ - \mathbb{H}(\beta\,|\, Y_S)
+\end{displaymath}
+where $Y_S = \{y_i,\,i\in S\}$ is the set of observed data. In other words,
+$V(S)$ is the mutual information between $\beta$ and $Y_S$; in particular,
+$V(\emptyset) = 0$.
+
+\begin{theorem}
+ Under the ridge regression model explained in section TODO, the value of data
+ is equal to:
+ \begin{align*}
+ \forall S\subset\mathcal{N},\; V(S)
+ & = \frac{1}{2}\log\det\left(I_d
+ + \mu\sum_{i\in S} x_ix_i^*\right)\\
+ & \defeq \frac{1}{2}\log\det A(S)
+ \end{align*}
+\end{theorem}
+
+\begin{proof}
+
+Let us denote by $X_S$ the matrix whose rows are the vectors $(x_i^*)_{i\in
+S}$. Observe that $A(S)$ can simply be written as:
+\begin{displaymath}
+ A(S) = I_d + \mu X_S^* X_S
+\end{displaymath}
+
+Let us recall that the entropy of a multivariate normal variable $B$ over
+$\mathbf{R}^m$ with covariance matrix $\Sigma$ is given by:
+\begin{equation}\label{eq:multivariate-entropy}
+ \mathbb{H}(B) = \frac{1}{2}\log\big((2\pi e)^m \det \Sigma\big)
+\end{equation}
+
+Using the chain rule for conditional entropy, we get that:
+\begin{displaymath}
+ V(S) = \mathbb{H}(Y_S) - \mathbb{H}(Y_S\,|\,\beta)
+\end{displaymath}
+
+Conditioned on $\beta$, the vector $Y_S$ follows a multivariate normal
+distribution of mean $X_S\beta$ and of covariance matrix $\sigma^2 I_n$
+(writing, with a slight abuse of notation, $n$ for $|S|$). Hence:
+\begin{equation}\label{eq:h1}
+ \mathbb{H}(Y_S\,|\,\beta)
+ = \frac{1}{2}\log\left((2\pi e)^n \det(\sigma^2I_n)\right)
+\end{equation}
+
+Marginally, $Y_S$ also follows a multivariate normal distribution of mean
+zero. Let us compute its covariance matrix, $\Sigma_Y$, writing
+$\varepsilon_S$ for the vector of noise terms $(\varepsilon_i)_{i\in S}$:
+\begin{align*}
+ \Sigma_Y & = \expt{Y_SY_S^*}
+ = \expt{(X_S\beta + \varepsilon_S)(X_S\beta + \varepsilon_S)^*}\\
+ & = \kappa X_S X_S^* + \sigma^2I_n
+\end{align*}
+Thus, we get that:
+\begin{equation}\label{eq:h2}
+ \mathbb{H}(Y_S)
+ = \frac{1}{2}\log\left((2\pi e)^n \det(\kappa X_S X_S^* + \sigma^2 I_n)\right)
+\end{equation}
+
+Combining \eqref{eq:h1} and \eqref{eq:h2} we get:
+\begin{displaymath}
+ V(S) = \frac{1}{2}\log\det\left(I_n+\frac{\kappa}{\sigma^2}X_S
+ X_S^*\right)
+\end{displaymath}
+
+Finally, we can use Sylvester's determinant theorem to get the result.
+\end{proof}
+
+It is also interesting to look at the marginal contribution of a user to a
+set: the increase in value induced by adding the user to an already existing
+set of users. We have the following lemma.
+
+\begin{lemma}[Marginal contribution]
+ \begin{displaymath}
+ \Delta_i V(S)\defeq V(S\cup\{i\}) - V(S)
+ = \frac{1}{2}\log\left(1 + \mu x_i^*A(S)^{-1}x_i\right)
+ \end{displaymath}
+\end{lemma}
+
+\begin{proof}
+ We have:
+ \begin{align*}
+ V(S\cup\{i\}) & = \frac{1}{2}\log\det A(S\cup\{i\})\\
+ & = \frac{1}{2}\log\det\left(A(S) + \mu x_i x_i^*\right)\\
+ & = V(S) + \frac{1}{2}\log\det\left(I_d + \mu A(S)^{-1}x_i
+ x_i^*\right)\\
+ & = V(S) + \frac{1}{2}\log\left(1 + \mu x_i^* A(S)^{-1}x_i\right)
+ \end{align*}
+ where the last equality comes from Sylvester's determinant formula.
+\end{proof}
+
+Because $A(S)$ is symmetric positive definite, the marginal contribution is
+positive, which proves that the value function is set increasing. Furthermore,
+it is easy to see that if $S\subset S'$, then $A(S)\leq A(S')$. Using the fact
+that matrix inversion is decreasing, we see that the marginal contribution of
+a fixed user is a set decreasing function. This is exactly the
+\emph{submodularity} of the value function, spelled out below.
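+
+Spelled out, with the notation of the previous lemma (this is just a
+restatement of what the paragraph above establishes), monotonicity and
+submodularity of $V$ read:
+\begin{displaymath}
+ \forall S\subset S'\subset\mathcal{N},\;\forall i\in\mathcal{N},\quad
+ \Delta_i V(S') \geq 0
+ \quad\textrm{and}\quad
+ \Delta_i V(S) \geq \Delta_i V(S')
+\end{displaymath}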
+
+TODO? Explain which points are the most valuable: points aligned along
+directions in which the spread of the already selected points is small.
+
+\subsection{Auction}
+
+TODO Explain the optimization problem and why it has to be formulated as an
+auction problem. Explain the goals:
+\begin{itemize}
+ \item truthful
+ \item individually rational
+ \item budget feasible
+ \item has a good approximation ratio
+\end{itemize}
+
+TODO Explain what is already known: it is OK when the function is submodular.
+When should we introduce the notion of submodularity?
+
+