author	Thibaut Horel <thibaut.horel@gmail.com>	2013-09-22 15:46:14 -0400
committer	Thibaut Horel <thibaut.horel@gmail.com>	2013-09-22 15:46:14 -0400
commit	d0e9c3f41bd11a0bcb32fa13ecbcbb9ec9ea0041 (patch)
tree	ad09c14f0b5ecad555ee76767bab6f25418c70df
parent	50900bfc44961b87379cd2d3464b677d9f5be1ac (diff)
download	recommendation-d0e9c3f41bd11a0bcb32fa13ecbcbb9ec9ea0041.tar.gz
First reduction of the main part
-rw-r--r--	abstract.tex	6
-rw-r--r--	appendix.tex	67
-rw-r--r--	approximation.tex	90
-rw-r--r--	general.tex	2
-rw-r--r--	intro.tex	15
-rw-r--r--	paper.tex	2
-rw-r--r--	problem.tex	59
7 files changed, 119 insertions, 122 deletions
diff --git a/abstract.tex b/abstract.tex
index 9e99fcd..4cff1cd 100644
--- a/abstract.tex
+++ b/abstract.tex
@@ -1,5 +1,5 @@
%We initiate the study of mechanisms for \emph{experimental design}.
-
+\begin{comment}
In the classical {\em experimental design} setting,
an experimenter \E\
%with a budget $B$
@@ -17,8 +17,8 @@ As a proxy for various practical constraints, \E{} may select only a subset of s
We initiate the study of budgeted mechanisms for experimental design. In this setting, \E{} has a budget $B$.
Each subject $i$ declares an associated cost $c_i >0$ to be part of the experiment, and must be paid at least her cost. In particular, the {\em Experimental Design Problem} (\SEDP) is to find a set $S$ of subjects for the experiment that maximizes $V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i})$ under the constraint $\sum_{i\in S}c_i\leq B$; our objective function corresponds to the information gain in parameter $\beta$ that is learned through linear regression methods, and is related to the so-called $D$-optimality criterion. Further, the subjects are \emph{strategic} and may lie about their costs. Thus, we need to design a
mechanism for \SEDP{} with suitable properties.
-
-We present a deterministic, polynomial time, budget feasible mechanism scheme, that is approximately truthful and yields a constant ($\approx 12.98$) factor approximation to \EDP. %In particular, for any small $\delta>0$ and $\varepsilon>0$, we can construct a $(12.98\,,\varepsilon)$-approximate mechanism that is $\delta$-truthful and runs in polynomial time in both $n$ and $\log\log\frac{B}{\epsilon\delta}$.
+\end{comment}
+We present a deterministic, polynomial time, budget feasible mechanism scheme that is approximately truthful and yields a constant ($\approx 12.98$) factor approximation for the \emph{Experimental Design Problem} (\EDP). %In particular, for any small $\delta>0$ and $\varepsilon>0$, we can construct a $(12.98\,,\varepsilon)$-approximate mechanism that is $\delta$-truthful and runs in polynomial time in both $n$ and $\log\log\frac{B}{\epsilon\delta}$.
By applying previous work on budget feasible mechanisms with a submodular objective, one could {\em only} have derived either an exponential time deterministic mechanism or a randomized polynomial time mechanism. We also establish that no truthful, budget-feasible mechanism is possible within a factor $2$ approximation, and show how to generalize our approach to a wide class of learning problems, beyond linear regression.
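As a quick illustration of the objective stated in the abstract, here is a minimal numpy sketch of $V(S)$ and the budget constraint, on synthetic data; the helper names edp_value and feasible are ours, not the paper's.

import numpy as np

def edp_value(X, S):
    # V(S) = log det(I_d + sum_{i in S} x_i x_i^T), where the rows of X are the x_i
    d = X.shape[1]
    XS = X[list(S)]
    return np.linalg.slogdet(np.eye(d) + XS.T @ XS)[1]  # stable log-determinant

def feasible(S, c, B):
    # budget feasibility: total declared cost of the selected subjects is at most B
    return sum(c[i] for i in S) <= B

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # normalize so ||x_i||_2 = 1
c = rng.uniform(0.1, 1.0, size=20)
S = {0, 3, 7}
print(edp_value(X, S), feasible(S, c, B=2.0))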
diff --git a/appendix.tex b/appendix.tex
index 5cb60f8..73e879d 100644
--- a/appendix.tex
+++ b/appendix.tex
@@ -264,7 +264,72 @@ Together, \eqref{eq:e1} and \eqref{eq:e2} imply the proposition.\qed
%For any $\varepsilon>0$, the barrier method computes an $\varepsilon$-accurate
%approximation of $L^*_c$ in time $O(poly(n,d,\log\log\varepsilon^{-1})$.
%\end{lemma}
-We proceed by showing that the optimal value of \eqref{eq:perturbed-primal} is close to the
+Our next technical result establishes that, using the barrier method, it is possible to construct an algorithm that computes $L^*_c$ to arbitrary accuracy in polynomial time \emph{and} is $\delta$-decreasing. We achieve this by restricting the optimization to a subset of $\dom_c$ on which the concave relaxation $L$ is ``sufficiently'' concave. Formally, for $\alpha\geq 0$ let $$\textstyle\dom_{c,\alpha} \defeq \{\lambda \in [\alpha,1]^n: \sum_{i\in \mathcal{N}}c_i\lambda_i \leq B\}\subseteq \dom_c . $$
+Note that $\dom_c=\dom_{c,0}.$ Consider the following perturbation of the concave relaxation \eqref{eq:primal}:
+\begin{align}\tag{$P_{c,\alpha}$}\label{eq:perturbed-primal}
+\begin{split} \text{Maximize:} &\qquad L(\lambda)\\
+\text{subject to:} & \qquad\lambda \in \dom_{c,\alpha}
+\end{split}
+\end{align}
+
+%Note, that the feasible set in Problem~\eqref{eq:primal} increases (for the
+%inclusion) when the cost decreases.
+%non-increasing.
+
+%Furthermore, \eqref{eq:primal} being a convex optimization problem, using
+%standard convex optimization algorithms (Lemma~\ref{lemma:barrier} gives
+%a formal statement for our specific problem) we can compute
+%a $\varepsilon$-accurate approximation of its optimal value as defined below.
+
+%\begin{definition}
+%$a$ is an $\varepsilon$-accurate approximation of $b$ iff $|a-b|\leq \varepsilon$.
+%\end{definition}
+
+%Note however that an $\varepsilon$-accurate approximation of a non-increasing
+%function is not in general non-increasing itself. The goal of this section is
+%to approximate $L^*_c$ while preserving monotonicity. The estimator we
+%construct has a weaker form of monotonicity that we call
+%\emph{$\delta$-monotonicity}.
+
+%\begin{definition}
+%Let $f$ be a function from $\reals^n$ to $\reals$, we say that $f$ is
+%\emph{$\delta$-increasing along the $i$-th coordinate} iff:
+%\begin{equation}\label{eq:dd}
+% \forall x\in\reals^n,\;
+% \forall \mu\geq\delta,\;
+% f(x+\mu e_i)\geq f(x)
+%\end{equation}
+%where $e_i$ is the $i$-th canonical basis vector of $\reals^n$. By extension,
+%$f$ is $\delta$-increasing iff it is $\delta$-increasing along all coordinates.
+
+%We define \emph{$\delta$-decreasing} functions by reversing the inequality in
+%\eqref{eq:dd}.
+%\end{definition}
+
+%We consider a perturbation of \eqref{eq:primal} by introducing:
+%\begin{equation}\tag{$P_{c, \alpha}$}\label{eq:perturbed-primal}
+% L^*_{c,\alpha} \defeq \max_{\lambda\in[\alpha, 1]^{n}}
+% \left\{L(\lambda) \Big| \sum_{i=1}^{n} \lambda_i c_i
+% \leq B\right\}
+%\end{equation}
+%Note that we have $L^*_c = L^*_c(0)$. We will also assume that
+%$\alpha<\frac{1}{nB}$ so that \eqref{eq:perturbed-primal} has at least one
+%feasible point: $(\frac{1}{nB},\ldots,\frac{1}{nB})$. By computing
+%an approximation of $L^*_{c,\alpha}$ as in Algorithm~\ref{alg:monotone}, we
+%obtain a $\delta$-decreasing approximation of $L^*_c$.
+
+\begin{algorithm}[t]
+ \caption{}\label{alg:monotone}
+ \begin{algorithmic}[1]
+ \Require{ $B\in \reals_+$, $c\in[0,B]^n$, $\delta\in (0,1]$, $\varepsilon\in (0,1]$ }
+ \State $\alpha \gets \varepsilon (\delta/B+n^2)^{-1}$
+ \State Use the barrier method to solve \eqref{eq:perturbed-primal} with
+ accuracy $\varepsilon'=\frac{1}{2^{n+1}B}\alpha\delta b$; denote the output by $\hat{L}^*_{c,\alpha}$
+ \State \textbf{return} $\hat{L}^*_{c,\alpha}$
+ \end{algorithmic}
+\end{algorithm}
+
+Our construction of a $\delta$-decreasing, $\varepsilon$-accurate approximator of $L_c^*$ proceeds as follows: first, it computes an appropriately selected lower bound $\alpha$; using this bound, it solves the perturbed problem \eqref{eq:perturbed-primal} using the barrier method, also at an appropriately selected accuracy $\varepsilon'$, thus obtaining an $\varepsilon'$-accurate approximation of $L^*_{c,\alpha}\defeq \max_{\lambda\in \dom_{c,\alpha}} L(\lambda)$. The corresponding output is returned as an approximation of $L^*_c$. A summary of the algorithm and the specific choices of $\alpha$ and $\varepsilon'$ are given in Algorithm~\ref{alg:monotone}. We proceed by showing that the optimal value of \eqref{eq:perturbed-primal} is close to the
optimal value of \eqref{eq:primal} (Lemma~\ref{lemma:proximity}) while being
well-behaved with respect to changes of the cost
(Lemma~\ref{lemma:monotonicity}). These lemmas together imply
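For concreteness, a minimal Python sketch of Algorithm 1, with the barrier method swapped for a generic constrained solver (scipy's SLSQP) purely for illustration; alpha and eps_prime follow the pseudocode above, b is the lower bound on ||x_i||_2^2, and the tolerance handling is only indicative.

import numpy as np
from scipy.optimize import minimize, LinearConstraint

def L(lam, X):
    # concave relaxation L(lambda) = log det(I_d + sum_i lambda_i x_i x_i^T)
    d = X.shape[1]
    return np.linalg.slogdet(np.eye(d) + (X.T * lam) @ X)[1]

def approx_L_star(X, c, B, delta, eps, b):
    # delta-decreasing, eps-accurate approximation of L*_c (sketch):
    # solve the perturbed problem (P_{c,alpha}) over D_{c,alpha}
    n = len(c)
    alpha = eps / (delta / B + n**2)                    # step 1 of Algorithm 1
    eps_prime = alpha * delta * b / (2**(n + 1) * B)    # step 2: target accuracy
    res = minimize(lambda lam: -L(lam, X), x0=np.full(n, alpha),
                   bounds=[(alpha, 1.0)] * n,           # lambda in [alpha, 1]^n
                   constraints=LinearConstraint(c, -np.inf, B),
                   method="SLSQP", options={"ftol": eps_prime})
    return -res.fun  # estimate of L*_{c,alpha}, returned as approximation of L*_c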
diff --git a/approximation.tex b/approximation.tex
index 3e3f482..cc278e1 100644
--- a/approximation.tex
+++ b/approximation.tex
@@ -55,12 +55,10 @@ Contrary to problems such as \textsc{Knapsack}, the
multi-linear extension \eqref{eq:multi-linear} cannot be optimized in
polynomial time for the value function $V$ we study here, given by \eqref{modified}. Hence, we introduce an extension $L:[0,1]^n\to\reals$ s.t.~
\begin{equation}\label{eq:our-relaxation}
- \begin{split}
- L(\lambda) &\defeq
-\log\det\left(I_d + \sum_{i\in\mathcal{N}} \lambda_i x_i\T{x_i}\right)\\
-&= \log\det\left(\mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\bigg[I_d + \sum_{i\in S} x_i\T{x_i} \bigg]\right),
+ L(\lambda) \defeq
+\log\det\left(I_d + \sum_{i\in\mathcal{N}} \lambda_i x_i\T{x_i}\right),
+%= \log\det\left(\mathbb{E}_{S\sim P_\mathcal{N}^\lambda}\bigg[I_d + \sum_{i\in S} x_i\T{x_i} \bigg]\right),
\quad \lambda\in[0,1]^n.
-\end{split}
\end{equation}
Note that $L$ also extends $V$, and follows naturally from the multi-linear extension by swapping the
expectation and $\log \det$ in \eqref{eq:multi-linear}. Crucially, it is \emph{strictly concave} on $[0,1]^n$, a fact that we exploit in the next section to maximize $L$ subject to the budget constraint in polynomial time.
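The relation between L and the multi-linear extension F (obtained by swapping expectation and log det) can be checked numerically. A small Monte Carlo sanity check, with F estimated by sampling S ~ P^lambda, i.e. including each i independently with probability lambda_i; all names here are ours.

import numpy as np

rng = np.random.default_rng(1)
n, d = 12, 3
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
lam = rng.uniform(size=n)

def V(mask):
    # V(S) = log det(I_d + sum_{i in S} x_i x_i^T)
    return np.linalg.slogdet(np.eye(d) + X[mask].T @ X[mask])[1]

L_val = np.linalg.slogdet(np.eye(d) + (X.T * lam) @ X)[1]
F_est = np.mean([V(rng.random(n) < lam) for _ in range(5000)])
print(F_est, L_val)   # empirically L/2 <= F <= L, cf. Lemma relaxation-ratio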
@@ -68,7 +66,7 @@ expectation and $\log \det$ in \eqref{eq:multi-linear}. Crucially, it is \emph{s
% L(\lambda) =
%\end{displaymath}
-Our first technical lemma relates the concave extension $L$ to the multi-linear extension $F$:
+Our first technical lemma relates $L$ to the multi-linear extension $F$:
\begin{lemma}\label{lemma:relaxation-ratio}
For all $\lambda\in[0,1]^{n},$
$ \frac{1}{2}
@@ -77,8 +75,8 @@ For all $\lambda\in[0,1]^{n},$
\end{lemma}
The proof of this lemma can be found in Appendix~\ref{proofofrelaxation-ratio}. In short, exploiting the concavity of the $\log\det$ function over the set of positive semi-definite matrices, we first bound the ratio of all partial derivatives of $F$ and $L$. We then show that the bound on the ratio of the derivatives also implies a bound on the ratio $F/L$.
-Armed with this result, we subsequently use pipage rounding to show that any $\lambda$ that maximizes the multi-linear extension $F$ can be rounded to an ``almost'' integral solution. More specifically, given a set of costs $c\in \reals^n_+$, we say that a $\lambda\in [0,1]^n$ is feasible if it belongs to the set
-\begin{align}\dom_c =\{\lambda \in [0,1]^n: \sum_{i\in \mathcal{N}} c_i\lambda_i\leq B\}.\label{fdom}\end{align} Then, the following lemma holds:
+Armed with this result, we subsequently use pipage rounding to show that any $\lambda$ that maximizes the multi-linear extension $F$ can be rounded to an ``almost'' integral solution. More specifically, given a set of costs $c\in \reals^n_+$, we say that a $\lambda\in [0,1]^n$ is feasible if it belongs to the set $\dom_c =\{\lambda \in [0,1]^n: \sum_{i\in \mathcal{N}} c_i\lambda_i\leq B\}$. Then, the following lemma holds:
+
\begin{lemma}[Rounding]\label{lemma:rounding}
For any feasible $\lambda\in \dom_c$, there exists a feasible
$\bar{\lambda}\in \dom_c$ such that (a) $F(\lambda)\leq F(\bar{\lambda})$, and (b) at most one of the
@@ -87,11 +85,9 @@ Armed with this result, we subsequently use pipage rounding to show that any $\
The proof of this lemma is in Appendix \ref{proofoflemmarounding}, and follows the main steps of the pipage rounding method of \citeN{pipage}. % this rounding property is referred to in the literature as \emph{cross-convexity} (see, \emph{e.g.}, \cite{dughmi}), or $\varepsilon$-convexity by \citeN{pipage}.
Together, Lemma~\ref{lemma:relaxation-ratio} and Lemma~\ref{lemma:rounding} imply that $OPT$, the optimal value of \EDP, can be approximated by solving the following convex optimization problem:
\begin{align}\tag{$P_c$}\label{eq:primal}
-\begin{split} \text{Maximize:} &\qquad L(\lambda)\\
-\text{subject to:} & \qquad\lambda \in \dom_c
-\end{split}
+\text{Maximize:} \quad L(\lambda)\quad \text{subject to:} \quad\lambda \in \dom_c
\end{align}
-In particular, for $L_c^*\defeq \max_{\lambda\in \dom_c} L(\lambda)$ the optimal value of \eqref{eq:primal}, the following holds:
+In particular, for $L_c^*\defeq \max_{\lambda\in \dom_c} L(\lambda)$, the following holds:
\begin{proposition}\label{prop:relaxation}
$OPT\leq L^*_c \leq 2 OPT + 2\max_{i\in\mathcal{N}}V(i)$.
\end{proposition}
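A single pipage step under the budget constraint is short enough to sketch. Moving coordinates i and j along (1/c_i, -1/c_j) keeps sum_k c_k lambda_k constant, and since F is convex along that direction one of the two extreme points is at least as good as the starting point; a schematic version, where F is any callable evaluating the multi-linear extension (e.g. by sampling) and i, j are fractional coordinates:

import numpy as np

def pipage_step(lam, c, F, i, j):
    # move lambda_i up by t/c_i and lambda_j down by t/c_j (or the reverse)
    # until one of them hits {0,1}; the budget sum_k c_k lam_k is unchanged
    d = np.zeros_like(lam)
    d[i], d[j] = 1.0 / c[i], -1.0 / c[j]
    t_plus = min((1 - lam[i]) * c[i], lam[j] * c[j])    # largest move along +d
    t_minus = min(lam[i] * c[i], (1 - lam[j]) * c[j])   # largest move along -d
    # F is convex along d, so the better endpoint satisfies F >= F(lam)
    return max([lam + t_plus * d, lam - t_minus * d], key=F)

Repeating this step while at least two coordinates remain fractional yields the almost-integral point of the Rounding lemma.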
@@ -127,79 +123,13 @@ Nevertheless, we prove that it is possible to use the barrier method to construc
for all $i\in \mathcal{N},x\in\reals^n, \mu\geq\delta,$
%\end{equation}
where $e_i$ is the $i$-th canonical basis vector of $\reals^n$.
-In other words, $f$ is $\delta$-decreasing if increasing any coordinate by $\delta$ or more at input $x$ ensures that the output will be at most $f(x)$.
-
-Our next technical result establishes that, using the barrier method, it is possible to construct an algorithm that computes $L^*_c$ at arbitrary accuracy in polynomial time \emph{and} is $\delta$-decreasing. We achieve this by restricting the optimization over a subset of $\dom_c$ at which the concave relaxation $L$ is ``sufficiently'' concave. Formally, for $\alpha\geq 0$ let $$\textstyle\dom_{c,\alpha} \defeq \{\lambda \in [\alpha,1]^n: \sum_{i\in \mathcal{N}}c_i\lambda_i \leq B\}\subseteq \dom_c . $$
-Note that $\dom_c=\dom_{c,0}.$ Consider the following perturbation of the concave relaxation \eqref{eq:primal}:
-\begin{align}\tag{$P_{c,\alpha}$}\label{eq:perturbed-primal}
-\begin{split} \text{Maximize:} &\qquad L(\lambda)\\
-\text{subject to:} & \qquad\lambda \in \dom_{c,\alpha}
-\end{split}
-\end{align}
-
-%Note, that the feasible set in Problem~\eqref{eq:primal} increases (for the
-%inclusion) when the cost decreases.
-%non-increasing.
-
-%Furthermore, \eqref{eq:primal} being a convex optimization problem, using
-%standard convex optimization algorithms (Lemma~\ref{lemma:barrier} gives
-%a formal statement for our specific problem) we can compute
-%a $\varepsilon$-accurate approximation of its optimal value as defined below.
-
-%\begin{definition}
-%$a$ is an $\varepsilon$-accurate approximation of $b$ iff $|a-b|\leq \varepsilon$.
-%\end{definition}
-
-%Note however that an $\varepsilon$-accurate approximation of a non-increasing
-%function is not in general non-increasing itself. The goal of this section is
-%to approximate $L^*_c$ while preserving monotonicity. The estimator we
-%construct has a weaker form of monotonicity that we call
-%\emph{$\delta$-monotonicity}.
-
-%\begin{definition}
-%Let $f$ be a function from $\reals^n$ to $\reals$, we say that $f$ is
-%\emph{$\delta$-increasing along the $i$-th coordinate} iff:
-%\begin{equation}\label{eq:dd}
-% \forall x\in\reals^n,\;
-% \forall \mu\geq\delta,\;
-% f(x+\mu e_i)\geq f(x)
-%\end{equation}
-%where $e_i$ is the $i$-th canonical basis vector of $\reals^n$. By extension,
-%$f$ is $\delta$-increasing iff it is $\delta$-increasing along all coordinates.
-
-%We define \emph{$\delta$-decreasing} functions by reversing the inequality in
-%\eqref{eq:dd}.
-%\end{definition}
-
-%We consider a perturbation of \eqref{eq:primal} by introducing:
-%\begin{equation}\tag{$P_{c, \alpha}$}\label{eq:perturbed-primal}
-% L^*_{c,\alpha} \defeq \max_{\lambda\in[\alpha, 1]^{n}}
-% \left\{L(\lambda) \Big| \sum_{i=1}^{n} \lambda_i c_i
-% \leq B\right\}
-%\end{equation}
-%Note that we have $L^*_c = L^*_c(0)$. We will also assume that
-%$\alpha<\frac{1}{nB}$ so that \eqref{eq:perturbed-primal} has at least one
-%feasible point: $(\frac{1}{nB},\ldots,\frac{1}{nB})$. By computing
-%an approximation of $L^*_{c,\alpha}$ as in Algorithm~\ref{alg:monotone}, we
-%obtain a $\delta$-decreasing approximation of $L^*_c$.
-
-\begin{algorithm}[t]
- \caption{}\label{alg:monotone}
- \begin{algorithmic}[1]
- \Require{ $B\in \reals_+$, $c\in[0,B]^n$, $\delta\in (0,1]$, $\epsilon\in (0,1]$ }
- \State $\alpha \gets \varepsilon (\delta/B+n^2)^{-1}$
- \State Use the barrier method to solve \eqref{eq:perturbed-primal} with
- accuracy $\varepsilon'=\frac{1}{2^{n+1}B}\alpha\delta b$; denote the output by $\hat{L}^*_{c,\alpha}$
- \State \textbf{return} $\hat{L}^*_{c,\alpha}$
- \end{algorithmic}
-\end{algorithm}
+In other words, $f$ is $\delta$-decreasing if increasing any coordinate by $\delta$ or more at input $x$ ensures that the output will be at most $f(x)$. In particular, the following proposition holds:
-Our construction of a $\delta$-decreasing, $\varepsilon$-accurate approximator of $L_c^*$ proceeds as follows: first, it computes an appropriately selected lower bound $\alpha$; using this bound, it solves the perturbed problem \eqref{eq:perturbed-primal} using the barrier method, also at an appropriately selected accuracy $\varepsilon'$, obtaining thus a $\varepsilon'$-accurate approximation of $L^*_{c,\alpha}\defeq \max_{\lambda\in \dom_{c,\alpha}} L(\lambda)$ . The corresponding output is returned as an approximation of $L^*_c$. A summary of the algorithm and the specific choices of $\alpha$ and $\varepsilon'$ are given in Algorithm~\ref{alg:monotone}. The following proposition, which is proved in Appendix~\ref{proofofpropmonotonicity}, establishes that this algorithm has both properties we desire:
\begin{proposition}\label{prop:monotonicity}
For any $\delta\in(0,1]$ and any $\varepsilon\in(0,1]$,
Algorithm~\ref{alg:monotone} computes a $\delta$-decreasing,
$\varepsilon$-accurate approximation of $L^*_c$. The running time of the
algorithm is $O\big(poly(n, d, \log\log\frac{B}{b\varepsilon\delta})\big)$.
\end{proposition}
-We note that the execution of the barrier method on the restricted set $\dom_{c,\alpha}$ is necessary. The algorithm's output when executed over the entire domain may not necessarily be $\delta$-decreasing, even when the approximation accuracy is small. This is because costs become saturated when the optimal $\lambda\in \dom_c$ lies at the boundary: increasing them has no effect on the objective. Forcing the optimization to happen ``off'' the boundary ensures that this does not occur, while taking $\alpha$ to be small ensures that this perturbation does not cost much in terms of approximation accuracy.
+The description of Algorithm~\ref{alg:monotone} as well as the proof of the proposition can be found in Appendix~\ref{proofofpropmonotonicity}.
diff --git a/general.tex b/general.tex
index 2e7312e..d7b033d 100644
--- a/general.tex
+++ b/general.tex
@@ -6,7 +6,7 @@
%as our objective. Moreover,
In the general case where the prior distribution of the experimenter on the
-model $\beta$ in \eqref{model} is not homotropic and has a generic covariance
+model $\beta$ is not isotropic and has a generic covariance
matrix $R$, the value function takes the general form given by
\eqref{dcrit}.
diff --git a/intro.tex b/intro.tex
index 2519ac4..af4a93c 100644
--- a/intro.tex
+++ b/intro.tex
@@ -5,7 +5,7 @@ known to the experimenter.
\E\ wishes to measure a certain inherent property of the subjects by performing an experiment: the outcome $y_i$ of the experiment on a subject $i$ is unknown to \E\ before the experiment is performed.
Typically, \E\ has a hypothesis on the relationship between $x_i$'s and $y_i$'s. Due to its simplicity, as well as its ubiquity in statistical analysis, a large body of work has focused on linear hypotheses: \emph{i.e.}, it is assumed that there exists a $\beta\in\reals^d$ such that
-$$y_i = \T{\beta} x_i+\varepsilon_i,$$ for all $i\in \{1,\ldots,n\},$ where $\varepsilon_i$ are zero-mean, i.i.d.~random variables. Conducting the experiments and obtaining the measurements $y_i$ lets \E\ estimate $\beta$, \emph{e.g.}, through linear regression. %, \emph{i.e.}, the model underlying the data, and the experimenter's goal is to obtain such an estimate as accurately as possible. %The goal of experimental design amounts to determining which subjects to experiment upon to produce the best possible such estimate.
+$y_i = \T{\beta} x_i+\varepsilon_i,$ for all $i\in \{1,\ldots,n\},$ where $\varepsilon_i$ are zero-mean, i.i.d.~random variables. Conducting the experiments and obtaining the measurements $y_i$ lets \E\ estimate $\beta$, \emph{e.g.}, through linear regression. %, \emph{i.e.}, the model underlying the data, and the experimenter's goal is to obtain such an estimate as accurately as possible. %The goal of experimental design amounts to determining which subjects to experiment upon to produce the best possible such estimate.
The above experimental design scenario has many applications. Regression over personal data collected through surveys or experimentation is the cornerstone of marketing research, as well as research in a variety of experimental sciences such as medicine and sociology. Crucially, statistical analysis of user data is also a widely spread practice among Internet companies, which routinely use machine learning techniques over vast records of user data to perform inference and classification tasks integral to their daily operations.
Beyond linear regression, there is a rich literature about estimation procedures, as well as about means of quantifying the quality of the produced estimate~\cite{pukelsheim2006optimal}. There is also an extensive theory on how to select subjects
@@ -24,9 +24,8 @@ However, we are not aware of a principled study of this setting from a strategic
% When subjects are strategic, they may have an incentive to misreport their cost, leading to the need for a sophisticated choice of experiments and payments. Arguably, user incentiviation is of particular pertinence due to the extent of statistical analysis over user data on the Internet. %, which has led to the rise of several different research efforts in studying data markets \cite{...}.
Our contributions are as follows.
-\begin{itemize}
-\item
-We initiate the study of experimental design in the presence of a budget and strategic subjects.
+
+1. We initiate the study of experimental design in the presence of a budget and strategic subjects.
%formulate the problem of experimental design subject to a given budget, in the presence of strategic agents who may lie about their costs. %In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem, in which the objective function %is sophisticated and
%is related to the covariance of the $x_i$'s.
In particular, we formulate the {\em Experimental Design Problem} (\SEDP) as
@@ -35,16 +34,12 @@ We initiate the study of experimental design in the presence of a budget and str
subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget. When subjects are strategic, the above problem can be naturally approached as a \emph{budget feasible mechanism design} problem, as introduced by \citeN{singer-mechanisms}.
%, and other {\em strategic constraints} we don't list here.
-\smallskip
The objective function, which is the key, is formally obtained by optimizing the information gain in $\beta$ when the latter is learned through ridge regression, and is related to the so-called $D$-optimality criterion~\cite{pukelsheim2006optimal,atkinson2007optimum}.
-\item
-We present a polynomial time mechanism scheme for \SEDP{} that is approximately truthful and yields a constant factor ($\approx 12.98$) approximation to the optimal value of \eqref{obj}. %In particular, for any small $\delta>0$ and $\varepsilon>0$, we can construct a $(12.98\,,\varepsilon)$-approximate mechanism that is $\delta$-truthful and runs in polynomial time in both $n$ and $\log\log\frac{B}{\epsilon\delta}$.
+
+2. We present a polynomial time mechanism scheme for \SEDP{} that is approximately truthful and yields a constant factor ($\approx 12.98$) approximation to the optimal value of \eqref{obj}. %In particular, for any small $\delta>0$ and $\varepsilon>0$, we can construct a $(12.98\,,\varepsilon)$-approximate mechanism that is $\delta$-truthful and runs in polynomial time in both $n$ and $\log\log\frac{B}{\epsilon\delta}$.
In contrast to this, we show that no truthful, budget-feasible mechanisms are possible for \SEDP{} within a factor 2 approximation.
-\smallskip
We note that the objective \eqref{obj} is submodular. Using this fact, applying previous results on budget feasible mechanism design under general submodular objectives~\cite{singer-mechanisms,chen} would yield either a deterministic, truthful, constant-approximation mechanism that requires exponential time, or a non-determi\-nis\-tic, (universally) truthful, poly-time mechanism that yields a constant approximation ratio only \emph{in expectation} (\emph{i.e.}, its approximation guarantee for a given instance may in fact be unbounded).
-\end{itemize}
-
% budget feasible mechanisms for submodular maximization yields a $8.34$-approximate deterministic mechanism for \SEDP{} that is not polynomial time, unless P=NP. Alternatively, previous work by \citeN{chen} on general submodular objectives also yields a randomized, 7.91-approximate polynomial time mechanism for \SEDP{} that is however \emph{universally truthful}, \emph{i.e.}, it is sampled from a distribution among truthful mechanisms. In contrast, our result is the first deterministic constant factor approximation mechanism for \SEDP{} that is both polytime and truthful.
% either a randomized, 7.91-approximate polynomial time mechanism for maximizing a general submodular function that is universally truthful, \emph{i.e.}, it is sampled from a distribution among truthful mechanisms.
diff --git a/paper.tex b/paper.tex
index 7abcf9a..de0fbc2 100644
--- a/paper.tex
+++ b/paper.tex
@@ -7,7 +7,7 @@
\usepackage{bbm,color,verbatim}
\input{definitions}
\usepackage[breaklinks=true,colorlinks=true,pagebackref=false]{hyperref}
-\title{Budget Feasible Mechanisms for Experimental Design}
+\title{Budget Feasible Mechanisms \\for Experimental Design}
\author{
Thibaut Horel\inst{1}
\and
diff --git a/problem.tex b/problem.tex
index 73647da..798e69a 100644
--- a/problem.tex
+++ b/problem.tex
@@ -6,30 +6,40 @@ The theory of experimental design \cite{pukelsheim2006optimal,atkinson2007optimu
Suppose that an experimenter \E\ wishes to conduct $k$ among $n$ possible
experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is
associated with a set of parameters (or features) $x_i\in \reals^d$, normalized
-so that $b\leq \|x_i\|^2_2\leq 1,$ for some $b>0$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.},
-\begin{align}\label{model}
- \forall i\in\mathcal{N},\quad y_i = \T{\beta} x_i + \varepsilon_i
-\end{align}
-where $\beta$ is a vector in $\reals^d$, commonly referred to as the \emph{model}, and $\varepsilon_i$ (the \emph{measurement noise}) are independent, normally distributed random variables with mean 0 and variance $\sigma^2$.
-
-For example, each $i$ may correspond to a human subject; the
-feature vector $x_i$ may correspond to a normalized vector of her age, weight,
-gender, income, \emph{etc.}, and the measurement $y_i$ may capture some
-biometric information (\emph{e.g.}, her red cell blood count, a genetic marker,
-etc.). The magnitude of the coefficient $\beta_i$ captures the effect that feature $i$ has on the measured variable, and its sign captures whether the correlation is positive or negative.
+so that $b\leq \|x_i\|^2_2\leq 1,$ for some $b>0$. Denote by $S\subseteq
+\mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its
+execution, experiment $i\in S$ reveals an output variable (the ``measurement'')
+$y_i$, related to the experiment features $x_i$ through a linear function,
+\emph{i.e.}, $y_i = \T{\beta} x_i + \varepsilon_i$ where $\beta$ is a vector in
+$\reals^d$, commonly referred to as the \emph{model}, and $\varepsilon_i$ (the
+\emph{measurement noise}) are independent, normally distributed random
+variables with mean 0 and variance $\sigma^2$.
+For example, each $i$ may correspond to a human subject; the feature vector
+$x_i$ may correspond to a normalized vector of her age, weight, gender, income,
+\emph{etc.}, and the measurement $y_i$ may capture some biometric information
+(\emph{e.g.}, her red cell blood count, a genetic marker, etc.). The magnitude
+of the coefficient $\beta_i$ captures the effect that feature $i$ has on the
+measured variable, and its sign captures whether the correlation is positive or
+negative.
-The purpose of these experiments is to allow \E\ to estimate the model $\beta$. In particular,
- assume that the experimenter \E\ has a {\em prior}
+The purpose of these experiments is to allow \E\ to estimate the model
+$\beta$. In particular, assume that the experimenter \E\ has a {\em prior}
distribution on $\beta$, \emph{i.e.}, $\beta$ has a multivariate normal prior
-with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d^2}$ (where $\sigma^2$ is the noise variance).
-Then, \E\ estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
+with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d^2}$ (where
+$\sigma^2$ is the noise variance). Then, \E\ estimates $\beta$ through
+\emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter
+which maximizes the posterior distribution of $\beta$ given the observations
+$y_S$. Under the linearity assumption and the Gaussian prior on $\beta$,
+maximum a posteriori estimation leads to the following maximization
+\cite{hastie}:
\begin{align}
\begin{split}
- \hat{\beta} = \argmax_{\beta\in\reals^d} \prob(\beta\mid y_S) &=\argmin_{\beta\in\reals^d} \big(\sum_{i\in S} (y_i - \T{\beta}x_i)^2
- + \T{\beta}R\beta\big)\\
- & = (R+\T{X_S}X_S)^{-1}X_S^Ty_S \label{ridge}
-\end{split}
+ \hat{\beta} = \argmax_{\beta\in\reals^d} \prob(\beta\mid y_S)
+ &=\argmin_{\beta\in\reals^d} \big(\sum_{i\in S} (y_i - \T{\beta}x_i)^2
+ + \T{\beta}R\beta\big)\\
+ & = (R+\T{X_S}X_S)^{-1}X_S^Ty_S \label{ridge}
+ \end{split}
\end{align}
where the last equality is obtained by setting $\nabla_{\beta}\prob(\beta\mid y_S)$ to zero and solving the resulting linear system; in \eqref{ridge}, $X_S\defeq[x_i]_{i\in S}\in \reals^{|S|\times d}$ is the matrix of experiment features and
$y_S\defeq[y_i]_{i\in S}\in\reals^{|S|}$ are the observed measurements.
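The closed form in (ridge) is ordinary ridge regression and transcribes directly to numpy; a toy instance with an identity prior precision R, on synthetic data (variable names ours):

import numpy as np

rng = np.random.default_rng(2)
n, d, sigma = 50, 4, 0.1
beta = rng.normal(size=d)
X_S = rng.normal(size=(n, d))
y_S = X_S @ beta + sigma * rng.normal(size=n)   # y_i = beta^T x_i + eps_i
R = np.eye(d)                                   # prior: beta ~ N(0, sigma^2 R^{-1})

# MAP / ridge estimate: beta_hat = (R + X_S^T X_S)^{-1} X_S^T y_S
beta_hat = np.linalg.solve(R + X_S.T @ X_S, X_S.T @ y_S)
print(np.linalg.norm(beta_hat - beta))          # small when X_S is informative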
@@ -44,15 +54,13 @@ This optimization, commonly known as \emph{ridge regression}, includes an additi
Let $V:2^\mathcal{N}\to\reals$ be a \emph{value function}, quantifying how informative a set of experiments $S$ is in estimating $\beta$. The classical experimental design problem amounts to finding a set $S$ that maximizes $V(S)$ subject to the constraint $|S|\leq k$.
A variety of different value functions are used in literature~\cite{pukelsheim2006optimal,boyd2004convex}; %; almost all make use of the covariance $(R+\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$.
-one that has natural advantages is the \emph{information gain}: %\emph{$D$-optimality criterion}: %which yields the following optimization problem
-\begin{align}
- V(S)= I(\beta;y_S) = \entropy(\beta)-\entropy(\beta\mid y_S). \label{informationgain}
-\end{align}
+one that has natural advantages is the \emph{information gain}, %\emph{$D$-optimality criterion}: %which yields the following optimization problem
+ $V(S)= I(\beta;y_S) = \entropy(\beta)-\entropy(\beta\mid y_S)$,
which is the entropy reduction on $\beta$ after the revelation of $y_S$ (also known as the mutual information between $y_S$ and $\beta$).
Hence, selecting a set of experiments $S$ that
maximizes $V(S)$ is equivalent to finding the set of experiments that minimizes
the uncertainty on $\beta$, as captured by the entropy reduction of its estimator.
-Under the linear model \eqref{model}, and the Gaussian prior, the information gain takes the following form (see, \emph{e.g.}, \cite{chaloner1995bayesian}):
+Under the linear model and the Gaussian prior, the information gain takes the following form (see, \emph{e.g.}, \cite{chaloner1995bayesian}):
\begin{align}
I(\beta;y_S)&= \frac{1}{2}\log\det(R+ \T{X_S}X_S) - \frac{1}{2}\log\det R\label{dcrit} %\\
\end{align}
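Computed naively, (dcrit) is just two log-determinants; a direct transcription, assuming numpy with slogdet for numerical stability:

import numpy as np

def information_gain(X_S, R):
    # I(beta; y_S) = 1/2 log det(R + X_S^T X_S) - 1/2 log det(R)
    return 0.5 * (np.linalg.slogdet(R + X_S.T @ X_S)[1]
                  - np.linalg.slogdet(R)[1])

rng = np.random.default_rng(3)
X_S = rng.normal(size=(10, 4))
print(information_gain(X_S, np.eye(4)))   # with R = I_d this equals V(S)/2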
@@ -142,8 +150,7 @@ yields an approximation ratio of $\frac{5 e }{e-1}~$\cite{singer-mechanisms}; t
\subsection{Budget-Feasible Experimental Design: Strategic Case}
We study the following \emph{strategic} setting, in which the costs $c_i$ are {\em not} common knowledge and their reporting can be manipulated by the experiment subjects. The latter are strategic and wish to maximize their utility, which is the difference of the payment they receive and their true cost. We note that, though subjects may misreport $c_i$, they cannot lie about $x_i$ (\emph{i.e.}, all public features are verifiable prior to the experiment) nor $y_i$ (\emph{i.e.}, the subject cannot falsify her measurement).
-In this setting, experimental design reduces to a \emph{budget feasible reverse auction}, as introduced by \citeN{singer-mechanisms}; we review the formal definition in Appendix~\ref{app:budgetfeasible}. In short, given a budget $B$ and a value function $V:2^{\mathcal{N}}\to\reals_+$, a \emph{reverse auction mechanism} $\mathcal{M} = (S,p)$ comprises (a) an \emph{allocation function}\footnote{Note that $S$ would be more aptly termed as a \emph{selection} function, as this is a reverse auction, but we retain the term ``allocation'' to align with the familiar term from standard auctions.}
-$S:\reals_+^n \to 2^\mathcal{N}$, determining the set
+In this setting, experimental design reduces to a \emph{budget feasible reverse auction}, as introduced by \citeN{singer-mechanisms}; we review the formal definition in Appendix~\ref{app:budgetfeasible}. In short, given a budget $B$ and a value function $V:2^{\mathcal{N}}\to\reals_+$, a \emph{reverse auction mechanism} $\mathcal{M} = (S,p)$ comprises (a) an \emph{allocation function} $S:\reals_+^n \to 2^\mathcal{N}$, determining the set
of experiments to be purchased, and (b) a \emph{payment function}
$p:\reals_+^n\to \reals_+^n$, determining the payments $[p_i(c)]_{i\in \mathcal{N}}$ received by experiment subjects.
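The shape of such a mechanism is easy to pin down in code; a typing sketch of the pair M = (S, p), with the budget feasibility check spelled out (class and function names ours):

from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence

@dataclass
class Mechanism:
    # allocation function: declared costs c in R_+^n -> set of purchased experiments
    S: Callable[[Sequence[float]], FrozenSet[int]]
    # payment function: declared costs -> payment p_i(c) for each subject
    p: Callable[[Sequence[float]], Sequence[float]]

def budget_feasible(m: Mechanism, c: Sequence[float], B: float) -> bool:
    # budget feasibility: total payments to selected subjects never exceed B
    return sum(m.p(c)[i] for i in m.S(c)) <= B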