summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--abstract.tex6
-rw-r--r--definitions.tex2
-rw-r--r--intro.tex11
-rw-r--r--main.tex4
-rw-r--r--problem.tex24
-rw-r--r--proofs.tex2
6 files changed, 23 insertions, 26 deletions
diff --git a/abstract.tex b/abstract.tex
index a92ad45..9e00a95 100644
--- a/abstract.tex
+++ b/abstract.tex
@@ -10,15 +10,15 @@ hypothetical relationship between $x_i$'s and $y_i$'s, \emph{e.g.}, $y_i \appr
$\beta$ from experiments.
%conducting the experiments and obtaining the measurements $y_i$ allows
%\E\ can estimate $\beta$.
-As a proxy for various practical constraints, \E{} may select subjects to select for the experiments.
+As a proxy for various practical constraints, \E{} may select only a subset of subjects on which to conduct the experiment.
%\E 's goal is to select which experiments to conduct, subject to her budget constraint.
%, to obtain the best estimate possible for $\beta$.
We initiate the study of budgeted mechanisms for experimental design. In this setting, \E{} has a budget $B$.
Each subject $i$ declares associated cost $c_i >0$ to be part of the experiment, and must be paid at least their cost. Further, the subjects
-are \emph{strategic} and may lie about their costs . In particular, we formulate the {\em Strategic Experimental Design Problem} (\SEDP) as finding a set $S$ of subjects for the experiment that maximizes $V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i})$ under the constraint $\sum_{i\in S}c_i\leq B$; our objective function corresponds to the information gain in parameter $\beta$ that is learned through linear regression methods, and is related to the so-called $D$-optimality criterion.
+are \emph{strategic} and may lie about their costs. In particular, we formulate the {\em Experimental Design Problem} (\SEDP) as finding a set $S$ of subjects for the experiment that maximizes $V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i})$ under the constraint $\sum_{i\in S}c_i\leq B$; our objective function corresponds to the information gain in parameter $\beta$ that is learned through linear regression methods, and is related to the so-called $D$-optimality criterion.
We present a deterministic, polynomial time, truthful, budget feasible mechanism for \SEDP{}.
-By applying previous work on budget feasible mechanisms with submodular objective, one could have derived either an exponential time deterministic mechanism or a randomized polynomial time mechanism. Our mechanism yields a constant factor ($\approx 12.68$) approximation, and we show that no truthful, budget-feasible algorithms are possible within a factor $2$ approximation. We also show how to apply our approach to a wide class of learning problems.
+By applying previous work on budget feasible mechanisms with submodular objective, one could have derived either an exponential time deterministic mechanism or a randomized polynomial time mechanism. Our mechanism yields a constant factor ($\approx 12.68$) approximation, and we show that no truthful, budget-feasible algorithms are possible within a factor $2$ approximation. We also show how to genralize our approach to a wide class of learning problems.
diff --git a/definitions.tex b/definitions.tex
index 22a1f46..49dd5bc 100644
--- a/definitions.tex
+++ b/definitions.tex
@@ -24,7 +24,7 @@
\newcommand{\thibaut}[1]{\textcolor{blue}{Thibaut: #1}}
\newcommand{\T}[1]{#1^T}
\newcommand{\EDP}{EDP}
-\newcommand{\SEDP}{SEDP}
+\newcommand{\SEDP}{EDP}
\newcommand{\E}{{\tt E}}
\newcommand{\id}{\mathbbm{1}}
\newcommand{\junk}[1]{}
diff --git a/intro.tex b/intro.tex
index 39e6b52..7b3114f 100644
--- a/intro.tex
+++ b/intro.tex
@@ -11,9 +11,9 @@ known to the experimenter.
Typically, \E\ has a hypothesis on the relationship between $x_i$'s and $y_i$'s. Due to its simplicity, as well as its ubiquity in statistical analysis, a large body of work has focused on linear hypotheses: \emph{i.e.}, it is assumed that there exists a $\beta\in\reals^d$ such that
$$y_i = \T{\beta} x_i+\varepsilon_i,$$ for all $i\in \{1,\ldots,n\},$ where $\varepsilon_i$ are zero-mean, i.i.d.~random variables. Conducting the experiments and obtaining the measurements $y_i$ lets \E\ estimate $\beta$, \emph{e.g.}, through linear regression. %, \emph{i.e.}, the model underlying the data, and the experimenter's goal is to obtain such an estimate as accurately as possible. %The goal of experimental design amounts to determining which subjects to experiment upon to produce the best possible such estimate.
-The above experimental design scenario has many applications. Regression over personal data collected through surveys or experimentation is the cornerstone of marketing research, as well as research in a variety of experimental sciences such as medicine and sociology. Crucially, the statistical analysis of user data is also a widely spread practice among Internet companies, which routinely use machine learning techniques over vast records of user data to perform inference and classification tasks integral to their daily operations.
+The above experimental design scenario has many applications. Regression over personal data collected through surveys or experimentation is the cornerstone of marketing research, as well as research in a variety of experimental sciences such as medicine and sociology. Crucially, statistical analysis of user data is also a widely spread practice among Internet companies, which routinely use machine learning techniques over vast records of user data to perform inference and classification tasks integral to their daily operations.
Beyond linear regression, there is a rich literature about estimation procedures, as well as for means of quantifying the quality of the produced estimate~\cite{pukelsheim2006optimal}. There is also an extensive theory on how to select subjects
-if \E\ can conduct only a limited number of experiments, so the estimation process returns $\beta$
+if \E\ can conduct only a limited number of experiments, so the estimation process returns a $\beta$
that approximates the true parameter of the underlying population \cite{ginebra2007measure,le1996comparison,chaloner1995bayesian,boyd2004convex}.
We depart from this classical setup by viewing experimental design in a strategic setting, and by studying budgeted mechanism design issues.
@@ -32,14 +32,13 @@ Our contributions are as follows.
We initiate the study of experimental design problem in presence of budgets and strategic subjects.
%formulate the problem of experimental design subject to a given budget, in the presence of strategic agents who may lie about their costs. %In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem, in which the objective function %is sophisticated and
%is related to the covariance of the $x_i$'s.
- In particular, we formulate the {\em Strategic Experimental Design Problem} (\SEDP) as follows: the experimenter \E\ wishes to find set $S$ of subjects to maximize
+ In particular, we formulate the {\em Experimental Design Problem} (\SEDP) as follows: the experimenter \E\ wishes to find set $S$ of subjects to maximize
\begin{align}V(S) = \log\det\Big(I_d+\sum_{i\in S}x_i\T{x_i}\Big) \label{obj}\end{align}
-subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget, as well as the usual constraints of truthfulness and individual rationality.
-This is naturally viewed as a \emph{budget feasible mechanism design} problem, as introduced by \citeN{singer-mechanisms}.
+subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget. When subjects are strategic, the above problem can be naturally approached as a \emph{budget feasible mechanism design} problem, as introduced by \citeN{singer-mechanisms}.
%, and other {\em strategic constraints} we don't list here.
\smallskip
-The objective function, which is the key, is formally obtained by optimizing the information gain in $\beta$ when the latter is learned through linear regression, and is related to the so-called $D$-optimality criterion~\cite{pukelsheim2006optimal,atkinson2007optimum}.
+The objective function, which is the key, is formally obtained by optimizing the information gain in $\beta$ when the latter is learned through ridge regression, and is related to the so-called $D$-optimality criterion~\cite{pukelsheim2006optimal,atkinson2007optimum}.
\item
We present a polynomial time, truthful mechanism for \SEDP{}, yielding a constant factor ($\approx 12.98$) approximation to the optimal value of \eqref{obj}. In contrast to this, we show that no truthful, budget-feasible mechanisms are possible for \SEDP{} within a factor 2 approximation.
diff --git a/main.tex b/main.tex
index fa8558e..469995d 100644
--- a/main.tex
+++ b/main.tex
@@ -23,7 +23,7 @@ the following algorithm:
\textbf{if}\; V(\{i^*\}) \geq V(S_G)\; \textbf{return}\; \{i^*\}
\;\textbf{else return}\; S_G
\end{equation}
-has an approximation ratio of $\frac{5e}{e-1}$ \cite{chen}.
+has a constant approximation ratio \cite{singer-mechanisms}.
\subsection{Submodular Maximization in the Strategic Case}
@@ -146,8 +146,6 @@ We can now state our main result:
\end{theorem}
The value of the constant $C$ is given by \eqref{eq:constant} in
Section~\ref{sec:proofofmainthm}.
-
-\fussy
In addition, we prove the following simple lower bound.
\begin{theorem}\label{thm:lowerbound}
There is no $2$-approximate, truthful, budget feasible, individually rational mechanism for EDP.
diff --git a/problem.tex b/problem.tex
index 3bdbb94..347bce7 100644
--- a/problem.tex
+++ b/problem.tex
@@ -138,13 +138,13 @@ We study the strategic case, in wich the costs $c_i$ are {\em not} common knowle
-When the experiment subjects are strategic, the experimental design problem becomes a special case of a \emph{budget feasible reverse auction}, as introduced by \citeN{singer-mechanisms}. Formally, given a budget $B$ and a value function $V:2^{\mathcal{N}}\to\reals_+$, a \emph{mechanism} $\mathcal{M} = (S,p)$ comprises (a) an \emph{allocation function}
+When the experiment subjects are strategic, the experimental design problem becomes a special case of a \emph{budget feasible reverse auction}, as introduced by \citeN{singer-mechanisms}. Formally, given a budget $B$ and a value function $V:2^{\mathcal{N}}\to\reals_+$, a \emph{mechanism} $\mathcal{M} = (S,p)$ comprises (a) an \emph{allocation function}\footnote{Note that $S$ would be more aptly termed as a \emph{selection} function, as this is a reverse auction, but we retain the term ``allocation'' to align with the familiar term from standard auctions.}
$S:\reals_+^n \to 2^\mathcal{N}$ and (b) a \emph{payment function}
$p:\reals_+^n\to \reals_+^n$. Given the
vector of costs $c=[c_i]_{i\in\mathcal{N}}$, the allocation function $S$ determines the set in
$ \mathcal{N}$ of experiments to be purchased, while the payment function
returns a vector of payments $[p_i(c)]_{i\in \mathcal{N}}$.
- Let $s_i(c) = \id_{i\in S(c)}$ be the binary indicator of $i\in S(c)$. Mechanism design in budget feasible reverse auctions seeks mechanisms that have the following properties \cite{singer-mechanisms,chen}:
+ Let $s_i(c) = \id_{i\in S(c)}$ be the binary indicator of $i\in S(c)$. As usual, we seek mechanisms that have the following properties \cite{singer-mechanisms}:
\begin{itemize}
\item \emph{Normalization.} Unallocated experiments receive zero payments, \emph{i.e.},
\begin{align}s_i(c)=0\text{ implies }p_i(c)=0.\label{normal}\end{align}
@@ -162,16 +162,16 @@ returns a vector of payments $[p_i(c)]_{i\in \mathcal{N}}$.
\begin{align} \sum_{i\in\mathcal{N}} p_i(c) \leq B.\label{budget}\end{align}
%\end{displaymath}
\end{itemize}
-We define the \emph{Strategic} version of \EDP{} as
-\begin{center}
-\textsc{StrategicExperimentalDesignProblem} (SEDP)
-%\begin{subequations}
-\begin{align*}
-\text{Maximize:}&\quad V(S) = \log\det(I_d+\T{X_{S}}X_{S}) \\ %\label{modified} \\
-\text{subject to:}&\quad \mathcal{M}=(S,p)\text{ satisfies }\eqref{normal}-\eqref{budget}
-\end{align*}
-%\end{subequations}
-\end{center}
+%We define the \emph{Strategic} version of \EDP{} as
+%\begin{center}
+%\textsc{StrategicExperimentalDesignProblem} (SEDP)
+%%\begin{subequations}
+%\begin{align*}
+%\text{Maximize:}&\quad V(S) = \log\det(I_d+\T{X_{S}}X_{S}) \\ %\label{modified} \\
+%\text{subject to:}&\quad \mathcal{M}=(S,p)\text{ satisfies }\eqref{normal}-\eqref{budget}
+%\end{align*}
+%%\end{subequations}
+%\end{center}
Given that the full information problem \EDP{} is NP-hard, we further seek mechanisms that have the following two additional properties:
\begin{itemize}
\item \emph{Approximation Ratio.} The value of the allocated set should not
diff --git a/proofs.tex b/proofs.tex
index 0b35805..0c06d67 100644
--- a/proofs.tex
+++ b/proofs.tex
@@ -54,7 +54,7 @@ OPT \leq \frac{e}{e-1}\big( 3 V(S_G) + 2 V(i^*)\big).
\end{displaymath}
\end{lemma}
-Using Lemma~\ref{lemma:relaxation} we can complete the proof of
+Using Lemmas~\ref{lemma:relaxation} and~\ref{lemma:greedy-bound} we can complete the proof of
Theorem~\ref{thm:main} by showing that, for any $\varepsilon > 0$, if
$OPT_{-i}'$, the optimal value of $L$ when $i^*$ is excluded from
$\mathcal{N}$, has been computed to a precision $\varepsilon$, then the set