From d0e9c3f41bd11a0bcb32fa13ecbcbb9ec9ea0041 Mon Sep 17 00:00:00 2001
From: Thibaut Horel
Date: Sun, 22 Sep 2013 15:46:14 -0400
Subject: First reduction of the main part

---
 intro.tex | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/intro.tex b/intro.tex
index 2519ac4..af4a93c 100644
--- a/intro.tex
+++ b/intro.tex
@@ -5,7 +5,7 @@ known to the experimenter.
 \E\ wishes to measure a certain inherent property of the subjects by performing an experiment: the outcome $y_i$ of the experiment on a subject $i$ is unknown to \E\ before the experiment is performed. Typically, \E\ has a hypothesis on the relationship between $x_i$'s and $y_i$'s. Due to its simplicity, as well as its ubiquity in statistical analysis, a large body of work has focused on linear hypotheses: \emph{i.e.}, it is assumed that there exists a $\beta\in\reals^d$ such that
-$$y_i = \T{\beta} x_i+\varepsilon_i,$$ for all $i\in \{1,\ldots,n\},$ where $\varepsilon_i$ are zero-mean, i.i.d.~random variables. Conducting the experiments and obtaining the measurements $y_i$ lets \E\ estimate $\beta$, \emph{e.g.}, through linear regression. %, \emph{i.e.}, the model underlying the data, and the experimenter's goal is to obtain such an estimate as accurately as possible. %The goal of experimental design amounts to determining which subjects to experiment upon to produce the best possible such estimate.
+$y_i = \T{\beta} x_i+\varepsilon_i,$ for all $i\in \{1,\ldots,n\},$ where $\varepsilon_i$ are zero-mean, i.i.d.~random variables. Conducting the experiments and obtaining the measurements $y_i$ lets \E\ estimate $\beta$, \emph{e.g.}, through linear regression. %, \emph{i.e.}, the model underlying the data, and the experimenter's goal is to obtain such an estimate as accurately as possible. %The goal of experimental design amounts to determining which subjects to experiment upon to produce the best possible such estimate.
 The above experimental design scenario has many applications. Regression over personal data collected through surveys or experimentation is the cornerstone of marketing research, as well as research in a variety of experimental sciences such as medicine and sociology. Crucially, statistical analysis of user data is also a widely spread practice among Internet companies, which routinely use machine learning techniques over vast records of user data to perform inference and classification tasks integral to their daily operations. Beyond linear regression, there is a rich literature about estimation procedures, as well as about means of quantifying the quality of the produced estimate~\cite{pukelsheim2006optimal}. There is also an extensive theory on how to select subjects
@@ -24,9 +24,8 @@ However, we are not aware of a principled study of this setting from a strategic
 % When subjects are strategic, they may have an incentive to misreport their cost, leading to the need for a sophisticated choice of experiments and payments. Arguably, user incentiviation is of particular pertinence due to the extent of statistical analysis over user data on the Internet. %, which has led to the rise of several different research efforts in studying data markets \cite{...}.
 Our contributions are as follows.
-\begin{itemize}
-\item
-We initiate the study of experimental design in the presence of a budget and strategic subjects.
+
+1. We initiate the study of experimental design in the presence of a budget and strategic subjects.
 %formulate the problem of experimental design subject to a given budget, in the presence of strategic agents who may lie about their costs. %In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem, in which the objective function %is sophisticated and %is related to the covariance of the $x_i$'s.
 In particular, we formulate the {\em Experimental Design Problem} (\SEDP) as
@@ -35,16 +34,12 @@ We initiate the study of experimental design in the presence of a budget and str
 subject to a budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget. When subjects are strategic, the above problem can be naturally approached as a \emph{budget feasible mechanism design} problem, as introduced by \citeN{singer-mechanisms}. %, and other {\em strategic constraints} we don't list here.
-\smallskip
 The objective function, which is the key, is formally obtained by optimizing the information gain in $\beta$ when the latter is learned through ridge regression, and is related to the so-called $D$-optimality criterion~\cite{pukelsheim2006optimal,atkinson2007optimum}.
-\item
-We present a polynomial time mechanism scheme for \SEDP{} that is approximately truthful and yields a constant factor ($\approx 12.98$) approximation to the optimal value of \eqref{obj}. %In particular, for any small $\delta>0$ and $\varepsilon>0$, we can construct a $(12.98\,,\varepsilon)$-approximate mechanism that is $\delta$-truthful and runs in polynomial time in both $n$ and $\log\log\frac{B}{\epsilon\delta}$.
+
+2. We present a polynomial time mechanism scheme for \SEDP{} that is approximately truthful and yields a constant factor ($\approx 12.98$) approximation to the optimal value of \eqref{obj}. %In particular, for any small $\delta>0$ and $\varepsilon>0$, we can construct a $(12.98\,,\varepsilon)$-approximate mechanism that is $\delta$-truthful and runs in polynomial time in both $n$ and $\log\log\frac{B}{\epsilon\delta}$.
 In contrast to this, we show that no truthful, budget-feasible mechanisms are possible for \SEDP{} within a factor 2 approximation.
-\smallskip
 We note that the objective \eqref{obj} is submodular.
 Using this fact, applying previous results on budget feasible mechanism design under general submodular objectives~\cite{singer-mechanisms,chen} would yield either a deterministic, truthful, constant-approximation mechanism that requires exponential time, or a non-determi\-nis\-tic, (universally) truthful, poly-time mechanism that yields a constant approximation ratio only \emph{in expectation} (\emph{i.e.}, its approximation guarantee for a given instance may in fact be unbounded).
-\end{itemize}
-
 % budget feasible mechanisms for submodular maximization yields a $8.34$-approximate deterministic mechanism for \SEDP{} that is not polynomial time, unless P=NP. Alternatively, previous work by \citeN{chen} on general submodular objectives also yields a randomized, 7.91-approximate polynomial time mechanism for \SEDP{} that is however \emph{universally truthful}, \emph{i.e.}, it is sampled from a distribution among truthful mechanisms. In contrast, our result is the first deterministic constant factor approximation mechanism for \SEDP{} that is both polytime and truthful.
 % either a randomized, 7.91-approximate polynomial time mechanism for maximizing a general submodular function that is universally truthful, \emph{i.e.}, it is sampled from a distribution among truthful mechanisms.
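The patch text says that the \SEDP{} objective comes from the information gain in $\beta$ under ridge regression, is related to $D$-optimality, and is submodular. However, \eqref{obj} itself is not part of this excerpt. The Python sketch here is therefore only an illustration under an assumed form, $\log\det(I_d + \sum_{i\in S} x_i\T{x_i})$, which is a standard $D$-optimal criterion. Assume an $N(0, I_d)$ prior on $\beta$ and unit-variance Gaussian noise $\varepsilon_i$. Under those assumptions, half of this quantity is the mutual information between $\beta$ and the measurements $y_S$, and ridge regression with penalty $\lambda = 1$ returns the posterior mean. The helper names (`value`, `ridge_estimate`, `budget_feasible`) and the random instance are hypothetical and do not come from the paper.

```python
import numpy as np

# Hypothetical helpers, not the paper's code. X is an (n, d) array whose rows
# are the features x_i, y holds the measurements y_i, S is a set of indices.


def _index(S):
    """Sorted integer index array for the subset S (empty S gives an empty array)."""
    return np.array(sorted(S), dtype=int)


def ridge_estimate(X, y, S, lam=1.0):
    """Ridge estimate of beta from the subjects in S; lam is an assumed penalty.

    With lam = 1 this equals the posterior mean of beta under an assumed
    N(0, I) prior and unit-variance Gaussian noise.
    """
    idx = _index(S)
    XS, yS = X[idx], y[idx]
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + XS.T @ XS, XS.T @ yS)


def value(X, S):
    """Assumed D-optimality objective: log det(I_d + sum_{i in S} x_i x_i^T)."""
    XS = X[_index(S)]
    d = X.shape[1]
    _, logdet = np.linalg.slogdet(np.eye(d) + XS.T @ XS)
    return logdet


def budget_feasible(costs, S, B):
    """Budget constraint: sum_{i in S} c_i <= B."""
    return sum(costs[i] for i in S) <= B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 3
    X = rng.normal(size=(n, d))
    beta = rng.normal(size=d)
    y = X @ beta + 0.1 * rng.normal(size=n)  # y_i = beta^T x_i + eps_i
    costs = rng.uniform(1.0, 3.0, size=n)

    # Diminishing returns: adding subject 5 helps S at least as much as T.
    S, T, i = {0}, {0, 1, 2, 3}, 5
    gain_S = value(X, S | {i}) - value(X, S)
    gain_T = value(X, T | {i}) - value(X, T)
    print(gain_S, gain_T, gain_S >= gain_T)
    print(budget_feasible(costs, T, B=10.0))
    print(ridge_estimate(X, y, set(range(n))))
```

For this assumed objective, the matrix determinant lemma gives the marginal gain of subject $i$ as $\log(1 + \T{x_i}(I_d + \sum_{j\in S} x_j\T{x_j})^{-1}x_i)$. This gain can only shrink as $S$ grows, so `gain_S >= gain_T` holds on any instance. That diminishing-returns property is what submodularity means here.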