
In the classical {\em experimental design} setting,
an experimenter \E\ has access to a population of $n$ potential experiment subjects $i\in \{1,\ldots,n\}$, each associated with a vector of features $x_i\in\reals^d$.
Conducting an experiment with subject $i$ reveals an unknown value $y_i\in \reals$ to \E. \E\ typically posits a hypothetical relationship between the $x_i$'s and the $y_i$'s, \emph{e.g.}, $y_i \approx \T{\beta} x_i$, and estimates
$\beta$ from the experiments, \emph{e.g.}, through linear regression.
As a proxy for various practical constraints, \E{} may select only a subset of the subjects on which to conduct experiments.

We initiate the study of budgeted mechanisms for experimental design. In this setting, \E{} has a budget $B$.
Each subject $i$ declares a cost $c_i >0$ to take part in the experiment, and must be paid at least her cost. In particular, the {\em Experimental Design Problem} (\SEDP) is to find a set $S$ of subjects for the experiment that maximizes $V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i})$ under the constraint $\sum_{i\in S}c_i\leq B$. This objective function captures the information gain about the parameter $\beta$ learned through linear regression methods, and is related to the so-called $D$-optimality criterion. Moreover, the subjects are \emph{strategic} and may lie about their costs; we therefore need to design a
mechanism for \SEDP{} with suitable properties.
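For intuition only, the objective $V(S)$ and a naive cost-greedy selection under the budget can be sketched as follows. This is an illustrative heuristic, not the mechanism studied in this work (it is neither truthful nor an approximation guarantee of the paper); the function names are ours.

```python
import numpy as np

def information_gain(X, S):
    """V(S) = log det(I_d + sum_{i in S} x_i x_i^T) for feature matrix X of
    shape (n, d); the D-optimality-style objective of the problem."""
    d = X.shape[1]
    M = np.eye(d)
    for i in S:
        M += np.outer(X[i], X[i])
    # slogdet is numerically safer than log(det(M))
    sign, logdet = np.linalg.slogdet(M)
    return logdet

def greedy_budgeted(X, costs, B):
    """Naive heuristic (NOT the paper's mechanism): repeatedly add the
    affordable subject with the best marginal gain per unit cost."""
    n = X.shape[0]
    S, spent = [], 0.0
    remaining = set(range(n))
    while True:
        base = information_gain(X, S)
        best, best_ratio = None, 0.0
        for i in remaining:
            if spent + costs[i] <= B:
                gain = information_gain(X, S + [i]) - base
                if gain / costs[i] > best_ratio:
                    best, best_ratio = i, gain / costs[i]
        if best is None:
            return S
        S.append(best)
        spent += costs[best]
        remaining.discard(best)
```

Because subjects report their own costs, simply plugging declared costs into such a selection rule invites strategic misreporting, which is what motivates the mechanism-design treatment.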

We present a deterministic, polynomial-time, budget-feasible mechanism scheme that is approximately truthful and yields a constant-factor approximation to \SEDP. In particular, for any small $\delta>0$ and $\varepsilon>0$, we can construct a $(12.98,\varepsilon)$-approximate mechanism that is $\delta$-truthful and runs in time polynomial in both $n$ and $\log\log\frac{B}{\varepsilon\delta}$.
Applying previous work on budget-feasible mechanisms with submodular objectives would yield {\em only} either an exponential-time deterministic mechanism or a randomized polynomial-time mechanism. In contrast, our mechanism attains a constant-factor ($\approx 12.98$) approximation, and we show that no truthful, budget-feasible mechanism can approximate \SEDP{} within a factor of $2$. We also show how to generalize our approach to a wide class of learning problems beyond linear regression.