
We initiate the study of mechanisms for \emph{experimental design}. In this setting,
an experimenter with a budget $B$ has access to a population of $n$ potential experiment subjects $i\in\{1,\ldots,n\}$, each associated with a vector of features $x_i\in\reals^d$ as well as a cost $c_i>0$.
Conducting an experiment with subject $i$ reveals an unknown value $y_i\in \reals$ to the experimenter. Assuming a linear relationship between the $x_i$ and the $y_i$, \emph{i.e.}, $y_i \approx \T{\beta} x_i$, conducting the experiments and obtaining the measurements $y_i$ allows the experimenter to estimate $\beta$. The experimenter's goal is to select which experiments to conduct, subject to her budget constraint, so as to obtain the best possible estimate of $\beta$.

We study this problem when subjects are \emph{strategic} and may lie about their costs. In particular, we formulate the {\em Experimental Design Problem} (\EDP) as finding a set $S$ of subjects that maximizes $V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i})$ under the constraint $\sum_{i\in S}c_i\leq B$; our objective function corresponds to the information gain in $\beta$ when it is learned through linear regression methods, and is related to the so-called $D$-optimality criterion. We present the first known deterministic, polynomial-time truthful mechanism for \EDP{}, yielding a constant-factor ($\approx 19.68$) approximation, and show that no truthful mechanism can approximate \EDP{} within a factor of 2. Moreover, we show that a wider class of learning problems admits a polynomial-time universally truthful (\emph{i.e.}, randomized) mechanism, also within a constant-factor approximation.
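
For reference, \EDP{} can be written as a single optimization program. The display below only restates the objective and budget constraint given above and introduces no new notation:
\[
  \max_{S\subseteq\{1,\ldots,n\}} \; V(S) = \log\det\Bigl(I_d+\sum_{i\in S}x_i\T{x_i}\Bigr)
  \quad\text{subject to}\quad \sum_{i\in S}c_i\leq B.
\]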