There is a mature area of research on experimental design, in which the setting is as follows. An {\em experimenter} \E\ has access to a population of $n$ members. Each member $i\in \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, known to the experimenter. \E\ wishes to perform an experiment: the outcome for member $i$ is denoted by $y_i$, and is unknown to \E\ before the experiment is performed. Typically, \E\ has a hypothesis about the relationship between the $x_i$'s and the $y_i$'s, such as a linear one, \emph{i.e.}, $y_i \approx \T{\beta} x_i$, and the experiment lets \E\ derive an estimate of $\beta$.

More precisely, putting cost considerations aside, suppose that the experimenter wishes to conduct $k$ out of $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.},
\begin{align}
y_i = \T{\beta} x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},\label{model}
\end{align}
where $\beta$ is a vector in $\reals^d$, commonly referred to as the \emph{model}, and the $\varepsilon_i$ (the \emph{measurement noise}) are independent, normally distributed random variables with zero mean and variance $\sigma^2$.

The purpose of these experiments is to allow the experimenter to estimate the model $\beta$. In particular, assuming Gaussian noise, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and $y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the vector of observed measurements,
\begin{align}
\hat{\beta} &=\argmax_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \nonumber\\
& = (\T{X_S}X_S)^{-1}\T{X_S}y_S.\label{leastsquares}
\end{align}
%The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $\sigma^2(\T{X_S}X_S)^{-1}$.
\begin{itemize}
\item Experiment design is an already existing field: in a survey-like setup, which are the best points to include in the experiment? The usefulness of the data is typically measured through variance reduction or entropy reduction, as sketched below.
\item Nowadays, there is also a strong focus on purchasing data (paid surveys, Mechanical Turk, etc.), which adds economic aspects to the problem of experiment design.
\item Recent advances (Singer, Chen) in the field of budgeted mechanisms.
\item We study ridge regression, which is very widely used in statistical learning, and treat it as a problem of budgeted experiment design.
\item We make the following contributions: ...
\item Extension to a more general setup that includes a wider class of machine learning problems.
\end{itemize}
\input{related}
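
The following is a minimal derivation sketch, assuming only the Gaussian noise model \eqref{model} and the least-squares estimator \eqref{leastsquares}, of why the variance-reduction (and entropy-reduction) view of experiment design depends on the selected set $S$ only through the matrix $\T{X_S}X_S$; the stacked noise vector $\varepsilon_S=[\varepsilon_i]_{i\in S}$ is notation introduced here for convenience. Writing the model in matrix form, $y_S = X_S\beta + \varepsilon_S$ with $\expt{\varepsilon_S}=0$ and $\expt{\varepsilon_S\T{\varepsilon_S}}=\sigma^2 I$, and substituting into \eqref{leastsquares},
\begin{align*}
\hat{\beta} &= (\T{X_S}X_S)^{-1}\T{X_S}\left(X_S\beta+\varepsilon_S\right) = \beta + (\T{X_S}X_S)^{-1}\T{X_S}\varepsilon_S,
\end{align*}
so that $\expt{\hat{\beta}}=\beta$ and
\begin{align*}
\expt{(\hat{\beta}-\beta)\T{(\hat{\beta}-\beta)}} &= (\T{X_S}X_S)^{-1}\T{X_S}\,\expt{\varepsilon_S\T{\varepsilon_S}}\,X_S(\T{X_S}X_S)^{-1}
= \sigma^2 (\T{X_S}X_S)^{-1}.
\end{align*}
In particular, $\hat{\beta}$ is normally distributed with mean $\beta$ and covariance $\sigma^2(\T{X_S}X_S)^{-1}$, so both variance-based criteria (e.g., the trace or the determinant of this covariance) and the entropy of $\hat{\beta}$ are functions of $\T{X_S}X_S$ alone; classical experiment design objectives optimize such functions over the choice of $S$.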