summaryrefslogtreecommitdiffstats
path: root/intro.tex
blob: 22079856e7419966167c011f14d97359964e042e (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
In the classic setting of experimental design \cite{pukelsheim2006optimal,atkinson2007optimum},
 an {\em experimenter}  \E\ has access to a population of $n$ potential experiment subjects. 
Each subject $i\in  \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, 
known to the experimenter. 
\E\ wishes to perform an experiment that measures a certain inherent property of the subjects: the outcome for a subject $i$ is denoted $y_i$, which is unknown to \E\ before the experiment is performed.
Typically, \E\ has a hypothesis of the relationship between $x_i$'s and $y_i$'s, such as, say linear, \emph{i.e.},  $y_i \approx  \T{\beta} x_i$; conducting the experiments and obtaining the measurements $y_i$ lets \E\ derive an estimate  $\beta$. %The goal of experimental design amounts to determining which subjects to experiment upon to produce the best possible such estimate.
 
The above experimental design scenario above has many applications, from medical testing to marketing research and others. 
There is a rich literature about various estimation, as well as for means for quantifying the quality of the produced estimate  \cite{pukelsheim2006optimal}.  There is also an extensive theory on how to select subjects 
if \E\ can conduct only a limited number of experiments, so the estimation process returns $\beta$
that approximates the true parameter of the underlying population \cite{ginebra2007measure,le1996comparison,chaloner1995bayesian,boyd2004convex}. 

We depart from this classical set up by viewing experimental design in a strategic setting, and by studying mechanism design issues. 
In our setup, experiments cannot be manipulated and hence measurements are considered precise.\footnote{Thus, the experiments of our interest are statistically significant, ones where each experiment provides a reliable outcome.}  However, there
is a cost $c_i$ associated with experimenting on
subject $i$ which varies from subject to subject. This may be viewed as the  
cost subject $i$ incurs when tested, and hence $i$ needs to be reimbursed; or, it might be viewed as the incentive for $i$
to participate in the experiment; or, it might be the inherent value of the data.  When subjects are strategic, they may have an incentive to mis-report their cost. This economic aspect has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. However, we are not aware of principled study of this setting from a strategic point of view. 


Our contributions are as follows.
\begin{itemize}
\item
We formulate the problem of experimental design subject to a given budget, in presence of strategic agents who specify their costs. In particular, we focus on linear regression. This is naturally viewed  as a budget feasible mechanism design problem with a sophisticated objective function that is related to the covariance of the $x_i$'s. In particolar the \textsc{ExperimentalDesignProblem} is as follows: the experimenter \E\ wishes to maximize 
\begin{align}V(S) = \log\det(I_d+\sum_{i\in S}x_i\T{x_i}) \label{obj}\end{align}
subject to the budget constraint $\sum_{i\in S}c_i\leq B$, where $B$ is \E's budget. 
The objective function is motivated from the so-called $D$-objective criterion; in particular, it captures the reduction in the entropy of $\beta$ when the latter is learned  through ridge regression.

\item
There are several recent results in budget feasible mechanisms \cite{singer-mechanisms,chen,singer-influence,bei2012budget,dobz2011-mechanisms}. The above objective is submodular, and there is a randomized, 7.91-approximate mechanism for maximizing a general submodular function that is universally truthful (\emph{i.e.}, it is sampled from a distribution among truthful mechanisms). Though no deterministic, truthful constant approximation mechanism  for general submodular functions exists, constant-approximation algorithms for specific problems are known \cite{singer-mechanisms,chen, singer-influence}. In this work, we extend the set of problems for which such approximations are known by presenting the first known polynomial time algorithm for EDP with approximation ratio 18 \ldots, as well as a lower bound of $2$. 


\item
We extend this study of the experimental design in general, beyond the basic regression problem. Again, the same insight
of data entropy, we can study several experiment design problems in their strategic setting as budget feasible mechanism design with a suitable objective function that is submodular. This immediately gives
\end{itemize}

From a technical perspective, we extend the provide a convex relaxation to the above problem and show that it is within constant approximation from its so-called multilinear relaxation. This allows us to implement the approach used by Chen \emph{et al.}~\cite{chen} and Singer~\cite{singer-influence} to show deterministic constant approximation algorithms for \textsc{Knapsack} and \textsc{Coverage}, respectively.


We leave several problems open. 


\junk{

\begin{itemize}
    \item already existing field of experiment design: survey-like setup, what
    are the best points to include in your experiment? Measure of the
    usefulness of the data: variance-reduction or entropy-reduction.
    \item nowadays, there is also a big focus on purchasing data: paid surveys,
    mechanical turk, etc. that add economic aspects to the problem of
    experiment design
    \item recent advances (Singer, Chen) in the field of budgeted mechanisms
    \item we study ridge regression, very widely used in statistical learning,
    and treat it as a problem of budgeted experiment design
    \item we make the following contributions: ...
    \item extension to a more general setup which includes a wider class of
    machine learning problems
\end{itemize}

}



\input{related}