
Experimental design is a mature area, in which the setting is as follows.
There is an {\em experimenter}  \E\ with access to a population of $n$ members. 
Each member $i\in  \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, 
known to the experimenter. 
\E\ wishes to perform an experiment that measures a certain inherent property of the members: the outcome for member $i$ is denoted $y_i$, and is unknown to \E\ before the experiment is performed.
Typically, \E\ has a hypothesis about the relationship between the $x_i$'s and the $y_i$'s, for example a linear one, i.e., $y_i \approx \T{\beta} x_i$, and the experiment lets \E\ derive an estimate of $\beta$.
The experimental design scenario above has many applications, from medical testing to marketing research.
There is a rich literature on estimation methods. There is also an extensive theory of how to sample from
the population when \E\ is allowed only a limited number of experiments, so that the estimation process returns an estimate of $\beta$
that approximates the true parameter of the underlying population.
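
As a concrete illustration of the estimation step under the linear hypothesis above, consider the following sketch. It rests on assumptions not made in the text: that \E\ tests a subset $S\subseteq\{1,\ldots,n\}$ of the members, that \E\ uses ordinary least squares (one standard choice, not necessarily the estimator studied in this paper), and that $\T{v}$ denotes the transpose of $v$. The symbols $S$ and $\hat{\beta}$ are introduced here only for illustration. The estimate would then be
\[
\hat{\beta} \in \arg\min_{\beta\in\reals^d} \sum_{i\in S} \left(y_i - \T{\beta} x_i\right)^2 ,
\]
which, whenever the matrix $\sum_{i\in S} x_i \T{x_i}$ is invertible, has the closed form
\[
\hat{\beta} = \Bigl(\sum_{i\in S} x_i \T{x_i}\Bigr)^{-1} \sum_{i\in S} y_i\, x_i .
\]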

We depart from this classical setup, view experimental design in a strategic setting, and study mechanism design issues.
We do not view the experiment as being manipulated, and hence the outcomes are considered precise.\footnote{Thus the experiments of interest to us are statistically significant ones, where each experiment provides a reliable outcome.} However,
there is normally a cost $c_i$ associated with testing
member $i$, which varies from member to member. This may be viewed as the
cost member $i$ incurs to be tested, for which $i$ needs to be reimbursed; or it might be viewed as the incentive for $i$
to participate in the experiment; or it might be the inherent value of the data. This economic aspect has always been inherent in experimental design: experimenters often work within strict budgets and design creative incentives. However, we are not aware of a principled study of this setting from a strategic point of view.


Our contributions are as follows.
\begin{itemize}
\item
We formulate the problem of experimental design subject to a given budget, in the presence of strategic agents who specify their costs. In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem with a sophisticated objective function that is related to the covariance of the $x_i$'s. The problem is as follows: FILL IN.
The objective function comes from entropy. FILL IN.

\item
There are several recent results on budget feasible mechanisms. In particular, there are randomized mechanisms that 8-approximate the optimum for any submodular function. We can show that our formulation above also yields a submodular function. No deterministic ... We present the first known polynomial time algorithm for EDP with approximation ratio ... FILL IN.

\item
We extend this study to experimental design in general, beyond the basic regression problem. Using the same insight
about data entropy, we can study several experimental design problems in their strategic setting as budget feasible mechanism design problems with a suitable submodular objective function. This immediately gives
\end{itemize}

From a technical perspective, the crux is: FILL IN. 


We leave several problems open. 


\junk{

\begin{itemize}
    \item already existing field of experiment design: survey-like setup, what
    are the best points to include in your experiment? Measure of the
    usefulness of the data: variance-reduction or entropy-reduction.
    \item nowadays, there is also a big focus on purchasing data: paid surveys,
    mechanical turk, etc. that add economic aspects to the problem of
    experiment design
    \item recent advances (Singer, Chen) in the field of budgeted mechanisms
    \item we study ridge regression, very widely used in statistical learning,
    and treat it as a problem of budgeted experiment design
    \item we make the following contributions: ...
    \item extension to a more general setup which includes a wider class of
    machine learning problems
\end{itemize}

}


\input{related}