summaryrefslogtreecommitdiffstats
path: root/intro.tex
diff options
context:
space:
mode:
Diffstat (limited to 'intro.tex')
-rw-r--r--intro.tex44
1 files changed, 30 insertions, 14 deletions
diff --git a/intro.tex b/intro.tex
index 9f98137..118de7f 100644
--- a/intro.tex
+++ b/intro.tex
@@ -2,30 +2,42 @@ There is a mature area of experimental design, where the setting is as follows.
There is an {\em experimenter} \E\ with access to a population of $n$ members.
Each member $i\in \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$,
known to the experimenter.
-\E\ wishes to perform an experiment: the outcome for a member $i$ is denoted $y_i$, which is unknown to \E\ before the experiment is performed. Typically, \E\ has a hypothesis of the relationship between $x_i$'s and $y_i$'s, such as, say linear, i.e., $y_i \approx \T{\beta} x_i$., and the experiment lets \E\ derive some estimate of $\T{\beta}$.
-
+\E\ wishes to perform an experiment that measues certain inherent property of the members: the outcome for a member $i$ is denoted $y_i$, which is unknown to \E\ before the experiment is performed.
+Typically, \E\ has a hypothesis of the relationship between $x_i$'s and $y_i$'s, such as, say linear, i.e., $y_i \approx \T{\beta} x_i$., and the experiment lets \E\ derive some estimate of $\beta$.
+Experimental design scenario above has many applications, from medical testing to marketing research and others.
+There is a rich literature about various estimation methods. There is also an extensive theory of how to sample from
+the population if there is some limited number of experiments \E\ is allowed, so the estimation process returns $\beta$
+that approximates the true parameter of the underlying population.
+We depart from this classical set up, view experimental design in a strategic setting, and study mechanism design issues.
+We don't view the experiment as being manipulated and hence the outcomes are considered precise.\footnote{Thus experiments of our interest statistically significant ones where each experiment provides a reliable outcome.} However,
+normally, there is a cost $c_i$ associated with testing
+the member $i$ which varies from member to member. This may be viewed as the
+cost member $i$ incurs to be tested, and hence $i$ needs to be reimbursed; or, it might be viewed as the incentive for $i$
+to participate in the experiment; or, it might be the inherent value of the data. This economic aspect has always been inherent in experimental design, and experimenters often work within strict budgets and design creative incentives. However, we are not aware of principled study of this setting from a strategic view.
+Our contributions are as follows.
+\begin{itemize}
+\item
+We formulate the problem of experimental design subject to a given budget, in presence of strategic agents who specify their costs. In particular, we focus on linear regression. This is naturally viewed as a budget feasible mechanism design problem with a sophisticated objective function that is related to the covariance of the $x_i$'s. The problem is as follows: FILL IN.
+Objective function comes from entropy.. FILL IN.
-More precisely, putting cost considerations aside, suppose that an experimenter wishes to conduct $k$ among $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.},
-\begin{align}
- y_i = \T{\beta} x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},\label{model}
-\end{align}
-where $\beta$ a vector in $\reals^d$, commonly referred to as the \emph{model}, and $\varepsilon_i$ (the \emph{measurement noise}) are independent, normally distributed random variables with zero mean and variance $\sigma^2$.
-
-The purpose of these experiments is to allow the experimenter to estimate the model $\beta$. In particular, assuming Gaussian noise, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and
-$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
-\begin{align} \hat{\beta} &=\max_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \nonumber\\
-& = (\T{X_S}X_S)^{-1}X_S^Ty_S\label{leastsquares}\end{align}
-%The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $(X_S\T{X_S})^{-1}$.
-
+\item
+There are several recent results in budget feasible mechanisms. In particular, there are randomized mechanisms that 8 opt any sub modular function. We can show that our formulation above also yields a sub modular function.. No deterministic ... We present the first known polynomial time algorithm for EDP with approximation ratio.... FILL IN.
+\item
+We extend this study of the experimental design in general, beyond the basic regression problem. Again, the same insight
+of data entropy, we can study several experiment design problems in their strategic setting as budget feasible mechanism design with a suitable objective function that is sub modular. This immediately gives
+\end{itemize}
+From a technical perspective, the crux is: FILL IN.
+We leave several problems open.
+\junk{
\begin{itemize}
\item already existing field of experiment design: survey-like setup, what
@@ -42,4 +54,8 @@ $y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
machine learning problems
\end{itemize}
+}
+
+
+
\input{related}