Diffstat (limited to 'intro.tex')
-rw-r--r-- intro.tex 28
1 files changed, 28 insertions, 0 deletions
diff --git a/intro.tex b/intro.tex
index 72cdd96..9b1be51 100644
--- a/intro.tex
+++ b/intro.tex
@@ -1,3 +1,31 @@
+Experimental design is a mature field whose standard setting is as follows.
+An {\em experimenter} \E\ has access to a population of $n$ members.
+Each member $i\in \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$,
+known to the experimenter.
+\E\ wishes to perform an experiment: the outcome for a member $i$ is denoted $y_i$ and is unknown to \E\ before the experiment is performed. Typically, \E\ has a hypothesis about the relationship between the $x_i$'s and the $y_i$'s, for example a linear one, i.e., $y_i \approx \T{\beta} x_i$, and the experiment lets \E\ derive an estimate of $\beta$.
+
+
+
+
+
+More precisely, putting cost considerations aside, suppose that an experimenter wishes to conduct $k$ among $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.},
+\begin{align}
+ y_i = \T{\beta} x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},\label{model}
+\end{align}
+where $\beta$ is a vector in $\reals^d$, commonly referred to as the \emph{model}, and the $\varepsilon_i$ (the \emph{measurement noise}) are independent, normally distributed random variables with zero mean and variance $\sigma^2$.
+
+The purpose of these experiments is to allow the experimenter to estimate the model $\beta$. In particular, assuming Gaussian noise, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and
+$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
+\begin{align} \hat{\beta} &=\operatorname*{arg\,max}_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \nonumber\\
+& = (\T{X_S}X_S)^{-1}\T{X_S}y_S.\label{leastsquares}\end{align}
+%The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $\sigma^2(\T{X_S}X_S)^{-1}$.
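+
+To see where the closed form in \eqref{leastsquares} comes from (a short sketch, under the added assumption that $\T{X_S}X_S$ is invertible, which requires $k\geq d$), write the objective as $\|X_S\beta-y_S\|_2^2$ and set its gradient with respect to $\beta$ to zero:
+\begin{align*}
+ \nabla_\beta \|X_S\beta-y_S\|_2^2 = 2\,\T{X_S}(X_S\beta-y_S) = 0
+ \quad\Longleftrightarrow\quad \T{X_S}X_S\,\beta = \T{X_S}y_S.
+\end{align*}
+Since the objective is convex in $\beta$, any solution of these normal equations is a minimizer, and under the invertibility assumption it is unique.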
+
+
+
+
+
+
+
\begin{itemize}
\item already existing field of experiment design: survey-like setup, what