-rw-r--r-- | definitions.tex | 1
-rw-r--r-- | intro.tex | 28
2 files changed, 29 insertions, 0 deletions
diff --git a/definitions.tex b/definitions.tex
index 0972a3e..effdb4e 100644
--- a/definitions.tex
+++ b/definitions.tex
@@ -23,3 +23,4 @@
 \newcommand{\thibaut}[1]{\textcolor{blue}{Thibaut: #1}}
 \newcommand{\T}[1]{#1^T}
 \newcommand{\EDP}{EDP}
+\newcommand{\E}{{\tt E}}
diff --git a/intro.tex b/intro.tex
--- a/intro.tex
+++ b/intro.tex
@@ -1,3 +1,31 @@
+Experimental design is a mature field whose standard setting is as follows.
+An {\em experimenter} \E\ has access to a population of $n$ members.
+Each member $i\in \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$,
+known to the experimenter.
+\E\ wishes to perform an experiment: the outcome for member $i$ is denoted by $y_i$ and is unknown to \E\ before the experiment is performed. Typically, \E\ has a hypothesis about the relationship between the $x_i$'s and the $y_i$'s, for instance that it is linear, i.e., $y_i \approx \T{\beta} x_i$, and the experiment lets \E\ derive an estimate of $\beta$.
+
+
+
+
+
+More precisely, putting cost considerations aside, suppose that an experimenter wishes to conduct $k$ of $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.},
+\begin{align}
+ y_i = \T{\beta} x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},\label{model}
+\end{align}
+where $\beta$ is a vector in $\reals^d$, commonly referred to as the \emph{model}, and the $\varepsilon_i$ (the \emph{measurement noise}) are independent, normally distributed random variables with zero mean and variance $\sigma^2$.
+
+The purpose of these experiments is to allow the experimenter to estimate the model $\beta$. In particular, assuming Gaussian noise, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[\T{x_i}]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and
+$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
+\begin{align} \hat{\beta} &=\operatorname*{arg\,max}_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \nonumber\\
+& = (\T{X_S}X_S)^{-1}\T{X_S}y_S.\label{leastsquares}\end{align}
+%The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $\sigma^2(\T{X_S}X_S)^{-1}$.
+
+
+
+
+
+
+
 \begin{itemize}
 \item already existing field of experiment design: survey-like setup, what
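
As a rough numerical illustration of the least-squares estimator introduced in intro.tex above, the following Python sketch draws $k$ experiments from the linear model with Gaussian noise and recovers an estimate of beta by solving the normal equations (X_S^T X_S) beta = X_S^T y_S. It is not part of the commit: the helper names (`solve`, `least_squares`, `simulate`), the sample values of beta, k and sigma, and the way the feature vectors are drawn are all assumptions made for the illustration, and only the standard library is used.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a z = b for a square matrix a (Gaussian elimination, partial pivoting)."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def least_squares(X_S, y_S):
    """Return (X_S^T X_S)^{-1} X_S^T y_S by solving the normal equations."""
    Xt = transpose(X_S)
    gram = matmul(Xt, X_S)  # X_S^T X_S, a d x d matrix
    rhs = [sum(x * y for x, y in zip(col, y_S)) for col in Xt]  # X_S^T y_S
    return solve(gram, rhs)


def simulate(beta, k, sigma, seed=0):
    """Draw k experiments with ||x_i||_2 <= 1 and y_i = beta^T x_i + noise."""
    rng = random.Random(seed)
    d = len(beta)
    X_S, y_S = [], []
    for _ in range(k):
        x = [rng.uniform(-1, 1) for _ in range(d)]
        norm = sum(v * v for v in x) ** 0.5
        if norm > 1:
            x = [v / norm for v in x]  # rescale so that ||x_i||_2 <= 1
        X_S.append(x)
        y_S.append(sum(b * v for b, v in zip(beta, x)) + rng.gauss(0, sigma))
    return X_S, y_S


beta = [0.5, -1.0, 2.0]  # hypothetical "true" model, chosen for the example
X_S, y_S = simulate(beta, k=200, sigma=0.1)
beta_hat = least_squares(X_S, y_S)
print(beta_hat)
```

With sigma set to 0 the estimate should coincide with beta up to rounding error; with noise it should only be close, and the gap shrinks as k grows. Solving the normal equations directly, rather than forming the inverse explicitly, is a choice made here for brevity.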
