muthu intro

author: Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> 2012-11-03 14:32:26 -0700
committer: Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> 2012-11-03 14:32:26 -0700
commit: 63a005331f8ecf215e293c4fb644fe45ac1d5ea5 (patch)
tree: cf44b0802dfd287cba8bb497b0973b98395f0213 /intro.tex
parent: 048bfda6446af58b5249ca9125771a02f5414be3 (diff)
download: recommendation-63a005331f8ecf215e293c4fb644fe45ac1d5ea5.tar.gz
1 files changed, 28 insertions, 0 deletions
diff --git a/intro.tex b/intro.tex
index 72cdd96..9b1be51 100644
--- a/intro.tex
+++ b/intro.tex
@@ -1,3 +1,31 @@
+Experimental design is a mature area, whose typical setting is as follows.
+There is an {\em experimenter}  \E\ with access to a population of $n$ members. 
+Each member $i\in  \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, 
+known to the experimenter. 
+\E\ wishes to perform an experiment: the outcome for a member $i$ is denoted by $y_i$ and is unknown to \E\ before the experiment is performed. Typically, \E\ has a hypothesis about the relationship between the $x_i$'s and the $y_i$'s, \emph{e.g.}, that it is linear, \emph{i.e.}, $y_i \approx \T{\beta} x_i$, and the experiment lets \E\ derive an estimate of $\beta$.
+ 
+
+
+
+
+More precisely, putting cost considerations aside, suppose that an experimenter wishes to conduct $k$ among $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a set of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$,  related to the experiment features $x_i$ through a linear function, \emph{i.e.},
+\begin{align}
+     y_i = \T{\beta} x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},\label{model}
+\end{align}
+where $\beta$ is a vector in $\reals^d$, commonly referred to as the \emph{model}, and the $\varepsilon_i$ (the \emph{measurement noise}) are independent, normally distributed random variables with zero mean and variance $\sigma^2$.
+
+The purpose of these experiments is to allow the experimenter to estimate the model $\beta$. In particular, assuming Gaussian noise, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and
+$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements, 
+\begin{align} \hat{\beta} &=\operatorname*{arg\,max}_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \nonumber\\
+& = (\T{X_S}X_S)^{-1}\T{X_S}y_S.\label{leastsquares}\end{align}
+%The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $\sigma^2(\T{X_S}X_S)^{-1}$.
+
+
+
+
+
+
+
 
 \begin{itemize}
     \item already existing field of experiment design: survey-like setup, what
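The least squares estimator introduced in the patch can be checked numerically. The following is a minimal sketch, not code from the repository: the function name `least_squares`, the feature matrix `X_S`, the model `beta`, and the noise level `sigma` are all hypothetical values chosen for illustration. It solves the normal equations $(X_S^T X_S)\beta = X_S^T y_S$ by Gauss-Jordan elimination using only the Python standard library.

```python
import random


def least_squares(X, y):
    """Return the beta minimizing sum_i (beta^T x_i - y_i)^2.

    Solves the normal equations (X^T X) beta = X^T y by Gauss-Jordan
    elimination with partial pivoting. X is a list of feature rows.
    """
    d = len(X[0])
    # Augmented d x (d+1) matrix [X^T X | X^T y].
    A = [
        [sum(row[a] * row[b] for row in X) for b in range(d)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(d)
    ]
    for col in range(d):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: features do not span R^d")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(d):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * p for v, p in zip(A[r], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]


# Hypothetical example: d = 2, k = |S| = 4, every row has norm at most 1.
rng = random.Random(0)
beta = [0.5, -0.3]
X_S = [[0.6, 0.0], [0.0, 0.8], [0.6, 0.8], [-0.6, 0.8]]
sigma = 0.01
# Measurements y_i = beta^T x_i + eps_i with Gaussian noise.
y_S = [
    sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, sigma)
    for row in X_S
]
beta_hat = least_squares(X_S, y_S)
```

With noise-free measurements the sketch recovers `beta` up to floating-point error; with noise, `beta_hat` deviates from `beta` by an amount governed by $\sigma^2(X_S^T X_S)^{-1}$, which is why the choice of the set $S$ matters.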