author     Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>  2012-10-30 17:58:56 -0700
committer  Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>  2012-10-30 17:58:56 -0700
commit     60efcc1d97c0cad6446db44dd1b25baf67c57566 (patch)
tree       0434b1f5ef6fcf4e631f46f8fd5d58ccc62ab4b2
parent     2d54b5d27ffd2782aec0bab8200b5b9e55585805 (diff)
download   recommendation-60efcc1d97c0cad6446db44dd1b25baf67c57566.tar.gz
D-optimality
-rw-r--r--  definitions.tex |  5
-rw-r--r--  notes.bib       | 17
-rw-r--r--  problem.tex     | 34
3 files changed, 53 insertions(+), 3 deletions(-)
diff --git a/definitions.tex b/definitions.tex
index e603e5a..03cba22 100644
--- a/definitions.tex
+++ b/definitions.tex
@@ -8,8 +8,8 @@
%\newcommand*{\defeq}{\stackrel{\text{def}}{=}}
\newcommand*{\defeq}{\equiv}
\newcommand{\var}{\mathop{\mathrm{Var}}}
-\newcommand{\condexp}[2]{\mathop{\mathbb{E}}\left[#1|#2\right]}
-\newcommand{\expt}[1]{\mathop{\mathbb{E}}\left[#1\right]}
+\newcommand{\condexp}[2]{{\mathbb{E}}[#1|#2]}
+\newcommand{\expt}[1]{{\mathbb{E}}[#1]}
\newcommand{\norm}[1]{\lVert#1\rVert}
\newcommand{\tr}[1]{#1^*}
\newcommand{\ip}[2]{\langle #1, #2 \rangle}
@@ -18,6 +18,7 @@
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand{\reals}{\ensuremath{\mathbbm{R}}}
+\newcommand{\prob}{\ensuremath{\mathrm{Pr}}}
\newcommand{\stratis}[1]{\textcolor{red}{Stratis: #1}}
\newcommand{\thibaut}[1]{\textcolor{blue}{Thibaut: #1}}
\newcommand{\T}[1]{#1^T}
diff --git a/notes.bib b/notes.bib
index 8402384..fd66bea 100644
--- a/notes.bib
+++ b/notes.bib
@@ -1,3 +1,20 @@
+@book{pukelsheim2006optimal,
+ title={Optimal design of experiments},
+ author={Pukelsheim, F.},
+ volume={50},
+ year={2006},
+ publisher={Society for Industrial and Applied Mathematics}
+}
+
+
+@book{atkinson2007optimum,
+ title={Optimum experimental designs, with SAS},
+ author={Atkinson, A.C. and Donev, A.N. and Tobias, R.D.},
+ year={2007},
+ publisher={Oxford University Press, New York}
+}
+
+
@article{inverse,
jstor_articletype = {research-article},
title = {On the Inverse of the Sum of Matrices},
diff --git a/problem.tex b/problem.tex
index db7108b..d7cf38c 100644
--- a/problem.tex
+++ b/problem.tex
@@ -1,4 +1,37 @@
+\subsection{Optimal Experimental Design}
+
+The theory of experimental design \cite{pukelsheim2006optimal,atkinson2007optimum} studies how an experimenter should select the parameters of a set of experiments she is about to conduct. In general, the optimality of a particular design depends on the purpose of the experiment, \emph{i.e.}, the quantity the experimenter is trying to learn or the hypothesis she is trying to validate. Owing to their ubiquity in statistical analysis, a large part of the literature focuses on learning \emph{linear models}, in which the experimenter wishes to fit a linear map to the data she has collected.
+
+More precisely, putting cost considerations aside, suppose that an experimenter wishes to conduct $k$ out of $n$ possible experiments. Each experiment $i\in\mathcal{N}\defeq \{1,\ldots,n\}$ is associated with a vector of parameters (or features) $x_i\in \reals^d$, normalized so that $\|x_i\|_2\leq 1$. Denote by $S\subseteq \mathcal{N}$, where $|S|=k$, the set of experiments selected; upon its execution, experiment $i\in S$ reveals an output variable (the ``measurement'') $y_i$, related to the experiment features $x_i$ through a linear function, \emph{i.e.},
+\begin{align}
+ y_i = \T{\beta} x_i + \varepsilon_i,\quad\forall i\in\mathcal{N},\label{model}
+\end{align}
+where $\beta$ is a vector in $\reals^d$, commonly referred to as the \emph{model}, and the $\varepsilon_i$ (the \emph{measurement noise}) are independent, normally distributed random variables with zero mean and variance $\sigma^2$.
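For concreteness, the observation model \eqref{model} is easy to simulate. The following sketch, in Python with numpy, does so; the values n=100, d=5, and sigma=0.1 are illustrative choices, not taken from the text.

import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 100, 5, 0.1           # illustrative sizes and noise level

beta = rng.standard_normal(d)       # the unknown model beta
X = rng.standard_normal((n, d))     # one feature vector x_i per row
norms = np.linalg.norm(X, axis=1, keepdims=True)
X /= np.maximum(norms, 1.0)         # enforce the normalization ||x_i||_2 <= 1
y = X @ beta + sigma * rng.standard_normal(n)   # y_i = beta^T x_i + eps_i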
+
+The purpose of these experiments is to allow the experimenter to estimate the model $\beta$. In particular, assuming Gaussian noise, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and
+$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
+\begin{align*} \hat{\beta} &=\argmax_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d} \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \\
+& = (\T{X_S}X_S)^{-1}\T{X_S}y_S.\end{align*}
+%The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $\sigma^2(\T{X_S}X_S)^{-1}$.
+Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal random variable, so is $\hat{\beta}$. In particular, $\hat{\beta}$ has mean $\beta$ (\emph{i.e.}, it is unbiased) and covariance $\sigma^2(\T{X_S}X_S)^{-1}$.
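Continuing the sketch above, both the least-squares estimate and its covariance are a few lines; the design S below (the first k = 20 experiments) is an arbitrary stand-in for a selected set.

k = 20
S = list(range(k))                  # arbitrary design of size k
X_S, y_S = X[S], y[S]

# Same solution as the closed form (X_S^T X_S)^{-1} X_S^T y_S, but
# np.linalg.lstsq is the numerically safer way to compute it.
beta_hat, *_ = np.linalg.lstsq(X_S, y_S, rcond=None)

cov = sigma**2 * np.linalg.inv(X_S.T @ X_S)  # covariance sigma^2 (X_S^T X_S)^{-1}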
+
+The above covariance matrix is used in experimental design to determine how informative a set of experiments $S$ is. Though many different measures of information exist~(see \cite{pukelsheim2006optimal}), the most commonly used is the following: %so-called \emph{D-optimality criterion}: %which yields the following optimization problem
+\begin{align*}
+\text{Maximize}\quad V_D(S) &= \frac{1}{2}\log\det \T{X_S}X_S \\
+\text{subj. to}\quad |S|&\leq k
+\end{align*}
+The objective of the above optimization, called the \emph{D-optimality criterion}, is equal (up to a constant) to the negative of the entropy of $\hat{\beta}$. Selecting a set of experiments $S$ that maximizes $V_D(S)$ is thus equivalent to finding the set of experiments that minimizes the uncertainty about $\beta$, as captured by the entropy of its estimator.
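The criterion is just as direct to evaluate. The sketch below computes V_D and pairs it with a simple greedy heuristic for the budgeted maximization; the greedy rule and the small ridge term eps (which keeps the information matrix nonsingular while |S| < d) are illustrative choices, not prescribed by the text.

def V_D(X_S):
    """D-optimality value (1/2) log det(X_S^T X_S); -inf if singular."""
    sign, logdet = np.linalg.slogdet(X_S.T @ X_S)
    return 0.5 * logdet if sign > 0 else float("-inf")

def greedy_design(X, k, eps=1e-6):
    """Grow S one experiment at a time, each step adding the experiment
    that most increases the (eps-regularized) log-determinant."""
    n, d = X.shape
    S = []
    for _ in range(k):
        def gain(i):
            X_Si = X[S + [i]]
            return np.linalg.slogdet(X_Si.T @ X_Si + eps * np.eye(d))[1]
        S.append(max((i for i in range(n) if i not in S), key=gain))
    return S

S_greedy = greedy_design(X, k)
print(V_D(X[S_greedy]))             # compare against V_D(X[S]) for the arbitrary S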
+
+
+
+\subsection{Budgeted Mechanism Design}
+
+
+\begin{comment}
\subsection{Experimental Design}
+
+
+
In the context of experimental design, an \emph{experiment} is a random variable $E$ sampled from a distribution $P_\beta$, where $\beta\in \Omega$ is an unknown parameter. An experimenter wishes to learn the parameter $\beta$, and can choose among a set of different possible experiments, all of which have distributions parametrized by the same $\beta$.
The problem of optimal experimental design amounts to determining an experiment that maximizes the information revealed about parameter $\beta$.
@@ -60,7 +93,6 @@ problem. Explain the goals:
\thibaut{TODO Explain what is already known: it is ok when the function is submodular. When
should we introduce the notion of submodularity?}
-\begin{comment}
\subsection{Linear and Ridge Regression}
Linear regression, a true workhorse of statistical analysis, fits a linear function to a set of observable input and output variables. In particular, consider a set of $n$ value pairs $(x_i,y_i)$, $i\in \mathcal{N} =\{1,\ldots,n\}$. The variables $x_i\in \reals^d$ are usually referred to as \emph{input} variables or \emph{covariates} and $y_i\in \reals$ are referred to as \emph{output} or \emph{response} variables. For example, the former could be the age, weight, or height of a person, while the latter could be their propensity to contract a disease. Assume that the input and output variables are related as follows: