path: root/problem.tex
author     Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>  2013-07-06 15:23:34 -0700
committer  Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>  2013-07-06 15:23:34 -0700
commit     49995b4aecef20bd138dea3bf66d55dfccc8164d (patch)
tree       f6ba48f71044402349dc67ef50ffa41370611743 /problem.tex
parent     d71b6f325ded0ca101976e6b5c3b0fa72be4bfbd (diff)
download   recommendation-49995b4aecef20bd138dea3bf66d55dfccc8164d.tar.gz
chaloner citation
Diffstat (limited to 'problem.tex')
-rw-r--r--  problem.tex  18
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/problem.tex b/problem.tex
index 0b10107..2e4db20 100644
--- a/problem.tex
+++ b/problem.tex
@@ -19,14 +19,14 @@ etc.). The magnitude of the coefficient $\beta_i$ captures the effect that featu
The purpose of these experiments is to allow \E\ to estimate the model $\beta$. In particular,
assume that the experimenter \E\ has a {\em prior}
distribution on $\beta$, \emph{i.e.}, $\beta$ has a multivariate normal prior
-with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d^2}$ (where $\sigma^2$ is the noise variance and the prior on $\beta$).
+with zero mean and covariance $\sigma^2R^{-1}\in \reals^{d^2}$ (where $\sigma^2$ is the noise variance).
Then, \E\ estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
\begin{align}
\hat{\beta} = \argmax_{\beta\in\reals^d} \prob(\beta\mid y_S) =\argmin_{\beta\in\reals^d} \big(\sum_{i\in S} (y_i - \T{\beta}x_i)^2
+ \T{\beta}R\beta\big) = (R+\T{X_S}X_S)^{-1}X_S^Ty_S \label{ridge}
\end{align}
-where $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ is the matrix of experiment features and
-$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements.
+where the last equality is obtained by setting $\nabla_{\beta}\log\prob(\beta\mid y_S)$ to zero and solving the resulting linear system; in \eqref{ridge}, $X_S\defeq[x_i]_{i\in S}\in \reals^{|S|\times d}$ is the matrix of experiment features and
+$y_S\defeq[y_i]_{i\in S}\in\reals^{|S|}$ are the observed measurements.
This optimization, commonly known as \emph{ridge regression}, includes an additional quadratic penalty term compared to the standard least squares estimation.
% under \eqref{model}, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and
%$y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
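(Aside, not part of the diff: the closed form in \eqref{ridge} can be sanity-checked numerically. The minimal Python/NumPy/SciPy sketch below uses synthetic data; the dimension d, sample size n, identity prior matrix R, and noise level sigma are illustrative assumptions, not values from the paper.)

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d, n, sigma = 4, 50, 0.5                      # illustrative dimension, |S|, and noise level
R = np.eye(d)                                 # illustrative prior matrix
beta = rng.normal(size=d)                     # illustrative "true" model
X = rng.normal(size=(n, d))                   # experiment features X_S
y = X @ beta + sigma * rng.normal(size=n)     # noisy measurements y_S

# Closed form from \eqref{ridge}: beta_hat = (R + X_S^T X_S)^{-1} X_S^T y_S
beta_hat = np.linalg.solve(R + X.T @ X, X.T @ y)

# Direct minimization of the penalized least-squares objective in \eqref{ridge}
objective = lambda b: np.sum((y - X @ b) ** 2) + b @ R @ b
beta_numerical = minimize(objective, np.zeros(d)).x

print(np.allclose(beta_hat, beta_numerical, atol=1e-4))   # True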
@@ -46,17 +46,17 @@ which is the entropy reduction on $\beta$ after the revelation of $y_S$ (also kn
Hence, selecting a set of experiments $S$ that
maximizes $V(S)$ is equivalent to finding the set of experiments that minimizes
the uncertainty on $\beta$, as captured by the entropy reduction of its estimator.
-Under the linear model \eqref{model}, and the Gaussian prior, the information gain takes the form:
+Under the linear model \eqref{model}, and the Gaussian prior, the information gain takes the following form (see, \emph{e.g.}, \cite{chaloner1995bayesian}):
\begin{align}
- V(S) &= \frac{1}{2}\log\det(R+ \T{X_S}X_S) \label{dcrit} %\\
+ I(\beta;y_S)&= \frac{1}{2}\log\det(R+ \T{X_S}X_S) - \frac{1}{2}\log\det R\label{dcrit} %\\
\end{align}
-This value function is known in the experimental design literature as the
+Maximizing $I(\beta;y_S)$ is therefore equivalent to maximizing $\log\det(R+ \T{X_S}X_S)$, which is known in the experimental design literature as the Bayes
$D$-optimality criterion
\cite{pukelsheim2006optimal,atkinson2007optimum,chaloner1995bayesian}.
-Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from the noise terms $\varepsilon_i$).
-In particular, $\hat{\beta}$ has %mean $\beta$% (\emph{i.e.}, it is an \emph{unbiased estimator}) and
-covariance $(R+\T{X_S}X_S)^{-1}$. As such, maximizing $V(S)$ can alternatively be seen as a means of reducing the uncertainty on estimator $\hat{\beta}$ my minimizing the product of the eigenvalues of its covariance.
+Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from the noise terms $\varepsilon_i$ and the prior on $\beta$).
+In particular, $\hat{\beta}$ has
+covariance $\sigma^2(R+\T{X_S}X_S)^{-1}$. As such, maximizing $I(\beta;y_S)$ can alternatively be seen as a means of reducing the uncertainty on the estimator $\hat{\beta}$ by minimizing the product of the eigenvalues of its covariance.
%An alternative interpretation, given that $(R+ \T{X_S}X_S)^{-1}$ is the covariance of the estimator $\hat{\beta}$, is that it tries to minimize the
%which is indeed a function of the covariance matrix $(R+\T{X_S}X_S)^{-1}$.
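(Aside, not part of the diff: the identity behind \eqref{dcrit}, i.e., that the entropy reduction from the Gaussian prior with covariance sigma^2 R^{-1} to the Gaussian posterior with covariance sigma^2 (R + X_S^T X_S)^{-1} equals 1/2 log det(R + X_S^T X_S) - 1/2 log det R, can also be checked numerically. The sketch below reuses the illustrative R, X, and sigma from the previous aside.)

import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 4, 50, 0.5                      # illustrative sizes and noise level
R = np.eye(d)                                 # illustrative prior matrix
X = rng.normal(size=(n, d))                   # illustrative experiment features X_S

def gaussian_entropy(cov):
    # Differential entropy of a multivariate normal with covariance `cov`
    dim = cov.shape[0]
    return 0.5 * (dim * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

prior_cov = sigma**2 * np.linalg.inv(R)                 # prior covariance of beta
posterior_cov = sigma**2 * np.linalg.inv(R + X.T @ X)   # posterior covariance given y_S

entropy_reduction = gaussian_entropy(prior_cov) - gaussian_entropy(posterior_cov)
info_gain = 0.5 * (np.linalg.slogdet(R + X.T @ X)[1] - np.linalg.slogdet(R)[1])

print(np.allclose(entropy_reduction, info_gain))        # True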