From 8c09cfd7da709aab03fb004b58ecd8e1eb4fb553 Mon Sep 17 00:00:00 2001
From: Stratis Ioannidis
Date: Mon, 11 Feb 2013 09:37:09 -0800
Subject: muthu

---
 problem.tex | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/problem.tex b/problem.tex
index 668cd57..3ad3270 100644
--- a/problem.tex
+++ b/problem.tex
@@ -13,7 +13,7 @@ For example, each $i$ may correspond to a human subject; the feature vector
 $x_i$ may correspond to a normalized vector of her age, weight, gender,
 income, \emph{etc.}, and the measurement $y_i$ may capture some biometric
 information (\emph{e.g.}, her red cell blood count, a genetic marker,
-etc.). The magnitude of the coefficient $\beta_i$ captures the effect that feature $i$ has on the measured variable, and its sign captures whether the corellation is positive or negative.
+etc.). The magnitude of the coefficient $\beta_i$ captures the effect that feature $i$ has on the measured variable, and its sign captures whether the correlation is positive or negative.
 
 The purpose of these experiments is to allow \E\ to estimate the model $\beta$.
 In particular,
@@ -35,12 +35,15 @@ This optimization, commonly known as \emph{ridge regression}, includes an additi
 %The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $(X_S\T{X_S})^{-1}$.
 Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a
 multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from
-the noise terms $\varepsilon_i$). In particular, $\hat{\beta}$ has %mean $\beta$% (\emph{i.e.}, it is an \emph{unbiased estimator}) and
- covariance
-$(R+\T{X_S}X_S)^{-1}$.
+the noise terms $\varepsilon_i$).
+%In particular, $\hat{\beta}$ has %mean $\beta$% (\emph{i.e.}, it is an \emph{unbiased estimator}) and
+% covariance
+%$(R+\T{X_S}X_S)^{-1}$.
 Let $V:2^\mathcal{N}\to\reals$ be a \emph{value function}, quantifying how informative a set of experiments $S$ is in estimating $\beta$.
 The classical experimental design problem amounts to finding a set $S$ that maximizes $V(S)$ subject to the constraint $|S|\leq k$.
-A variety of different value functions are used in literature~\cite{pukelsheim2006optimal}; almost all make use of the covariance $(R+\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. A value function that has natural advantages is the \emph{information gain}: %\emph{$D$-optimality criterion}: %which yields the following optimization problem
+A variety of different value functions are used in the literature~\cite{pukelsheim2006optimal}.
+%; almost all make use of the covariance $(R+\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$.
+A value function that has natural advantages is the \emph{information gain}: %\emph{$D$-optimality criterion}: %which yields the following optimization problem
 \begin{align}
 V(S)= I(\beta;y_S) = \entropy(\beta)-\entropy(\beta\mid y_S). \label{informationgain}
 \end{align}
@@ -52,7 +55,7 @@ Under the linear model \eqref{model}, and the Gaussian prior, the information ga
 \begin{align}
 V(S) &= \frac{1}{2}\log\det(R+ \T{X_S}X_S) \label{dcrit} %\\
 \end{align}
-which is indeed a function of the covariance matrix $(R+\T{X_S}X_S)^{-1}$.
+%which is indeed a function of the covariance matrix $(R+\T{X_S}X_S)^{-1}$.
 %defined as $-\infty$ when $\mathrm{rank}(\T{X_S}X_S)
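
For concreteness, below is a minimal numerical sketch of the two objects the hunks above manipulate: the ridge (posterior-mean) estimator, whose covariance is $(R+\T{X_S}X_S)^{-1}$, and the information-gain objective $V(S)=\frac{1}{2}\log\det(R+\T{X_S}X_S)$ of \eqref{dcrit}. The synthetic data, the identity prior precision, the unit noise variance, and the greedy budgeted selection are illustrative assumptions only, not part of the paper or this patch.

# Sketch (not from the patch): ridge estimator and the log-det value
# function, under an assumed Gaussian prior with precision matrix R and
# unit-variance noise.  All data below are placeholders.
import numpy as np

def value(X_S, R):
    """V(S) = 0.5 * log det(R + X_S^T X_S); slogdet avoids overflow."""
    _, logdet = np.linalg.slogdet(R + X_S.T @ X_S)
    return 0.5 * logdet

def ridge_estimate(X_S, y_S, R):
    """Posterior mean (R + X_S^T X_S)^{-1} X_S^T y_S, via solve() rather
    than an explicit matrix inverse."""
    return np.linalg.solve(R + X_S.T @ X_S, X_S.T @ y_S)

rng = np.random.default_rng(0)
n, d, k = 50, 5, 10                    # experiments, features, budget
X = rng.standard_normal((n, d))        # feature vectors x_i (placeholder)
beta = rng.standard_normal(d)          # ground-truth model
y = X @ beta + rng.standard_normal(n)  # y_i = <beta, x_i> + noise
R = np.eye(d)                          # prior precision (identity, for illustration)

# Greedy heuristic for max V(S) s.t. |S| <= k: repeatedly add the
# experiment with the largest marginal gain.  log det(R + X_S^T X_S) is
# monotone submodular for positive-definite R, so this is a standard
# baseline for the budgeted design problem.
S = []
for _ in range(k):
    S.append(max((i for i in range(n) if i not in S),
                 key=lambda i: value(X[S + [i]], R)))

print("selected:", sorted(S), " V(S) =", round(value(X[S], R), 3))
print("ridge estimate:", ridge_estimate(X[S], y[S], R))

The greedy loop is included only to make the constraint $|S|\leq k$ concrete; it is not presented here as the paper's own algorithm.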