| author | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-10-31 10:26:41 -0700 |
|---|---|---|
| committer | Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)> | 2012-10-31 10:26:41 -0700 |
| commit | 166bcc95424910868c54b091eb94573ce3ffef0f (patch) | |
| tree | e4fc0a48833eae2952a32d503b3fdb7f7ad5059c /general.tex | |
| parent | 1bd86f146d222f7d50746a8db39f4ffa12c3fd19 (diff) | |
| download | recommendation-166bcc95424910868c54b091eb94573ce3ffef0f.tar.gz | |
changes
Diffstat (limited to 'general.tex')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | general.tex | 22 |
1 file changed, 21 insertions, 1 deletion
diff --git a/general.tex b/general.tex
index c90a439..e7a955f 100644
--- a/general.tex
+++ b/general.tex
@@ -1,5 +1,25 @@
 \subsection{Bayesian Experimental Design}
-TODO: Introduce prior with covariance $\sigma^2 R$. Change in entropy/ mutual information is then ... So our scheme can be seen as Baysian prior with $R=I_d$. Extension of our main theorem.
+In this section, we extend our results to Bayesian experimental design \cite{chaloner1995bayesian}. In particular, we show that our choice of objective function \eqref{...} has a natural interpretation in this context, further motivating its selection, and that Theorem~\ref{...} generalizes naturally to this setting.
+
+In the Bayesian setting, the experimenter has a prior distribution on $\beta$: in particular, $\beta$ is assumed to be sampled from a multivariate normal distribution with zero mean and covariance $\sigma^2 R^{-1}$, where $R\in\reals^{d\times d}$ is positive definite and $\sigma^2$ is the noise variance.
+The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}, \emph{i.e.}, by finding the parameter that maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following minimization \cite{hastie}:
+\begin{displaymath}
+  \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2
+  + \norm{R^{1/2}\beta}_2^2.
+\end{displaymath}
+This optimization, commonly known as \emph{ridge regression}, includes an additional penalty term compared to the least squares estimation \eqref{leastsquares}.
+Let $\entropy(\beta)$ be the entropy of $\beta$ under the prior, and $\entropy(\beta\mid y_S)$ the entropy of $\beta$ conditioned on the experiment outcomes $y_S$, for some $S\subseteq \mathcal{N}$. In this setting, a natural objective for the experimenter is to select a set of experiments $S$ that maximizes her \emph{information gain}:
+$$ I(\beta;y_S) = \entropy(\beta)-\entropy(\beta\mid y_S). $$
+
+Assuming normal noise variables, the information gain is equal, up to an additive constant that does not depend on $S$, to the following value function \cite{chaloner1995bayesian}:
+\begin{align}
+V(S) = \frac{1}{2}\log\det(R + \T{X_S}X_S)\label{bayesianobjective}
+\end{align}
+Our objective \eqref{...} clearly follows from \eqref{bayesianobjective} by setting $R=I_d$. Hence, our optimization can be interpreted as maximizing the information gain when the prior distribution has covariance $\sigma^2 I_d$ and the experimenter solves a ridge regression problem with penalty term $\norm{\beta}_2^2$.
+
+Moreover, our results extend to the general Bayesian case by replacing $I_d$ with the positive definite matrix $R$:
+
+TODO: state theorem, discuss dependence on $\det R$.
 
 \subsection{Beyond Linear Models}
 TODO: Independent noise model. Captures models such as logistic regression, classification, etc. Arbitrary prior. Show that change in the entropy is submodular (cite Krause, Guestrin).
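
For reference, a minimal derivation sketch (not part of the commit) of why maximum a posteriori estimation reduces to the ridge objective in the added text, assuming the Gaussian prior with covariance $\sigma^2 R^{-1}$ and i.i.d. noise of variance $\sigma^2$ stated above; the macros `\T`, `\argmin`, `\norm`, and `\reals` are the paper's own.

```latex
% Sketch: MAP under the prior beta ~ N(0, sigma^2 R^{-1}) with i.i.d.
% N(0, sigma^2) noise. Up to additive constants, the negative log-posterior
% is the ridge objective, and its minimizer has the usual closed form.
\begin{align*}
  -\log p(\beta \mid y_S)
    &= \frac{1}{2\sigma^2}\sum_i (y_i - \T{\beta}x_i)^2
     + \frac{1}{2\sigma^2}\,\T{\beta}R\beta + \mathrm{const},\\
  \hat{\beta}
    &= \argmin_{\beta\in\reals^d}\; \sum_i (y_i - \T{\beta}x_i)^2 + \norm{R^{1/2}\beta}_2^2
     = (R + \T{X_S}X_S)^{-1}\T{X_S}y_S.
\end{align*}
```

With $R=I_d$ the penalty is exactly the $\norm{\beta}_2^2$ term mentioned in the added text.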

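Similarly, a short sketch (again not part of the commit) of why the information gain reduces to the value function \eqref{bayesianobjective} up to a constant, using the fact that the posterior covariance of $\beta$ given $y_S$ is $\sigma^2(R + \T{X_S}X_S)^{-1}$ and that a $d$-dimensional Gaussian with covariance $\Sigma$ has entropy $\tfrac{d}{2}\log(2\pi e) + \tfrac{1}{2}\log\det\Sigma$:

```latex
% Sketch: information gain under the Gaussian prior/noise assumptions above.
% The sigma^2 and (d/2) log(2 pi e) terms cancel in the difference.
\begin{align*}
  I(\beta; y_S)
    &= \entropy(\beta) - \entropy(\beta \mid y_S)\\
    &= \tfrac{1}{2}\log\det\!\big(\sigma^2 R^{-1}\big)
     - \tfrac{1}{2}\log\det\!\big(\sigma^2 (R + \T{X_S}X_S)^{-1}\big)\\
    &= \tfrac{1}{2}\log\det\!\big(R + \T{X_S}X_S\big) - \tfrac{1}{2}\log\det R.
\end{align*}
```

The term $-\tfrac{1}{2}\log\det R$ does not depend on $S$, so maximizing $I(\beta;y_S)$ over $S$ is equivalent to maximizing $V(S)$ in \eqref{bayesianobjective}; with $R = I_d$ this constant vanishes.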