-rw-r--r--   general.tex | 22
-rw-r--r--   notes.bib   |  8
-rw-r--r--   problem.tex | 14
3 files changed, 36 insertions, 8 deletions
diff --git a/general.tex b/general.tex
index c90a439..e7a955f 100644
--- a/general.tex
+++ b/general.tex
@@ -1,5 +1,25 @@
 \subsection{Bayesian Experimental Design}
-TODO: Introduce prior with covariance $\sigma^2 R$. Change in entropy/ mutual information is then ... So our scheme can be seen as Baysian prior with $R=I_d$. Extension of our main theorem.
+In this section, we extend our results to Bayesian experimental design \cite{chaloner1995bayesian}. In particular, we show that our choice of objective function \eqref{...} has a natural interpretation in this context, further motivating its selection, and that Theorem~\ref{...} extends naturally to this setting.
+
+In the Bayesian setting, it is assumed that the experimenter has a prior distribution on $\beta$: in particular, $\beta$ is assumed to be sampled from a multivariate normal distribution with zero mean and covariance $\sigma^2R^{-1}$, where $\sigma^2$ is the noise variance and $R\in\reals^{d\times d}$ is a positive definite matrix.
+The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following minimization \cite{hastie}:
+\begin{displaymath}
+ \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_{i\in S} (y_i - \T{\beta}x_i)^2
+ + \norm{R^{1/2}\beta}_2^2
+\end{displaymath}
+This optimization, commonly known as \emph{ridge regression}, includes an additional penalty term compared to the least squares estimation \eqref{leastsquares}.
+Let $\entropy(\beta)$ be the entropy of $\beta$ under the prior distribution, and $\entropy(\beta\mid y_S)$ the entropy of $\beta$ conditioned on the experiment outcomes $y_S$, for some $S\subseteq \mathcal{N}$. In this setting, a natural objective for the experimenter is to select a set of experiments $S$ that maximizes her \emph{information gain}:
+$$ I(\beta;y_S) = \entropy(\beta)-\entropy(\beta\mid y_S). $$
+
+Assuming normal noise variables, the information gain is equal (up to a constant) to the following value function \cite{chaloner1995bayesian}:
+\begin{align}
+V(S) = \frac{1}{2}\log\det(R + \T{X_S}X_S)\label{bayesianobjective}
+\end{align}
+Our objective \eqref{...} clearly follows from \eqref{bayesianobjective} by setting $R=I_d$. Hence, our optimization can be interpreted as a maximization of the information gain when the prior distribution has covariance $\sigma^2 I_d$, and the experimenter is solving a ridge regression problem with penalty term $\norm{\beta}_2^2$.
+
+Moreover, our results can be extended to the general Bayesian case by replacing $I_d$ with the positive definite matrix $R$:
+
+TODO: state theorem, discuss dependence on $\det R$.
 
 \subsection{Beyond Linear Models}
 TODO: Independent noise model. Captures models such as logistic regression, classification, etc. Arbitrary prior. Show that change in the entropy is submodular (cite Krause, Guestrin).
diff --git a/notes.bib b/notes.bib
--- a/notes.bib
+++ b/notes.bib
@@ -6,6 +6,14 @@
 publisher={Society for Industrial Mathematics}
 }
 
+@article{chaloner1995bayesian,
+ title={Bayesian experimental design: A review},
+ author={Chaloner, K. and Verdinelli, I.},
+ journal={Statistical Science},
+ pages={273--304},
+ year={1995},
+ publisher={JSTOR}
+}
 
 @book{atkinson2007optimum,
 title={Optimum experimental designs, with SAS},
diff --git a/problem.tex b/problem.tex
index 90203c6..5fd8906 100644
--- a/problem.tex
+++ b/problem.tex
@@ -10,18 +10,18 @@ where $\beta$ a vector in $\reals^d$, commonly referred to as the \emph{model},
 The purpose of these experiments is to allow the experimenter to estimate the model $\beta$. In particular, assuming gaussian noise, the maximum likelihood estimator of $\beta$ is the \emph{least squares} estimator: for $X_S=[x_i]_{i\in S}\in \reals^{|S|\times d}$ the matrix of experiment features and $y_S=[y_i]_{i\in S}\in\reals^{|S|}$ the observed measurements,
-\begin{align*} \hat{\beta} &=\max_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \\
-& = (\T{X_S}X_S)^{-1}X_S^Ty_S\end{align*}
+\begin{align} \hat{\beta} &=\argmax_{\beta\in\reals^d}\prob(y_S;\beta) =\argmin_{\beta\in\reals^d } \sum_{i\in S}(\T{\beta}x_i-y_i)^2 \nonumber\\
+& = (\T{X_S}X_S)^{-1}X_S^Ty_S \label{leastsquares}\end{align}
 %The estimator $\hat{\beta}$ is unbiased, \emph{i.e.}, $\expt{\hat{\beta}} = \beta$ (where the expectation is over the noise variables $\varepsilon_i$). Furthermore, $\hat{\beta}$ is a multidimensional normal random variable with mean $\beta$ and covariance matrix $(X_S\T{X_S})^{-1}$.
 Note that the estimator $\hat{\beta}$ is a linear map of $y_S$; as $y_S$ is a multidimensional normal r.v., so is $\hat{\beta}$ (the randomness coming from the noise terms $\varepsilon_i$). In particular, $\hat{\beta}$ has mean $\beta$ (\emph{i.e.}, it is an \emph{unbianced estimator}) and covariance $(\T{X_S}X_S)^{-1}$.
- 
+
 Let $V:2^\mathcal{N}\to\reals$ be a value function, quantifying how informative a set of experiments $S$ is in estimating $\beta$. The standard optimal experimental design problem amounts to finding a set $S$ that maximizes $V(S)$ subject to the constraint $|S|\leq k$.
-There is a variety of different value functions used in experimental design\cite{pukelsheim2006optimal}. Almost all capture this through some property the covariance $(\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. Due to its relationship to entropy, a most commonly used is the \emph{$D$-optimality criterion}: %which yields the following optimization problem
+There is a variety of different value functions used in experimental design~\cite{pukelsheim2006optimal}; almost all are related to the covariance $(\T{X_S}X_S)^{-1}$ of the estimator $\hat{\beta}$. Due to its relationship to entropy, the \emph{$D$-optimality criterion} is commonly used: %which yields the following optimization problem
 \begin{align}
- V_D(S) &= \frac{1}{2}\log\det \T{X_S}X_S \label{dcrit} %\\
+ V(S) &= \frac{1}{2}\log\det \T{X_S}X_S \label{dcrit} %\\
 \end{align}
-As $\hat{\beta}$ is a multidimensional normal random variable, the $D$-optimality criterion is equal (up to a costant) to the negative of the entropy of $\hat{\beta}$. Hence, selecting a set of experiments $S$ that maximizes $V_D(S)$ is equivalent to finding the set of experiments that minimizes the uncertainty on $\beta$, as captured by the entropy of its estimator.
+As $\hat{\beta}$ is a multidimensional normal random variable, the $D$-optimality criterion is equal (up to a constant) to the negative of the entropy of $\hat{\beta}$. Hence, maximizing \eqref{dcrit} amounts to finding the set of experiments that minimizes the uncertainty on $\beta$, as captured by the entropy of its estimator.
 %As discussed in the next section, in this paper, we work with a modified measure of information function, namely
 %\begin{align}
@@ -33,7 +33,7 @@ As $\hat{\beta}$ is a multidimensional normal random variable, the $D$-optimalit
 \subsection{Budget Feasible Mechanism Design}
 In this paper, we approach the problem of optimal experimental design from the perspective of \emph{a budget feasible reverse auction} \cite{singer-mechanisms}. In particular, we assume that each experiment $i\in \mathcal{N}$ is associated with a cost $c_i$, that the experimenter needs to pay in order to conduct the experiment. The experimenter has a budget $B\in\reals_+$. In the \emph{full information case}, the costs are common knowledge; optimal design in this context amounts to selecting a set $S$ maximizing the value $V(S)$ subject to the constraint $\sum_{i\in S} c_i\leq B$.
-As in \cite{singer-mechanisms,chen}, we focus in a \emph{strategic scenario}: experiment $i$ corresponds to a \emph{strategic agent}, whose cost $c_i$ is private. For example, each $i$ may correspond to a human participant; the feature vector $x_i$ may correspond to a normalized vector of her age, weight, gender, income, \emph{etc.}, and the measurement $y_i$ may capture some biometric information (\emph{e.g.}, her red cell blood count, a genetic marker, etc.). The cost $c_i$ is the amount the participant deems sufficient to incentivize her participation in the study.
+As in \cite{singer-mechanisms,chen}, we focus on a \emph{strategic scenario}: experiment $i$ corresponds to a \emph{strategic agent}, whose cost $c_i$ is private. For example, each $i$ may correspond to a human participant; the feature vector $x_i$ may correspond to a normalized vector of her age, weight, gender, income, \emph{etc.}, and the measurement $y_i$ may capture some biometric information (\emph{e.g.}, her red cell blood count, a genetic marker, etc.). The cost $c_i$ is the amount the participant deems sufficient to incentivize her participation in the study.
 
 A mechanism $\mathcal{M} = (f,p)$ comprises (a) an \emph{allocation function} $f:\reals_+^n \to 2^\mathcal{N}$ and (b) a \emph{payment function} $p:\reals_+^n\to \reals_+^n$. The allocation function determines the set $S\subset \mathcal{N}$ of experiments to be conducted. The payment function returns a vector of payments $[p_i]_{i\in \mathcal{N}}$. As in \cite{singer-mechanisms, chen}, we study mechanisms that are normalized ($i\notin S$ implies $p_i=0$), individually rational ($p_i\geq c_i$) and have no positive transfers $p_i\geq 0$.
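A possible companion display for the ridge-regression step added in general.tex: the MAP estimator has a closed form paralleling \eqref{leastsquares}. The sketch below assumes the Gaussian prior with covariance $\sigma^2 R^{-1}$ from the passage above and reuses the patch's macros ($\argmin$, $\T{\cdot}$, $\norm{\cdot}$, $\reals$); it is a suggested derivation, not text taken from the patch.

% Closed form of the MAP / ridge estimator (sketch).
% Setting the gradient of the penalized least squares objective to zero
% gives $(R + \T{X_S}X_S)\beta = \T{X_S}y_S$, hence:
\begin{displaymath}
  \hat{\beta}
  = \argmin_{\beta\in\reals^d} \sum_{i\in S}(y_i - \T{\beta}x_i)^2 + \norm{R^{1/2}\beta}_2^2
  = (R + \T{X_S}X_S)^{-1}\T{X_S}y_S,
\end{displaymath}
% so $R + \T{X_S}X_S$ is exactly the matrix whose log-determinant appears in \eqref{bayesianobjective}.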
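Similarly, the identification of the information gain with \eqref{bayesianobjective} follows from the entropy formula for multivariate Gaussians. One way to spell it out, again assuming the prior covariance $\sigma^2 R^{-1}$ and Gaussian noise with variance $\sigma^2$ (a sketch, not part of the patch):

% Information gain under the Gaussian prior and noise model (sketch).
% Prior covariance: $\sigma^2 R^{-1}$; posterior covariance: $\sigma^2(R + \T{X_S}X_S)^{-1}$.
\begin{align*}
  I(\beta; y_S) &= \entropy(\beta) - \entropy(\beta\mid y_S)\\
  &= \tfrac{1}{2}\log\det\big(2\pi e\,\sigma^2 R^{-1}\big)
   - \tfrac{1}{2}\log\det\big(2\pi e\,\sigma^2 (R + \T{X_S}X_S)^{-1}\big)\\
  &= \tfrac{1}{2}\log\det\big(R + \T{X_S}X_S\big) - \tfrac{1}{2}\log\det R,
\end{align*}
% i.e., \eqref{bayesianobjective} up to the additive constant $-\tfrac{1}{2}\log\det R$,
% which does not depend on $S$ (the $\det R$ dependence flagged in the TODO).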
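Finally, for the claim in problem.tex that the $D$-optimality criterion equals the negative entropy of $\hat{\beta}$ up to a constant: taking the covariance $(\T{X_S}X_S)^{-1}$ stated there (a noise-variance factor $\sigma^2$ would only shift the constant), the Gaussian entropy formula gives the following sketch.

% Entropy of the least squares estimator (sketch); $d$ is the dimension of $\beta$.
\begin{displaymath}
  \entropy(\hat{\beta})
  = \tfrac{1}{2}\log\det\big(2\pi e\,(\T{X_S}X_S)^{-1}\big)
  = \tfrac{d}{2}\log(2\pi e) - \tfrac{1}{2}\log\det\T{X_S}X_S
  = \mathrm{const} - V(S),
\end{displaymath}
% so maximizing \eqref{dcrit} is the same as minimizing the entropy of $\hat{\beta}$.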
