Diffstat (limited to 'general.tex')
| -rw-r--r-- | general.tex | 9 |
1 files changed, 3 insertions, 6 deletions
diff --git a/general.tex b/general.tex
index 550d6b7..6e56577 100644
--- a/general.tex
+++ b/general.tex
@@ -6,7 +6,7 @@ as our objective.
 Moreover, we extend Theorem~\ref{thm:main} to a more general Bayesian setting.
 In the Bayesian setting, it is assumed that the experimenter has a prior
 distribution on $\beta$: in particular, $\beta$ has a multivariate normal prior
 with zero mean and covariance $\sigma^2R\in \reals^{d^2}$ (where $\sigma^2$ is the noise variance).
-The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}: FIX!
+The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
 \begin{displaymath}
 \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2 + \sum_i \norm{R\beta}_2^2
@@ -73,13 +73,11 @@ In this setup, assume that the experimenter has a prior distribution on the hypo
 V(S) = \entropy(\beta) -\entropy(\beta\mid y_S),\quad S\subseteq\mathcal{N}
 \end{align}
 This is a monotone set function, and it clearly satisfies $V(\emptyset)=0$. Though, in general, mutual information is not a submodular function, this specific setup leads indeed to a submodular formulation.
-
 \begin{lemma}
 The value function given by the information gain \eqref{general} is submodular.
 \end{lemma}
-
 \begin{proof}
-The theorem is proved in a slightly different context in \cite{krause2005near}; we
+The lemma is proved in a slightly different context in \cite{krause2005near}; we
 repeat the proof here for the sake of completeness. Using the chain rule for
 the conditional entropy we get:
 \begin{equation}\label{eq:chain-rule}
@@ -89,8 +87,7 @@ the conditional entropy we get:
 where the second equality comes from the independence of the $y_i$'s
 conditioned on $\beta$. Recall that the joint entropy of a set of random
 variables is a submodular function. Thus, our value function is written in
-\eqref{eq:chain-rule} as the sum of a submodular function and a modular function:
-it is submodular.
+\eqref{eq:chain-rule} as the sum of a submodular function and a modular function.
 \end{proof}

 This lemma that implies that learning an \emph{arbitrary hypothesis, under an
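
Note on the proof referenced above: the diff does not show the body of \eqref{eq:chain-rule}, only the surrounding prose. The standard chain-rule decomposition that the proof describes (cf. \cite{krause2005near}) can be sketched as follows, reusing the paper's \entropy macro; this is a reconstruction consistent with the context lines, not the exact equation in general.tex.

% Sketch (reconstruction, not the exact contents of general.tex):
% symmetry of mutual information followed by the chain rule gives
\begin{align*}
V(S) &= \entropy(\beta) - \entropy(\beta \mid y_S)
      = \entropy(y_S) - \entropy(y_S \mid \beta)\\
     &= \entropy(y_S) - \sum_{i \in S} \entropy(y_i \mid \beta),
\end{align*}
% where the last step uses that the $y_i$'s are independent given $\beta$.
% $S \mapsto \entropy(y_S)$ is submodular (joint entropy), the subtracted sum
% is modular in $S$, and so $V$ is submodular, as the commit's wording states.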
