Diffstat (limited to 'general.tex')
| -rw-r--r-- | general.tex | 9 |
1 files changed, 3 insertions, 6 deletions
diff --git a/general.tex b/general.tex
index 550d6b7..6e56577 100644
--- a/general.tex
+++ b/general.tex
@@ -6,7 +6,7 @@ as our objective.
 Moreover, we extend Theorem~\ref{thm:main} to a more general Bayesian setting.
 In the Bayesian setting, it is assumed that the experimenter has a prior
 distribution on $\beta$: in particular, $\beta$ has a multivariate normal prior
 with zero mean and covariance $\sigma^2R\in \reals^{d^2}$ (where $\sigma^2$ is the noise variance).
-The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}: FIX!
+The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
 \begin{displaymath}
 \hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2 + \sum_i \norm{R\beta}_2^2
@@ -73,13 +73,11 @@ In this setup, assume that the experimenter has a prior distribution on the hypo
 V(S) = \entropy(\beta) -\entropy(\beta\mid y_S),\quad S\subseteq\mathcal{N}
 \end{align}
 This is a monotone set function, and it clearly satisfies $V(\emptyset)=0$. Though, in general, mutual information is not a submodular function, this specific setup leads indeed to a submodular formulation.
-
 \begin{lemma}
 The value function given by the information gain \eqref{general} is submodular.
 \end{lemma}
-
 \begin{proof}
-The theorem is proved in a slightly different context in \cite{krause2005near}; we
+The lemma is proved in a slightly different context in \cite{krause2005near}; we
 repeat the proof here for the sake of completeness. Using the chain rule for
 the conditional entropy we get:
 \begin{equation}\label{eq:chain-rule}
@@ -89,8 +87,7 @@ the conditional entropy we get:
 where the second equality comes from the independence of the $y_i$'s
 conditioned on $\beta$. Recall that the joint entropy of a set of random
 variables is a submodular function. Thus, our value function is written in
-\eqref{eq:chain-rule} as the sum of a submodular function and a modular function:
-it is submodular.
+\eqref{eq:chain-rule} as the sum of a submodular function and a modular function.
 \end{proof}

 This lemma that implies that learning an \emph{arbitrary hypothesis, under an
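
Note on the proof referenced above: the diff does not show the body of \eqref{eq:chain-rule}, only the surrounding prose. The standard chain-rule decomposition that the proof describes (cf. \cite{krause2005near}) can be sketched as follows, reusing the paper's \entropy macro; this is a reconstruction consistent with the context lines, not the exact equation in general.tex.

% Sketch (reconstruction, not the exact contents of general.tex):
% symmetry of mutual information followed by the chain rule gives
\begin{align*}
V(S) &= \entropy(\beta) - \entropy(\beta \mid y_S)
      = \entropy(y_S) - \entropy(y_S \mid \beta)\\
     &= \entropy(y_S) - \sum_{i \in S} \entropy(y_i \mid \beta),
\end{align*}
% where the last step uses that the $y_i$'s are independent given $\beta$.
% $S \mapsto \entropy(y_S)$ is submodular (joint entropy), the subtracted sum
% is modular in $S$, and so $V$ is submodular, as the commit's wording states.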
