path: root/general.tex
author     Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>  2012-11-04 19:59:20 -0800
committer  Stratis Ioannidis <stratis@stratis-Latitude-E6320.(none)>  2012-11-04 19:59:20 -0800
commit     35ff12aed97bcae04e89853fefa7443a03875bec (patch)
tree       3ea4dded79000d1b32cbb6eb42aaddf9c1102206 /general.tex
parent     c8865535a14a79581d3b9eb7c52cb853a831e180 (diff)
download   recommendation-35ff12aed97bcae04e89853fefa7443a03875bec.tar.gz
small stuff
Diffstat (limited to 'general.tex')
-rw-r--r--  general.tex | 9
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/general.tex b/general.tex
index 550d6b7..6e56577 100644
--- a/general.tex
+++ b/general.tex
@@ -6,7 +6,7 @@ as our objective. Moreover, we extend Theorem~\ref{thm:main} to a more general
Bayesian setting.
In the Bayesian setting, it is assumed that the experimenter has a prior distribution on $\beta$: in particular, $\beta$ has a multivariate normal prior with zero mean and covariance $\sigma^2R\in \reals^{d\times d}$ (where $\sigma^2$ is the noise variance).
-The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}: FIX!
+The experimenter estimates $\beta$ through \emph{maximum a posteriori estimation}: \emph{i.e.}, finding the parameter which maximizes the posterior distribution of $\beta$ given the observations $y_S$. Under the linearity assumption \eqref{model} and the Gaussian prior on $\beta$, maximum a posteriori estimation leads to the following maximization \cite{hastie}:
\begin{displaymath}
\hat{\beta} = \argmin_{\beta\in\reals^d} \sum_i (y_i - \T{\beta}x_i)^2
+ \norm{R\beta}_2^2
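This objective is a standard MAP computation; as a sketch (assuming i.i.d.\ $N(0,\sigma^2)$ noise as in \eqref{model} and the stated prior $\beta\sim N(0,\sigma^2R)$, under which the penalty takes the quadratic form $\T{\beta}R^{-1}\beta$):

```latex
\begin{align*}
\hat{\beta} &= \argmax_{\beta\in\reals^d} \log p(\beta \mid y_S)
  = \argmax_{\beta\in\reals^d} \big[ \log p(y_S \mid \beta) + \log p(\beta) \big] \\
  &= \argmin_{\beta\in\reals^d} \frac{1}{2\sigma^2}\sum_{i} (y_i - \T{\beta}x_i)^2
    + \frac{1}{2\sigma^2}\,\T{\beta}R^{-1}\beta,
\end{align*}
```

since $\log p(y_S\mid\beta) = -\frac{1}{2\sigma^2}\sum_{i}(y_i-\T{\beta}x_i)^2 + \mathrm{const}$ and $\log p(\beta) = -\frac{1}{2\sigma^2}\,\T{\beta}R^{-1}\beta + \mathrm{const}$; the common factor $\frac{1}{2\sigma^2}$ can be dropped.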
@@ -73,13 +73,11 @@ In this setup, assume that the experimenter has a prior distribution on the hypo
V(S) = \entropy(\beta) -\entropy(\beta\mid y_S),\quad S\subseteq\mathcal{N}
\end{align}
This is a monotone set function, and it clearly satisfies $V(\emptyset)=0$. Although mutual information is not a submodular function in general, this specific setup does lead to a submodular formulation.
-
\begin{lemma}
The value function given by the information gain \eqref{general} is submodular.
\end{lemma}
-
\begin{proof}
-The theorem is proved in a slightly different context in \cite{krause2005near}; we
+The lemma is proved in a slightly different context in \cite{krause2005near}; we
repeat the proof here for the sake of completeness. Using the chain rule for
the conditional entropy we get:
\begin{equation}\label{eq:chain-rule}
@@ -89,8 +87,7 @@ the conditional entropy we get:
where the second equality comes from the independence of the $y_i$'s
conditioned on $\beta$. Recall that the joint entropy of a set of random
variables is a submodular function. Thus, our value function is written in
-\eqref{eq:chain-rule} as the sum of a submodular function and a modular function:
-it is submodular.
+\eqref{eq:chain-rule} as the sum of a submodular function and a modular function.
\end{proof}
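The lemma can also be checked numerically. The sketch below (illustrative, not from the paper; it assumes an identity prior covariance factor $R$, random Gaussian features, and small problem sizes chosen for exhaustive enumeration) computes $V(S) = \entropy(\beta) - \entropy(\beta\mid y_S)$ in closed form for the Gaussian linear model and verifies the diminishing-returns property:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
d, n, sigma2 = 3, 5, 0.5
X = rng.standard_normal((n, d))   # feature vectors x_i as rows
R = np.eye(d)                     # illustrative choice of prior covariance factor
prior = sigma2 * R                # beta ~ N(0, sigma^2 R)

def info_gain(S):
    """V(S) = H(beta) - H(beta | y_S); for Gaussians this is a log-det ratio."""
    if not S:
        return 0.0
    Xs = X[list(S)]
    # posterior covariance of beta given the observations y_S
    post = np.linalg.inv(np.linalg.inv(prior) + Xs.T @ Xs / sigma2)
    _, logdet_prior = np.linalg.slogdet(prior)
    _, logdet_post = np.linalg.slogdet(post)
    return 0.5 * (logdet_prior - logdet_post)

# Diminishing returns: for S <= T and i outside T, the marginal gain of
# adding i to S is at least the marginal gain of adding i to T.
ok = True
for k in range(n):
    for S in combinations(range(n), k):
        for T in combinations(range(n), k + 1):
            if set(S) <= set(T):
                for i in set(range(n)) - set(T):
                    gain_S = info_gain(set(S) | {i}) - info_gain(set(S))
                    gain_T = info_gain(set(T) | {i}) - info_gain(set(T))
                    ok = ok and (gain_S >= gain_T - 1e-9)
print(ok)
```

The check passes for any choice of features and positive-definite prior, since the argument in the proof does not depend on the particular Gaussian instance.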
This lemma implies that learning an \emph{arbitrary hypothesis, under an