author    jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-12-11 13:16:31 -0500
committer jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-12-11 13:16:31 -0500
commit    8499eb52a13356f7ee11fafe6d24adcd05ed8bdd (patch)
tree      123894f00bc63a826da371573ba078ad1f9c572a /finale/sections
parent    8fc52b33eecef1b55c5c1cbdead9217bf31dc3f8 (diff)
download  cascades-8499eb52a13356f7ee11fafe6d24adcd05ed8bdd.tar.gz
VI paragraph
Diffstat (limited to 'finale/sections')
-rw-r--r--  finale/sections/bayesian.tex  41
1 file changed, 33 insertions, 8 deletions
diff --git a/finale/sections/bayesian.tex b/finale/sections/bayesian.tex
index 5c7b179..3c70ebd 100644
--- a/finale/sections/bayesian.tex
+++ b/finale/sections/bayesian.tex
@@ -58,14 +58,37 @@ distribution using a variational inference algorithm.
\paragraph{Variational Inference}
Variational inference algorithms consist in fitting an approximate family of
-distributions to the exact posterior. The variational objective maximizes a
-lower bound on the log marginal likelihood:
-\begin{align*}
- \mathcal{V}(\mathbf{\Theta}, \mathbf{\Phi}, \{\mathbf{x}_c\}) = -
- \text{KL}(q_{\mathbf{\Phi}}, p_{\mathbf{\Theta}}) + \sum_{c = 1}^C
- \E_{q_{\mathbf{\Phi}}} \log \mathcal{L}(\mathbf{x}_c | \mathbf{\Theta})
-\end{align*}
-where $p_{\mathbf{\Theta}}$ is the prior distribution,
+distributions to the exact posterior. The variational objective decomposes
+into the sum of a KL-divergence term against the prior and an expected
+log-likelihood term:
+\begin{equation}
+ \begin{split}
+ \mathcal{V}(\mathbf{\Theta}, \mathbf{\Theta'}, \{\mathbf{x}_c\}) = &-
+ \text{KL}(q_{\mathbf{\Theta'}}, p_{\mathbf{\Theta}}) \\ &+ \sum_{c = 1}^C
+ \E_{q_{\mathbf{\Theta'}}} \log \mathcal{L}(\mathbf{x}_c | \mathbf{\Theta})
+ \end{split}
+\end{equation}
+
+where $p_{\mathbf{\Theta}}$ is the prior distribution, parametrized by prior
+parameters $\Theta = (\mathbf{\mu}^0 , \mathbf{\sigma}^0)$,
+$q_{\mathbf{\Theta'}}$ is the approximate posterior distribution, parametrized
+by variational parameters $\Theta' = (\mathbf{\mu}, \mathbf{\sigma})$,
+$\log \mathcal{L}(x | \Theta)$ is the log-likelihood as written in
+Eq.~\ref{eq:dist}, and $\text{KL}(p , q)$ is the Kullback-Leibler divergence
+between distributions $p$ and $q$. The variational objective is a lower bound
+on the log marginal likelihood, which we maximize over $\mathbf{\Theta'}$:
+\begin{equation}
+ \max_{\mathbf{\Theta'}} \mathcal{V}(\mathbf{\Theta}, \mathbf{\Theta'},
+ \{\mathbf{x}_c\}) \leq \log p_\Theta(\{ \mathbf{x}_c\})
+\end{equation}
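+
+To see why this is a lower bound (a standard argument, sketched here in the
+notation above and assuming the $\mathbf{x}_c$ are conditionally independent
+given the model parameters), note that for any $\mathbf{\Theta'}$
+\begin{equation*}
+  \log p_{\mathbf{\Theta}}(\{\mathbf{x}_c\})
+  = \mathcal{V}(\mathbf{\Theta}, \mathbf{\Theta'}, \{\mathbf{x}_c\})
+  + \text{KL}\left(q_{\mathbf{\Theta'}} \,,\,
+    p_{\mathbf{\Theta}}(\cdot \,|\, \{\mathbf{x}_c\})\right),
+\end{equation*}
+and the Kullback-Leibler divergence to the exact posterior is non-negative;
+the bound is tight exactly when $q_{\mathbf{\Theta'}}$ equals the posterior.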
+
+Contrary to MCMC, which outputs samples from the exact posterior given all of
+the observed data, the variational inference approach allows us to process data
+in batches and yields an analytical approximation to the posterior, thus
+improving scalability. In many cases, however, the expectation term cannot be
+computed in closed form, and approximating it by sampling does not scale well
+with the number of parameters. We must then often resort to linear or quadratic
+approximations of the log-likelihood to obtain an analytical expression.
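+
+As an illustration of the latter, a quadratic (second-order Taylor)
+approximation of the log-likelihood around the variational mean $\mathbf{\mu}$
+gives an analytical expression for the expectation term (a sketch, assuming
+$\log \mathcal{L}$ is twice differentiable and $q_{\mathbf{\Theta'}}$ is a
+Gaussian with covariance $\mathbf{\Sigma} = \text{diag}(\mathbf{\sigma}^2)$):
+\begin{equation*}
+  \E_{q_{\mathbf{\Theta'}}} \log \mathcal{L}(\mathbf{x}_c | \mathbf{\Theta})
+  \approx \log \mathcal{L}(\mathbf{x}_c | \mathbf{\mu})
+  + \frac{1}{2} \text{tr}\left(\mathbf{\Sigma}\,
+    \nabla^2 \log \mathcal{L}(\mathbf{x}_c | \mathbf{\mu})\right),
+\end{equation*}
+where the first-order term vanishes because $\mathbf{\mu}$ is the mean of
+$q_{\mathbf{\Theta'}}$.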
\subsection{Example}
@@ -79,3 +102,5 @@ priors. We consider here a truncated product Gaussian prior:
where $\mathcal{N}^+(\cdot)$ is a Gaussian truncated to lie on $\mathbb{R}^+$
since $\Theta$ is a transformed parameter $z \mapsto -\log(1 - z)$. This model
is represented in the graphical model of Figure~\ref{fig:graphical}.
+
+% TODO: VI algorithm for the Gaussian model.
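+
+For the variational algorithm in this Gaussian example, the Kullback-Leibler
+term of the objective is available in closed form (stated here for the
+untruncated univariate case; the truncation in $\mathcal{N}^+$ contributes
+additional normalization terms):
+\begin{equation*}
+  \text{KL}\left(\mathcal{N}(\mu, \sigma^2) \,,\,
+    \mathcal{N}(\mu^0, (\sigma^0)^2)\right)
+  = \log \frac{\sigma^0}{\sigma}
+  + \frac{\sigma^2 + (\mu - \mu^0)^2}{2 (\sigma^0)^2}
+  - \frac{1}{2}.
+\end{equation*}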