Diffstat (limited to 'finale')
| -rw-r--r-- | finale/sections/bayesian.tex | 41 |
1 files changed, 33 insertions, 8 deletions
diff --git a/finale/sections/bayesian.tex b/finale/sections/bayesian.tex
index 5c7b179..3c70ebd 100644
--- a/finale/sections/bayesian.tex
+++ b/finale/sections/bayesian.tex
@@ -58,14 +58,37 @@ distribution using a variational inference algorithm.
 
 \paragraph{Variational Inference}
 Variational inference algorithms consist in fitting an approximate family of
-distributions to the exact posterior. The variational objective maximizes a
-lower bound on the log marginal likelihood:
-\begin{align*}
-  \mathcal{V}(\mathbf{\Theta}, \mathbf{\Phi}, \{\mathbf{x}_c\}) = -
-  \text{KL}(q_{\mathbf{\Phi}}, p_{\mathbf{\Theta}}) + \sum_{c = 1}^C
-  \E_{q_{\mathbf{\Phi}}} \log \mathcal{L}(\mathbf{x}_c | \mathbf{\Theta})
-\end{align*}
-where $p_{\mathbf{\Theta}}$ is the prior distribution,
+distributions to the exact posterior. The variational objective can be
+decomposed as a sum between a divergence term with the prior and a likelihood
+term:
+\begin{equation}
+  \begin{split}
+    \mathcal{V}(\mathbf{\Theta}, \mathbf{\Theta'}, \{\mathbf{x}_c\}) = &-
+    \text{KL}(q_{\mathbf{\Theta'}}, p_{\mathbf{\Theta}}) \\ &+ \sum_{c = 1}^C
+    \E_{q_{\mathbf{\Theta'}}} \log \mathcal{L}(\mathbf{x}_c | \mathbf{\Theta})
+  \end{split}
+\end{equation}
+
+where $p_{\mathbf{\Theta}}$ is the prior distribution, parametrized by prior
+parameters $\Theta = (\mathbf{\mu}^0 , \mathbf{\sigma}^0)$,
+$q_{\mathbf{\Theta'}}$ is the approximate posterior distribution, parametrized
+by variational parameters $\Theta' = (\mathbf{\mu}, \mathbf{\sigma})$,
+$\log \mathcal{L}(x | \Theta)$ is the log-likelihood as written in
+Eq.~\ref{eq:dist}, and $\text{KL}(p , q)$ is the Kullback-Leibler divergence
+between distributions $p$ and $q$. The variational objective maximizes a lower
+bound on the log marginal likelihood:
+\begin{equation}
+  \max_{\mathbf{\Theta'}} \mathcal{V}(\mathbf{\Theta}, \mathbf{\Theta'},
+  \{\mathbf{x}_c\}) \leq \log p_\Theta(\{ \mathbf{x}_c\})
+\end{equation}
+
+Contrary to MCMC which outputs samples from the exact posterior given all
+observed data, the variational inference approach allows us to process data in
+batches to provide an analytical approximation to the posterior, thus improving
+scalability. In many cases, however, the expectation term cannot be found in
+closed-form, and approximation by sampling does not scale well with the number
+of parameters. We must often resort to linear or quadratic approximations of the
+log-likelihood to obtain an analytical expression.
 
 \subsection{Example}
 
@@ -79,3 +102,5 @@ priors. We consider here a truncated product gaussian prior here:
 where $\mathcal{N}^+(\cdot)$ is a gaussian truncated to lied on $\mathbb{R}^+$
 since $\Theta$ is a transformed parameter $z \mapsto -\log(1 - z)$.
 This model is represented in the graphical model of Figure~\ref{fig:graphical}.
+
+VI algorithm for Gaussian stuff
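
For reference, a minimal sketch of the variational objective added by this patch. The snippet below is illustrative only and not part of the commit: it assumes a Gaussian prior p_Theta = N(mu^0, sigma^0), a mean-field Gaussian variational posterior q_Theta' = N(mu, sigma), and a placeholder Gaussian observation model standing in for the log-likelihood of Eq. (dist), which is not reproduced in this diff; the expectation term is estimated by Monte Carlo sampling from q_Theta'.

# Minimal sketch of V(Theta, Theta', {x_c}) = -KL(q, p) + sum_c E_q[log L(x_c | Theta)],
# assuming Gaussian prior and mean-field Gaussian posterior; the observation model
# below is a placeholder for Eq. (dist), which this diff does not show.
import numpy as np

def kl_gaussians(mu, sigma, mu0, sigma0):
    """KL( N(mu, sigma^2) || N(mu0, sigma0^2) ), summed over dimensions."""
    return np.sum(np.log(sigma0 / sigma)
                  + (sigma**2 + (mu - mu0)**2) / (2.0 * sigma0**2) - 0.5)

def log_lik(x_c, theta, noise_sd=1.0):
    """Placeholder log-likelihood of one batch x_c given parameters theta
    (stand-in for Eq. (dist)); here: i.i.d. Gaussian observations."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * noise_sd**2)
                  - 0.5 * (x_c - theta)**2 / noise_sd**2)

def elbo(mu, sigma, mu0, sigma0, batches, n_samples=32, rng=None):
    """V = -KL(q, p) + sum over batches of E_q[log L(x_c | Theta)], with the
    expectation estimated by Monte Carlo (reparameterized samples from q)."""
    if rng is None:
        rng = np.random.default_rng(0)
    kl = kl_gaussians(mu, sigma, mu0, sigma0)
    expected_ll = 0.0
    for x_c in batches:                      # data processed batch by batch
        for _ in range(n_samples):
            theta = mu + sigma * rng.standard_normal(mu.shape)  # theta ~ q
            expected_ll += log_lik(x_c, theta) / n_samples
    return -kl + expected_ll

# Example: one-dimensional parameter, three data batches (all values illustrative).
mu0, sigma0 = np.zeros(1), np.ones(1)         # prior parameters (mu^0, sigma^0)
mu, sigma = np.full(1, 0.5), np.full(1, 0.8)  # variational parameters (mu, sigma)
batches = [np.random.default_rng(c).normal(1.0, 1.0, size=20) for c in range(3)]
print(elbo(mu, sigma, mu0, sigma0, batches))

In practice one would maximize this estimate with respect to the variational parameters (mu, sigma), e.g. by gradient ascent, to tighten the lower bound on the log marginal likelihood, which is the maximization stated in the second added equation.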
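The truncated product Gaussian prior of the example section can be sketched in the same spirit. The snippet below is again an assumption-laden illustration, not the paper's code: it uses scipy.stats.truncnorm for the N^+ density restricted to R^+ and the transform z -> -log(1 - z) mentioned in the diff; mu0 and sigma0 are placeholder prior parameters.

# Sketch of a product of Gaussians truncated to R^+, used as the prior p_Theta,
# together with the transform z -> -log(1 - z) from the example section.
import numpy as np
from scipy.stats import truncnorm

def log_prior_truncated(theta, mu0, sigma0):
    """log p(theta) under a product of Gaussians truncated to [0, +inf)."""
    a = (0.0 - mu0) / sigma0          # lower bound 0 in standardized units
    b = np.inf                        # no upper bound
    return np.sum(truncnorm.logpdf(theta, a, b, loc=mu0, scale=sigma0))

def to_positive(z):
    """Map z in (0, 1) to theta in R^+ via z -> -log(1 - z)."""
    return -np.log(1.0 - z)

z = np.array([0.2, 0.5, 0.9])
theta = to_positive(z)
print(theta, log_prior_truncated(theta, mu0=np.zeros(3), sigma0=np.ones(3)))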
