Diffstat (limited to 'finale')
| -rw-r--r-- | finale/sections/bayesian.tex | 41 |
1 files changed, 33 insertions, 8 deletions
diff --git a/finale/sections/bayesian.tex b/finale/sections/bayesian.tex
index 5c7b179..3c70ebd 100644
--- a/finale/sections/bayesian.tex
+++ b/finale/sections/bayesian.tex
@@ -58,14 +58,37 @@ distribution using a variational inference algorithm.
 
 \paragraph{Variational Inference}
 Variational inference algorithms consist in fitting an approximate family of
-distributions to the exact posterior. The variational objective maximizes a
-lower bound on the log marginal likelihood:
-\begin{align*}
-  \mathcal{V}(\mathbf{\Theta}, \mathbf{\Phi}, \{\mathbf{x}_c\}) = -
-  \text{KL}(q_{\mathbf{\Phi}}, p_{\mathbf{\Theta}}) + \sum_{c = 1}^C
-  \E_{q_{\mathbf{\Phi}}} \log \mathcal{L}(\mathbf{x}_c | \mathbf{\Theta})
-\end{align*}
-where $p_{\mathbf{\Theta}}$ is the prior distribution,
+distributions to the exact posterior. The variational objective can be
+decomposed as a sum between a divergence term with the prior and a likelihood
+term:
+\begin{equation}
+  \begin{split}
+    \mathcal{V}(\mathbf{\Theta}, \mathbf{\Theta'}, \{\mathbf{x}_c\}) = &-
+    \text{KL}(q_{\mathbf{\Theta'}}, p_{\mathbf{\Theta}}) \\ &+ \sum_{c = 1}^C
+    \E_{q_{\mathbf{\Theta'}}} \log \mathcal{L}(\mathbf{x}_c | \mathbf{\Theta})
+  \end{split}
+\end{equation}
+
+where $p_{\mathbf{\Theta}}$ is the prior distribution, parametrized by prior
+parameters $\Theta = (\mathbf{\mu}^0 , \mathbf{\sigma}^0)$,
+$q_{\mathbf{\Theta'}}$ is the approximate posterior distribution, parametrized
+by variational parameters $\Theta' = (\mathbf{\mu}, \mathbf{\sigma})$,
+$\log \mathcal{L}(x | \Theta)$ is the log-likelihood as written in
+Eq.~\ref{eq:dist}, and $\text{KL}(p , q)$ is the Kullback-Leibler divergence
+between distributions $p$ and $q$. The variational objective maximizes a lower
+bound on the log marginal likelihood:
+\begin{equation}
+  \max_{\mathbf{\Theta'}} \mathcal{V}(\mathbf{\Theta}, \mathbf{\Theta'},
+  \{\mathbf{x}_c\}) \leq \log p_\Theta(\{ \mathbf{x}_c\})
+\end{equation}
+
+Contrary to MCMC which outputs samples from the exact posterior given all
+observed data, the variational inference approach allows us to process data in
+batches to provide an analytical approximation to the posterior, thus improving
+scalability. In many cases, however, the expectation term cannot be found in
+closed-form, and approximation by sampling does not scale well with the number
+of parameters. We must often resort to linear or quadratic approximations of the
+log-likelihood to obtain an analytical expression.
 
 \subsection{Example}
 
@@ -79,3 +102,5 @@ priors. We consider here a truncated product gaussian prior here:
 where $\mathcal{N}^+(\cdot)$ is a gaussian truncated to lied on $\mathbb{R}^+$
 since $\Theta$ is a transformed parameter $z \mapsto -\log(1 - z)$.
 This model is represented in the graphical model of Figure~\ref{fig:graphical}.
+
+VI algorithm for Gaussian stuff
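
For reference, a minimal sketch of the variational objective added by this patch. The snippet below is illustrative only and not part of the commit: it assumes a Gaussian prior p_Theta = N(mu^0, sigma^0), a mean-field Gaussian variational posterior q_Theta' = N(mu, sigma), and a placeholder Gaussian observation model standing in for the log-likelihood of Eq. (dist), which is not reproduced in this diff; the expectation term is estimated by Monte Carlo sampling from q_Theta'.

# Minimal sketch of V(Theta, Theta', {x_c}) = -KL(q, p) + sum_c E_q[log L(x_c | Theta)],
# assuming Gaussian prior and mean-field Gaussian posterior; the observation model
# below is a placeholder for Eq. (dist), which this diff does not show.
import numpy as np

def kl_gaussians(mu, sigma, mu0, sigma0):
    """KL( N(mu, sigma^2) || N(mu0, sigma0^2) ), summed over dimensions."""
    return np.sum(np.log(sigma0 / sigma)
                  + (sigma**2 + (mu - mu0)**2) / (2.0 * sigma0**2) - 0.5)

def log_lik(x_c, theta, noise_sd=1.0):
    """Placeholder log-likelihood of one batch x_c given parameters theta
    (stand-in for Eq. (dist)); here: i.i.d. Gaussian observations."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * noise_sd**2)
                  - 0.5 * (x_c - theta)**2 / noise_sd**2)

def elbo(mu, sigma, mu0, sigma0, batches, n_samples=32, rng=None):
    """V = -KL(q, p) + sum over batches of E_q[log L(x_c | Theta)], with the
    expectation estimated by Monte Carlo (reparameterized samples from q)."""
    if rng is None:
        rng = np.random.default_rng(0)
    kl = kl_gaussians(mu, sigma, mu0, sigma0)
    expected_ll = 0.0
    for x_c in batches:                      # data processed batch by batch
        for _ in range(n_samples):
            theta = mu + sigma * rng.standard_normal(mu.shape)  # theta ~ q
            expected_ll += log_lik(x_c, theta) / n_samples
    return -kl + expected_ll

# Example: one-dimensional parameter, three data batches (all values illustrative).
mu0, sigma0 = np.zeros(1), np.ones(1)         # prior parameters (mu^0, sigma^0)
mu, sigma = np.full(1, 0.5), np.full(1, 0.8)  # variational parameters (mu, sigma)
batches = [np.random.default_rng(c).normal(1.0, 1.0, size=20) for c in range(3)]
print(elbo(mu, sigma, mu0, sigma0, batches))

In practice one would maximize this estimate with respect to the variational parameters (mu, sigma), e.g. by gradient ascent, to tighten the lower bound on the log marginal likelihood, which is the maximization stated in the second added equation.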
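The truncated product Gaussian prior of the example section can be sketched in the same spirit. The snippet below is again an assumption-laden illustration, not the paper's code: it uses scipy.stats.truncnorm for the N^+ density restricted to R^+ and the transform z -> -log(1 - z) mentioned in the diff; mu0 and sigma0 are placeholder prior parameters.

# Sketch of a product of Gaussians truncated to R^+, used as the prior p_Theta,
# together with the transform z -> -log(1 - z) from the example section.
import numpy as np
from scipy.stats import truncnorm

def log_prior_truncated(theta, mu0, sigma0):
    """log p(theta) under a product of Gaussians truncated to [0, +inf)."""
    a = (0.0 - mu0) / sigma0          # lower bound 0 in standardized units
    b = np.inf                        # no upper bound
    return np.sum(truncnorm.logpdf(theta, a, b, loc=mu0, scale=sigma0))

def to_positive(z):
    """Map z in (0, 1) to theta in R^+ via z -> -log(1 - z)."""
    return -np.log(1.0 - z)

z = np.array([0.2, 0.5, 0.9])
theta = to_positive(z)
print(theta, log_prior_truncated(theta, mu0=np.zeros(3), sigma0=np.ones(3)))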
