author     jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-12-11 13:45:06 -0500
committer  jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-12-11 13:45:06 -0500
commit     ed72283f6d91ca0f86ae96edd4c37fde87d14f47 (patch)
tree       61f40db67c1ca927ec4604e4b923d79d89a220f4 /finale/sections/bayesian.tex
parent     8499eb52a13356f7ee11fafe6d24adcd05ed8bdd (diff)
download   cascades-ed72283f6d91ca0f86ae96edd4c37fde87d14f47.tar.gz
VI part for Gaussian model - 1st draft
Diffstat (limited to 'finale/sections/bayesian.tex')
-rw-r--r--  finale/sections/bayesian.tex  |  30
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/finale/sections/bayesian.tex b/finale/sections/bayesian.tex
index 3c70ebd..efc1526 100644
--- a/finale/sections/bayesian.tex
+++ b/finale/sections/bayesian.tex
@@ -52,8 +52,7 @@ posterior directly using the un-normalized posterior distribution. The advantage
of this method is the ability to sample from the exact posterior and the wide
availability of software packages which will work `out-of-the-box'. However,
vanilla MCMC scales badly and is unsuitable for Bayesian learning of large
-networks ($\geq 100$ nodes). We resort to fitting approximate posterior
-distribution using a variational inference algorithm.
+networks ($\geq 100$ nodes).
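+As a minimal sketch of why only the un-normalized posterior is needed,
+consider a single random-walk Metropolis step (illustrative code, not the
+implementation used in our experiments):
+\begin{verbatim}
+import numpy as np
+
+def metropolis_step(theta, log_post, step=0.1, rng=np.random):
+    # log_post is the *un-normalized* log-posterior: the normalizing
+    # constant cancels in the acceptance ratio below.
+    proposal = theta + step * rng.standard_normal(theta.shape)
+    log_ratio = log_post(proposal) - log_post(theta)
+    if np.log(rng.uniform()) < log_ratio:
+        return proposal  # accept the move
+    return theta         # reject and stay put
+\end{verbatim}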
\paragraph{Variational Inference}
@@ -87,13 +86,15 @@ observed data, the variational inference approach allows us to process data in
batches to provide an analytical approximation to the posterior, thus improving
scalability. In many cases, however, the expectation term cannot be found in
closed form, and approximation by sampling does not scale well with the number
-of parameters. We must often resort to linear or quadratic approximations of the
-log-likelihood to obtain an analytical expression.
+of parameters. Borrowing ideas from B\"ohning~\cite{}, we instead propose a
+linear or quadratic approximation to the log-likelihood for which the
+expectation term can be written analytically.
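+To see why a quadratic surrogate suffices, note that for $q =
+\mathcal{N}(\mu, \sigma^2)$ the expectation of any quadratic is available in
+closed form (the coefficients $a$, $b$, $c$ are generic placeholders for
+illustration, not the ones derived later):
+\begin{equation*}
+  \mathbb{E}_q\!\left[a + b\,\theta + c\,\theta^2\right]
+  = a + b\,\mu + c\,(\mu^2 + \sigma^2).
+\end{equation*}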
\subsection{Example}
-
-As mentioned above, the IC model (cf. Section~\ref{sec:model}) has no conjugate
-priors. We consider here a truncated product gaussian prior here:
+We develop the variational inference algorithm for the IC model (cf.
+Section~\ref{sec:model}). As mentioned above, the IC model with link function
+$f: z \mapsto 1 - e^{-z}$ has no conjugate priors. We consider here a prior
+formed as a product of truncated Gaussians:
\begin{equation}
\label{eq:gaussianprior}
\text{prior}(\Theta) = \prod_{ij} \mathcal{N}^+(\theta_{ij} | \mu_{ij},
@@ -103,4 +104,17 @@ where $\mathcal{N}^+(\cdot)$ is a Gaussian truncated to lie on $\mathbb{R}^+$
since $\Theta$ is obtained through the transformation $z \mapsto -\log(1 - z)$.
This model is represented in the graphical model of Figure~\ref{fig:graphical}.
-VI algorithm for Gaussian stuff
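+For reference, the objective maximized in what follows is the standard
+evidence lower bound (writing $\mathcal{D}$ for the observed cascades;
+notation ours):
+\begin{equation*}
+  \mathcal{L}(q) = \mathbb{E}_{q_{\mathbf{\Theta'}}}\!\left[\log p(\mathcal{D}
+  \mid \Theta)\right] - \text{KL}(q_{\mathbf{\Theta'}}, p_{\mathbf{\Theta}}),
+\end{equation*}
+whose two pieces are precisely the expectation term and the KL term discussed
+above.
+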
+The product form of the prior implies that the KL term is entirely decomposable:
+\begin{equation}
+  \text{KL}(q_{\mathbf{\Theta'}}, p_{\mathbf{\Theta}}) = \sum_{ij}
+  \text{KL}\left(\mathcal{N}^+(\mu_{ij}, \sigma_{ij}),\,
+  \mathcal{N}^+(\mu^0_{ij}, \sigma^0_{ij})\right)
+\end{equation}
+
+Since an easy closed-form formula exists for the KL divergence between two
+Gaussians, we approximate the truncated Gaussians by their non-truncated
+counterparts:
+\begin{equation}
+  \label{eq:kl}
+  \text{KL}(q_{\mathbf{\Theta'}}, p_{\mathbf{\Theta}}) \approx \sum_{i,j}
+  \log \frac{\sigma^0_{ij}}{\sigma_{ij}}
+  + \frac{\sigma_{ij}^2 + (\mu_{ij} - \mu^0_{ij})^2}{2\,(\sigma^0_{ij})^2}
+  - \frac{1}{2}
+\end{equation}
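+Equation~\eqref{eq:kl} is straightforward to evaluate numerically; the sketch
+below (illustrative, with toy per-edge parameters of our own choosing)
+computes the summed KL over all edges at once:
+\begin{verbatim}
+import numpy as np
+
+# toy per-edge parameters (ours, purely for illustration)
+mu,  sigma  = np.array([0.2, 0.5]), np.array([0.3, 0.4])
+mu0, sigma0 = np.array([0.1, 0.1]), np.array([1.0, 1.0])
+
+def gaussian_kl(mu, sigma, mu0, sigma0):
+    # elementwise KL( N(mu, sigma^2) || N(mu0, sigma0^2) )
+    return (np.log(sigma0 / sigma)
+            + (sigma**2 + (mu - mu0)**2) / (2.0 * sigma0**2)
+            - 0.5)
+
+kl_total = gaussian_kl(mu, sigma, mu0, sigma0).sum()  # sum over edges (i,j)
+\end{verbatim}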