author     jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-12-11 13:45:06 -0500
committer  jeanpouget-abadie <jean.pougetabadie@gmail.com>  2015-12-11 13:45:06 -0500
commit     ed72283f6d91ca0f86ae96edd4c37fde87d14f47 (patch)
tree       61f40db67c1ca927ec4604e4b923d79d89a220f4 /finale/sections/bayesian.tex
parent     8499eb52a13356f7ee11fafe6d24adcd05ed8bdd (diff)
download   cascades-ed72283f6d91ca0f86ae96edd4c37fde87d14f47.tar.gz
VI part for Gaussian model - 1st draft
Diffstat (limited to 'finale/sections/bayesian.tex')
-rw-r--r--  finale/sections/bayesian.tex  |  30
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/finale/sections/bayesian.tex b/finale/sections/bayesian.tex
index 3c70ebd..efc1526 100644
--- a/finale/sections/bayesian.tex
+++ b/finale/sections/bayesian.tex
@@ -52,8 +52,7 @@ posterior directly using the un-normalized posterior distribution. The advantage
of this method is the ability to sample from the exact posterior and the wide
availability of software packages which will work `out-of-the-box'. However,
vanilla MCMC scales badly and is unsuitable for Bayesian learning of large
-networks ($\geq 100$ nodes). We resort to fitting approximate posterior
-distribution using a variational inference algorithm.
+networks ($\geq 100$ nodes).
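+As a minimal sketch of why only the un-normalized posterior is needed,
+consider a single random-walk Metropolis step (illustrative code, not the
+implementation used in our experiments):
+\begin{verbatim}
+import numpy as np
+
+def metropolis_step(theta, log_post, step=0.1, rng=np.random):
+    # log_post is the *un-normalized* log-posterior: the normalizing
+    # constant cancels in the acceptance ratio below.
+    proposal = theta + step * rng.standard_normal(theta.shape)
+    log_ratio = log_post(proposal) - log_post(theta)
+    if np.log(rng.uniform()) < log_ratio:
+        return proposal  # accept the move
+    return theta         # reject and stay put
+\end{verbatim}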
\paragraph{Variational Inference}
@@ -87,13 +86,15 @@ observed data, the variational inference approach allows us to process data in
batches to provide an analytical approximation to the posterior, thus improving
scalability. In many cases, however, the expectation term cannot be found in
closed form, and approximation by sampling does not scale well with the number
-of parameters. We must often resort to linear or quadratic approximations of the
-log-likelihood to obtain an analytical expression.
+of parameters. Borrowing ideas from B\"ohning~\cite{}, we instead propose a
+linear or quadratic approximation to the log-likelihood for which the
+expectation term can be written analytically.
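+To see why a quadratic surrogate suffices, note that for $q =
+\mathcal{N}(\mu, \sigma^2)$ the expectation of any quadratic is available in
+closed form (the coefficients $a$, $b$, $c$ are generic placeholders for
+illustration, not the ones derived later):
+\begin{equation*}
+  \mathbb{E}_q\!\left[a + b\,\theta + c\,\theta^2\right]
+  = a + b\,\mu + c\,(\mu^2 + \sigma^2).
+\end{equation*}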
\subsection{Example}
-
-As mentioned above, the IC model (cf. Section~\ref{sec:model}) has no conjugate
-priors. We consider here a truncated product gaussian prior here:
+We develop the variational inference algorithm for the IC model (cf.
+Section~\ref{sec:model}). As mentioned above, the IC model with link function
+$f: z \mapsto 1 - e^{-z}$ has no conjugate priors. We consider here a prior
+formed as a product of truncated Gaussians:
\begin{equation}
\label{eq:gaussianprior}
\text{prior}(\Theta) = \prod_{ij} \mathcal{N}^+(\theta_{ij} | \mu_{ij},
@@ -103,4 +104,17 @@ where $\mathcal{N}^+(\cdot)$ is a Gaussian truncated to lie on $\mathbb{R}^+$
since $\Theta$ is obtained through the transformation $z \mapsto -\log(1 - z)$.
This model is represented in the graphical model of Figure~\ref{fig:graphical}.
-VI algorithm for Gaussian stuff
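+For reference, the objective maximized in what follows is the standard
+evidence lower bound (writing $\mathcal{D}$ for the observed cascades;
+notation ours):
+\begin{equation*}
+  \mathcal{L}(q) = \mathbb{E}_{q_{\mathbf{\Theta'}}}\!\left[\log p(\mathcal{D}
+  \mid \Theta)\right] - \text{KL}(q_{\mathbf{\Theta'}}, p_{\mathbf{\Theta}}),
+\end{equation*}
+whose two pieces are precisely the expectation term and the KL term discussed
+above.
+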
+The product form of the prior implies that the KL term is entirely decomposable:
+\begin{equation}
+  \text{KL}(q_{\mathbf{\Theta'}}, p_{\mathbf{\Theta}}) = \sum_{ij}
+  \text{KL}\left(\mathcal{N}^+(\mu_{ij}, \sigma_{ij}),\,
+  \mathcal{N}^+(\mu^0_{ij}, \sigma^0_{ij})\right)
+\end{equation}
+
+Since an easy closed-form formula exists for the KL divergence between two
+Gaussians, we approximate the truncated Gaussians by their non-truncated
+counterparts:
+\begin{equation}
+  \label{eq:kl}
+  \text{KL}(q_{\mathbf{\Theta'}}, p_{\mathbf{\Theta}}) \approx \sum_{i,j}
+  \log \frac{\sigma^0_{ij}}{\sigma_{ij}}
+  + \frac{\sigma_{ij}^2 + (\mu_{ij} - \mu^0_{ij})^2}{2\,(\sigma^0_{ij})^2}
+  - \frac{1}{2}
+\end{equation}
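+Equation~\eqref{eq:kl} is straightforward to evaluate numerically; the sketch
+below (illustrative, with toy per-edge parameters of our own choosing)
+computes the summed KL over all edges at once:
+\begin{verbatim}
+import numpy as np
+
+# toy per-edge parameters (ours, purely for illustration)
+mu,  sigma  = np.array([0.2, 0.5]), np.array([0.3, 0.4])
+mu0, sigma0 = np.array([0.1, 0.1]), np.array([1.0, 1.0])
+
+def gaussian_kl(mu, sigma, mu0, sigma0):
+    # elementwise KL( N(mu, sigma^2) || N(mu0, sigma0^2) )
+    return (np.log(sigma0 / sigma)
+            + (sigma**2 + (mu - mu0)**2) / (2.0 * sigma0**2)
+            - 0.5)
+
+kl_total = gaussian_kl(mu, sigma, mu0, sigma0).sum()  # sum over edges (i,j)
+\end{verbatim}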