Diffstat (limited to 'finale')
-rw-r--r--  finale/sections/bayesian.tex  62
1 file changed, 39 insertions(+), 23 deletions(-)
diff --git a/finale/sections/bayesian.tex b/finale/sections/bayesian.tex
index 60272fe..5c7b179 100644
--- a/finale/sections/bayesian.tex
+++ b/finale/sections/bayesian.tex
@@ -2,12 +2,14 @@
\centering
\label{fig:graphical}
\includegraphics[scale=.8]{graphical.pdf}
-\caption{Graphical model representation of the Network Inference Problem with
- edge weights $\theta_{ij}$, observed cascade indicator vectors $X^c_t$, edge
-prior parameters $\mu_{ij}$ and $\sigma_{ij}$. The source distribution,
-parameterized by $\phi$, is considered fixed here.}
+\caption{Graphical model representation of the Network Inference Problem in the
+ case of a Gaussian product prior, with edge weights $\theta_{ij}$, observed
+cascade indicator vectors $X^c_t$, edge prior parameters $\mu_{ij}$ and
+$\sigma_{ij}$. The source distribution, parameterized by $\phi$, is considered
+fixed here.}
\end{figure}
+\subsection{Advantages of the Bayesian Framework}
In this section, we develop a Bayesian approach to the Network Inference Problem
by placing priors on the edge weights of the graph. The quantity of interest is
the posterior distribution, given through Bayes' rule by:
@@ -18,10 +20,13 @@ the posterior distribution, given through Bayes' rule by:
where $\mathcal{L}_\Theta(\bx)$ is the likelihood expressed in
Eq.~\ref{eq:dist}.
-One advantage of the Bayesian approach is its ability to convey information
-about the uncertainty surrounding each edge parameters. In the next section, we
-will explore how to exploit this knowledge to improve the rate at which we
-decrease our uncertainty by focusing on the most relevant parts of the network.
+One advantage of the Bayesian approach is its ability to convey distributional
+information about our beliefs over each parameter, rather than the pointwise
+estimates accessible through MLE. For example, the entropy of the posterior
+over each parameter quantifies how uncertain we are about each edge
+parameter's value. In the next section, we will explore how to exploit this
+knowledge to improve the rate at which we decrease our uncertainty by focusing
+on the most relevant parts of the network.
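+
+To make the entropy example concrete: if (purely for exposition) the posterior
+marginal of $\theta_{ij}$ were approximately Gaussian with standard deviation
+$s_{ij}$, its differential entropy would take the familiar closed form
+\begin{equation*}
+  H(\theta_{ij}) = \tfrac{1}{2} \log\left(2 \pi e \, s_{ij}^2\right),
+\end{equation*}
+so the edges whose posterior spread contracts fastest are precisely those
+whose entropy decreases fastest.
+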
Another advantage of the Bayesian approach is the ability to encode
domain-knowledge through well-chosen prior distributions. For example, there is
@@ -33,21 +38,13 @@ density of triangles has the potential to greatly increase the information we
leverage from each cascade. Of course, such priors no longer allow us to
perform inference in parallel, which was leveraged in prior work.
-\paragraph{The IC model.}
-As mentioned above, the IC model (cf. Section~\ref{sec:model}) has no conjugate
-priors. We consider here a truncated product gaussian prior here:
-\begin{equation}
- \label{eq:gaussianprior}
- \text{prior}(\Theta) = \prod_{ij} \mathcal{N}^+(\theta_{ij} | \mu_{ij},
- \sigma_{ij})
-\end{equation}
-where $\mathcal{N}^+(\cdot)$ is a gaussian truncated to lied on $\mathbb{R}^+$
-since $\Theta$ is a transformed parameter $z \mapsto -\log(1 - z)$. This model
-is represented in the graphical model of Figure~\ref{fig:graphical}.
+\subsection{Inference}
-Since the prior in Eq.~\ref{eq:gaussianprior} is non-conjuate, we will
-resort to the use of sampling algorithms (MCMC) and approximate Bayesian methods
-(variational inference), which we cover here.
+Depending on the link function $f$, the GLC model may not possess conjugate
+priors (e.g.~the IC model). Even if conjugate priors exist, they may be
+restricted to product form. In these cases, we resort to sampling algorithms
+(MCMC) and approximate Bayesian methods (variational inference),
+which we cover here.
\paragraph{MCMC}
The Metropolis-Hastings algorithm (an MCMC method) allows us to draw samples from the
@@ -61,5 +58,24 @@ distribution using a variational inference algorithm.
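+As a minimal sketch of the sampler described above (assuming only access to an
+unnormalized log posterior; the \texttt{log\_post} callable, initial point, and
+step size are illustrative placeholders rather than our implementation), a
+random-walk Metropolis-Hastings loop can be written as follows:
+\begin{verbatim}
+import numpy as np
+
+# Minimal random-walk Metropolis-Hastings sketch (illustration only).
+# `log_post` is an assumed callable returning log prior(Theta) plus the
+# summed cascade log likelihoods; `theta0` is an initial edge-weight vector.
+def metropolis_hastings(log_post, theta0, n_iters=10000, step=0.05, seed=0):
+    rng = np.random.default_rng(seed)
+    theta = np.array(theta0, dtype=float)
+    current = log_post(theta)
+    samples = []
+    for _ in range(n_iters):
+        proposal = theta + step * rng.standard_normal(theta.shape)
+        cand = log_post(proposal)
+        # Accept with probability min(1, exp(cand - current)); the symmetric
+        # Gaussian proposal makes the Hastings correction cancel.
+        if np.log(rng.random()) < cand - current:
+            theta, current = proposal, cand
+        samples.append(theta.copy())
+    return np.array(samples)
+\end{verbatim}
+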
\paragraph{Variational Inference}
Variational inference algorithms consist in fitting an approximate family of
-distributions to the exact posterior.
+distributions to the exact posterior. The variational objective is a lower
+bound on the log marginal likelihood, maximized with respect to the
+variational parameters $\mathbf{\Phi}$:
+\begin{align*}
+  \mathcal{V}(\mathbf{\Phi}, \{\mathbf{x}_c\}) = -
+  \text{KL}(q_{\mathbf{\Phi}}, p_{\mathbf{\Theta}}) + \sum_{c = 1}^C
+  \E_{q_{\mathbf{\Phi}}} \log \mathcal{L}(\mathbf{x}_c | \mathbf{\Theta})
+\end{align*}
+where $p_{\mathbf{\Theta}}$ is the prior distribution and $q_{\mathbf{\Phi}}$
+is the variational approximation to the posterior over $\mathbf{\Theta}$,
+parameterized by $\mathbf{\Phi}$.
+
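+To make the objective above concrete, the following sketch (a schematic Monte
+Carlo estimate under a fully factorized Gaussian $q_{\mathbf{\Phi}}$; the
+\texttt{log\_prior} and \texttt{log\_lik} callables are stand-ins, not our
+implementation) evaluates an unbiased estimate of $\mathcal{V}$ by sampling
+from $q_{\mathbf{\Phi}}$:
+\begin{verbatim}
+import numpy as np
+
+# Schematic Monte Carlo estimate of the variational objective V under a
+# factorized Gaussian q with parameters Phi = (mu, log_sigma).
+# `log_prior(theta)` and `log_lik(x, theta)` are assumed callables for the
+# model's log prior and per-cascade log likelihood.
+def elbo_estimate(mu, log_sigma, cascades, log_prior, log_lik,
+                  n_samples=100, seed=0):
+    rng = np.random.default_rng(seed)
+    total = 0.0
+    for _ in range(n_samples):
+        # Reparameterized draw theta ~ q = N(mu, sigma^2), element-wise.
+        eps = rng.standard_normal(mu.shape)
+        theta = mu + np.exp(log_sigma) * eps
+        # log q(theta) for the factorized Gaussian.
+        log_q = np.sum(-0.5 * np.log(2 * np.pi) - log_sigma - 0.5 * eps ** 2)
+        # One sample of log prior(theta) + sum_c log L(x_c | theta) - log q,
+        # whose expectation under q is exactly the objective V.
+        total += (log_prior(theta)
+                  + sum(log_lik(x, theta) for x in cascades)
+                  - log_q)
+    return total / n_samples
+\end{verbatim}
+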
+\subsection{Example}
+As mentioned above, the IC model (cf. Section~\ref{sec:model}) has no conjugate
+priors. We consider here a truncated product Gaussian prior:
+\begin{equation}
+ \label{eq:gaussianprior}
+ \text{prior}(\Theta) = \prod_{ij} \mathcal{N}^+(\theta_{ij} | \mu_{ij},
+ \sigma_{ij})
+\end{equation}
+where $\mathcal{N}^+(\cdot)$ is a Gaussian truncated to lie on $\mathbb{R}^+$,
+since $\Theta$ is a transformed parameter $z \mapsto -\log(1 - z)$. This model
+is represented in the graphical model of Figure~\ref{fig:graphical}.
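+
+As a small illustration of this prior (assuming SciPy is available; the
+\texttt{mu} and \texttt{sigma} arrays below are placeholder values rather than
+fitted parameters), one can draw edge weights from
+$\mathcal{N}^+(\mu_{ij}, \sigma_{ij})$ and invert the transform to recover IC
+transmission probabilities:
+\begin{verbatim}
+import numpy as np
+from scipy.stats import truncnorm
+
+# Draw edge weights from the truncated product Gaussian prior and map them
+# back to IC infection probabilities. mu and sigma are per-edge prior
+# parameters (same-shaped arrays); placeholder values are used below.
+def sample_edge_weights(mu, sigma, seed=0):
+    # N^+(mu, sigma): Gaussian truncated to [0, +inf). truncnorm expects the
+    # truncation bounds expressed in standard deviations from the mean.
+    a = (0.0 - mu) / sigma
+    theta = truncnorm.rvs(a, np.inf, loc=mu, scale=sigma,
+                          size=mu.shape, random_state=seed)
+    # Invert theta = -log(1 - z): z = 1 - exp(-theta) is the per-edge
+    # infection probability of the IC model.
+    z = 1.0 - np.exp(-theta)
+    return theta, z
+
+theta, z = sample_edge_weights(np.full((5, 5), 0.3), np.full((5, 5), 0.1))
+\end{verbatim}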