Diffstat (limited to 'finale/sections/bayesian.tex')
 finale/sections/bayesian.tex | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)
diff --git a/finale/sections/bayesian.tex b/finale/sections/bayesian.tex
index dda5c81..5d95ddb 100644
--- a/finale/sections/bayesian.tex
+++ b/finale/sections/bayesian.tex
@@ -9,7 +9,6 @@ $\sigma_{ij}$. The source distribution, parameterized by $\phi$, is
 considered fixed here.}
 \end{figure}
 
-\subsection{Advantages of the Bayesian Framework}
 In this section, we develop a Bayesian approach to the Network Inference
 Problem by placing priors on the edge weights of the graph. The quantity of
 interest is the posterior distribution, given through Bayes' rule by:
@@ -20,41 +19,43 @@ the posterior distribution, given through Bayes' rule by:
 where $\mathcal{L}_\Theta(\bx)$ is the likelihood expressed in
 Eq.~\ref{eq:dist}.
 
+\subsection{Advantages for Graph Inference}
+
 One advantage of the Bayesian approach is its ability to convey distributional
-information about our belief for each parameter rather than the pointwise
-estimates accessible by MLE.~For example, exploring the entropy
-of the posterior on each parameter allows us to quantify how uncertain we are of
-each edge parameters' value. In the next section, we will explore how to
-exploit this knowledge to improve the rate at which we decrease our uncertainty
-by focusing on the most relevant parts of the network.
+information about our belief about each parameter rather than the pointwise
+estimates accessible by MLE.~For example, exploring the entropy of the
+posterior on each parameter allows us to quantify the uncertainty in edge
+weights. In the next section, we will exploit this information to improve the
+rate at which we decrease the uncertainty (and hence learn the network) by
+focusing on the most relevant parts of the network.
 
 Another advantage of the Bayesian approach is the ability to encode
 domain-knowledge through well-chosen prior distributions. For example, there
 is an extensive literature~\cite{} on parametric representations of social
-networks, which attempt to reproduce certain properties of such networks:
-density of triangles, diameter, degree distribution, clustering coefficient etc.
-Accounting for known graph properties, such as reciprocal links or the high
-density of triangles has the potential to greatly increase the information we
-leverage from each cascade. Of course, such priors no longer allow us to
+networks, which attempt to reproduce observed properties of such networks:
+density of triangles, diameter, degree distribution, clustering coefficient,
+etc. Accounting for known graph properties, such as reciprocal links or the
+high density of triangles, has the potential to greatly increase the
+information we extract from each cascade. Of course, such priors no longer
+allow us to perform inference in parallel, which was leveraged in prior work.
 
 \subsection{Inference}
 
 Depending on the link function $f$, the GLC model may not possess conjugate
-priors (e.g.~the IC model). Even if conjugate priors exist, they may be
-restricted to product form. In these cases, we resort to the use of sampling
-algorithms (MCMC) and approximate Bayesian methods (variational inference),
-which we cover here.
+priors (as is the case for the IC model, for example). Even if conjugate priors
+exist, they may be restricted to product form. In these cases, we resort to the
+use of sampling algorithms (MCMC) and approximate Bayesian methods (variational
+inference), which we cover here.
-\paragraph{MCMC}
+\paragraph{MCMC.}
 The Metropolis-Hastings algorithm (an MCMC method) allows us to draw samples
-posterior directly using the un-normalized posterior distribution. The advantage
-of this method is the ability to sample from the exact posterior and the wide
-availability of software packages which will work `out-of-the-box'. However,
-vanilla MCMC scales badly and is unsuitable for Bayesian learning of large
-networks ($\geq 100$ nodes).
+from the posterior directly using the un-normalized posterior distribution. The
+advantage of this method is the ability to sample from the exact posterior and
+the wide availability of software packages that work `out of the box'.
+However, as we show in our experiments, vanilla MCMC scales poorly and is
+unsuitable for Bayesian learning of large networks ($\geq 100$ nodes).
 
-\paragraph{Variational Inference}
+\paragraph{Variational Inference.}
 Variational inference algorithms consist in fitting an approximate family of
 distributions to the exact posterior. The variational objective can be
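
For concreteness, both inference routes in this section consume the same object: the un-normalized log-posterior, i.e. the log prior plus the log likelihood $\log \mathcal{L}_\Theta(\bx)$. Below is a minimal Python sketch, assuming IC-style edge weights in $(0,1)$ with independent Beta priors and a toy Bernoulli edge likelihood standing in for Eq.~\ref{eq:dist}; all function names are ours, not the paper's.

    import numpy as np
    from scipy.stats import beta

    def log_likelihood(theta, successes, failures):
        # Toy stand-in for the cascade likelihood L_Theta(x): each observed
        # activation attempt over an edge succeeds with probability theta[edge].
        return np.sum(successes * np.log(theta) + failures * np.log1p(-theta))

    def log_posterior(theta, successes, failures, a=1.0, b=1.0):
        # Un-normalized log-posterior = log prior + log likelihood, with
        # independent Beta(a, b) priors on each edge weight in (0, 1).
        if np.any(theta <= 0.0) or np.any(theta >= 1.0):
            return -np.inf  # outside the support of the prior
        return beta.logpdf(theta, a, b).sum() + log_likelihood(theta, successes, failures)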
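A generic random-walk Metropolis-Hastings loop over the edge-weight vector (continuing the sketch above; this is not the paper's implementation) shows why only the un-normalized posterior is needed: the normalizing constant cancels in the acceptance ratio.

    def metropolis_hastings(log_post, theta0, n_samples=5000, step=0.05, seed=0):
        # Minimal random-walk Metropolis-Hastings sampler.
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        lp = log_post(theta)
        samples = []
        for _ in range(n_samples):
            prop = theta + step * rng.standard_normal(theta.shape)  # symmetric proposal
            lp_prop = log_post(prop)
            if np.log(rng.random()) < lp_prop - lp:  # accept w.p. min(1, ratio)
                theta, lp = prop, lp_prop
            samples.append(theta.copy())
        return np.asarray(samples)

Each iteration evaluates the full likelihood over all cascades, and the state dimension grows with the number of candidate edges (quadratic in the number of nodes), which is consistent with the scaling limitation noted in the MCMC paragraph.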
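The standard variational objective is the evidence lower bound (ELBO), maximized over the parameters of the approximating family. As a hypothetical sketch, assuming a mean-field Gaussian over logit-transformed edge weights (again, names and parameterization are ours):

    def elbo_estimate(mu, log_sigma, log_post, n_mc=32, seed=1):
        # Monte Carlo ELBO for q(z) = N(mu, diag(sigma^2)) on logits z, with
        # theta = sigmoid(z): ELBO = E_q[log p~(theta(z)) + log|dtheta/dz|] + H[q].
        rng = np.random.default_rng(seed)
        sigma = np.exp(log_sigma)
        z = mu + sigma * rng.standard_normal((n_mc, mu.size))  # reparameterization
        theta = 1.0 / (1.0 + np.exp(-z))
        log_jac = np.log(theta * (1.0 - theta)).sum(axis=1)    # change of variables
        log_p = np.array([log_post(t) for t in theta]) + log_jac
        entropy = np.sum(log_sigma + 0.5 * np.log(2.0 * np.pi * np.e))
        return log_p.mean() + entropy

Maximizing this estimate over (mu, log_sigma), e.g. with stochastic gradients, fits the approximate family to the exact posterior.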
