\begin{figure}
\centering
\includegraphics[scale=.8]{graphical.pdf}
\caption{Graphical model representation of the Network Inference Problem with edge weights $\theta_{ij}$, cascade indicator vectors $X^c_t$, and edge prior parameters $\mu$ and $\sigma$. The source distribution, parameterized by $\phi$, is considered fixed here.}
\label{fig:graphical}
\end{figure}

In this section, we develop a Bayesian approach to the Network Inference Problem by placing priors on the edge weights of the graph. The quantity of interest is the posterior distribution, given by Bayes' rule:
\begin{equation}
\label{eq:bayesrule}
p(\Theta \mid \bx) \propto \text{prior}(\Theta) \times \mathcal{L}_\Theta(\bx)
\end{equation}
where $\mathcal{L}_\Theta(\bx)$ is the likelihood expressed in Eq.~\ref{eq:dist}.

One advantage of the Bayesian approach is that it conveys information about the uncertainty surrounding each edge parameter. In the next section, we will explore how to exploit this knowledge to improve the rate at which we reduce this uncertainty, by focusing on the most relevant parts of the network.

Another advantage of the Bayesian approach is the ability to encode domain knowledge through well-chosen prior distributions. For example, there is an extensive literature~\cite{} on parametric representations of social networks, which attempt to reproduce certain properties of such networks: density of triangles, diameter, degree distribution, clustering coefficient, etc. Accounting for known graph properties, such as reciprocal links or a high density of triangles, has the potential to greatly increase the information we leverage from each cascade. Of course, such priors no longer allow us to perform inference in parallel, a property leveraged in prior work. A systematic study of non-product priors is left for future work.

We focus on product priors in the case of the IC model presented in Section~\ref{sec:model}, which has no conjugate priors:
\begin{equation}
\label{eq:gaussianprior}
\text{prior}(\Theta) = \prod_{ij} \mathcal{N}^+(\theta_{ij} \mid \mu_{ij}, \sigma_{ij})
\end{equation}
where $\mathcal{N}^+(\cdot)$ denotes a Gaussian truncated to lie on $\mathbb{R}^+$: each $\theta_{ij}$ is a transformed infection probability, obtained through the map $z \mapsto -\log(1 - z)$, and therefore takes values in $\mathbb{R}^+$. This model is represented in the graphical model of Figure~\ref{fig:graphical}.

Since the IC model likelihood has no conjugate family, the prior in Eq.~\ref{eq:gaussianprior} is non-conjugate. We therefore resort to sampling algorithms (MCMC) and approximate Bayesian methods (variational inference), which we cover below.

\paragraph{MCMC} The Metropolis--Hastings algorithm, an instance of Markov chain Monte Carlo (MCMC), allows us to draw samples from the posterior directly using the un-normalized posterior distribution of Eq.~\ref{eq:bayesrule} (a minimal sketch of such a sampler is given at the end of this section). The advantages of this method are the ability to sample from the exact posterior and the wide availability of software packages that work `out of the box'. However, vanilla MCMC scales poorly and is unsuitable for Bayesian learning of large networks ($\geq 100$ nodes). We therefore resort to fitting an approximate posterior distribution using a variational inference algorithm.

\paragraph{Variational Inference}

\paragraph{B\"ohning bounds}
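
To make the MCMC discussion concrete, the sketch below illustrates a random-walk Metropolis--Hastings sampler targeting the un-normalized posterior of Eq.~\ref{eq:bayesrule} under the product prior of Eq.~\ref{eq:gaussianprior}. It is illustrative only: the function \texttt{log\_likelihood} is a placeholder standing in for the IC-model likelihood of Eq.~\ref{eq:dist}, and the step size, initialization, and other names are hypothetical choices rather than the implementation used in this paper.

\begin{verbatim}
import numpy as np

def log_truncated_normal_prior(theta, mu, sigma):
    """Log of the product prior of Eq. (gaussianprior), dropping the
    truncation normalizer, which does not depend on theta."""
    if np.any(theta < 0):      # support is R^+ after the z -> -log(1-z) map
        return -np.inf
    return -0.5 * np.sum(((theta - mu) / sigma) ** 2)

def log_posterior(theta, cascades, mu, sigma, log_likelihood):
    """Un-normalized log-posterior of Eq. (bayesrule):
    log prior + log likelihood."""
    lp = log_truncated_normal_prior(theta, mu, sigma)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(theta, cascades)

def metropolis_hastings(cascades, mu, sigma, log_likelihood,
                        n_samples=5000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings over the edge-weight vector theta."""
    rng = np.random.default_rng(seed)
    theta = np.clip(np.asarray(mu, dtype=float), 1e-3, None)  # start near prior mean
    current = log_posterior(theta, cascades, mu, sigma, log_likelihood)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal around the current state.
        proposal = theta + step * rng.standard_normal(theta.shape)
        proposed = log_posterior(proposal, cascades, mu, sigma, log_likelihood)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposed - current:
            theta, current = proposal, proposed
        samples.append(theta.copy())
    return np.array(samples)
\end{verbatim}

Because the random-walk proposal is symmetric, the acceptance ratio reduces to the ratio of un-normalized posteriors; proposals that leave $\mathbb{R}^+$ receive zero prior mass and are automatically rejected.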