\begin{figure}
\centering
\includegraphics[scale=.8]{graphical.pdf}
\caption{Graphical model representation of the Network Inference Problem with edge weights $\theta_{ij}$, cascade indicator vectors $X^c_t$, and edge prior parameters $\mu$ and $\sigma$. The source distribution, parameterized by $\phi$, is considered fixed here.}
\label{fig:graphical}
\end{figure}

In this section, we develop a Bayesian approach to the Network Inference Problem by placing priors on the edge weights of the graph. The quantity of interest is the posterior distribution, given by Bayes' rule:
\begin{equation}
\label{eq:bayesrule}
p(\Theta \mid \bx) \propto \text{prior}(\Theta) \times \mathcal{L}_\Theta(\bx)
\end{equation}
where $\mathcal{L}_\Theta(\bx)$ is the likelihood expressed in Eq.~\ref{eq:dist}.

One advantage of the Bayesian approach is that it conveys information about the uncertainty surrounding each edge parameter. In the next section, we will explore how to exploit this knowledge to improve the rate at which we reduce this uncertainty, by focusing on the most relevant parts of the network.

Another advantage of the Bayesian approach is the ability to encode domain knowledge through well-chosen prior distributions. For example, there is an extensive literature~\cite{} on parametric representations of social networks, which attempt to reproduce certain properties of such networks: density of triangles, diameter, degree distribution, clustering coefficient, etc. Accounting for known graph properties, such as reciprocal links or a high density of triangles, has the potential to greatly increase the information we leverage from each cascade. Of course, such priors no longer allow us to perform inference in parallel, a property leveraged in prior work. A systematic study of non-product priors is left for future work.

We focus on product priors in the case of the IC model presented in Section~\ref{sec:model}, which has no conjugate priors:
\begin{equation}
\label{eq:gaussianprior}
\text{prior}(\Theta) = \prod_{ij} \mathcal{N}^+(\theta_{ij} \mid \mu_{ij}, \sigma_{ij})
\end{equation}
where $\mathcal{N}^+(\cdot)$ denotes a Gaussian truncated to lie on $\mathbb{R}^+$: each $\theta_{ij}$ is a transformed infection probability, obtained through the map $z \mapsto -\log(1 - z)$, and therefore takes values in $\mathbb{R}^+$. This model is represented in the graphical model of Figure~\ref{fig:graphical}.

Since the IC model likelihood has no conjugate family, the prior in Eq.~\ref{eq:gaussianprior} is non-conjugate. We therefore resort to sampling algorithms (MCMC) and approximate Bayesian methods (variational inference), which we cover below.

\paragraph{MCMC} The Metropolis--Hastings algorithm, an instance of Markov chain Monte Carlo (MCMC), allows us to draw samples from the posterior directly using the un-normalized posterior distribution of Eq.~\ref{eq:bayesrule} (a minimal sketch of such a sampler is given at the end of this section). The advantages of this method are the ability to sample from the exact posterior and the wide availability of software packages that work `out of the box'. However, vanilla MCMC scales poorly and is unsuitable for Bayesian learning of large networks ($\geq 100$ nodes). We therefore resort to fitting an approximate posterior distribution using a variational inference algorithm.

\paragraph{Variational Inference}

\paragraph{B\"ohning bounds}
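
To make the MCMC discussion concrete, the sketch below illustrates a random-walk Metropolis--Hastings sampler targeting the un-normalized posterior of Eq.~\ref{eq:bayesrule} under the product prior of Eq.~\ref{eq:gaussianprior}. It is illustrative only: the function \texttt{log\_likelihood} is a placeholder standing in for the IC-model likelihood of Eq.~\ref{eq:dist}, and the step size, initialization, and other names are hypothetical choices rather than the implementation used in this paper.

\begin{verbatim}
import numpy as np

def log_truncated_normal_prior(theta, mu, sigma):
    """Log of the product prior of Eq. (gaussianprior), dropping the
    truncation normalizer, which does not depend on theta."""
    if np.any(theta < 0):      # support is R^+ after the z -> -log(1-z) map
        return -np.inf
    return -0.5 * np.sum(((theta - mu) / sigma) ** 2)

def log_posterior(theta, cascades, mu, sigma, log_likelihood):
    """Un-normalized log-posterior of Eq. (bayesrule):
    log prior + log likelihood."""
    lp = log_truncated_normal_prior(theta, mu, sigma)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(theta, cascades)

def metropolis_hastings(cascades, mu, sigma, log_likelihood,
                        n_samples=5000, step=0.05, seed=0):
    """Random-walk Metropolis-Hastings over the edge-weight vector theta."""
    rng = np.random.default_rng(seed)
    theta = np.clip(np.asarray(mu, dtype=float), 1e-3, None)  # start near prior mean
    current = log_posterior(theta, cascades, mu, sigma, log_likelihood)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal around the current state.
        proposal = theta + step * rng.standard_normal(theta.shape)
        proposed = log_posterior(proposal, cascades, mu, sigma, log_likelihood)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposed - current:
            theta, current = proposal, proposed
        samples.append(theta.copy())
    return np.array(samples)
\end{verbatim}

Because the random-walk proposal is symmetric, the acceptance ratio reduces to the ratio of un-normalized posteriors; proposals that leave $\mathbb{R}^+$ receive zero prior mass and are automatically rejected.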