In this section, we apply the framework of Sections~\ref{sec:bayes} and~\ref{sec:active} to synthetic graphs and cascades, in order to validate the Bayesian approach as well as the effectiveness of the active learning heuristics. We initially used the PyMC library to sample from the posterior distribution directly. This approach scales poorly with the number of nodes in the graph: graphs of size $\geq 100$ could not be learned in a reasonable amount of time. In Section~\ref{sec:appendix}, we show the progressive convergence of the posterior around the true edge weights for a graph of size $4$.

In order to show the effect of the active learning policies, we needed to scale the experiments to graphs of size $\geq 1000$, which required the use of the variational inference procedure. A graph of size $1000$ has $1$M parameters to be learned ($2$M with the product prior of Eq.~\ref{eq:gaussianprior}). The maximum-likelihood estimator converges to an $\ell_\infty$-error of $0.05$ for most graphs only after having observed at least $100$M distinct cascade steps. We therefore used the Blocks~\cite{blocks} framework, built on top of Theano, which provides highly efficient SGD routines with minimal memory overhead. By encoding a cascade step as a tuple of two binary vectors, one marking the infected nodes and one marking the susceptible nodes, the variational inference objective can be written as a sum of two matrix multiplications, which Theano optimizes on the GPU; a sketch of this encoding is given at the end of this section.

Intuitively, if the nodes of the graph are exchangeable, the active learning policy will have little impact over the uniform-source policy. We therefore test our algorithms on an unbalanced graph $\mathcal{G}_A$ whose adjacency matrix $A$ is as follows:
\begin{equation*}
A = \left( \begin{array}{cccccc}
0 & 1 & 1 & 1 & \dots & 1 \\
0 & 0 & 1 & 0 & \dots & 0 \\
0 & 0 & 0 & 1 & \dots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 1 & 0 & 0 & \dots & 0
\end{array} \right)
\end{equation*}
In other words, graph $\mathcal{G}_A$ is a star graph in which every node, except for the center node, points to its (clockwise) neighbor. To keep the comparison with the baseline fair, we generate cascades from the selected source node on the fly, both for the uniform-source policy and for the active learning policy; each cascade is therefore `observed' only once. We report the RMSE of the estimated weight matrix, i.e., $\mathrm{RMSE}^2 = \frac{1}{n^2} \|\hat{\mathbf{\Theta}} - \mathbf{\Theta}\|_F^2$, where $\|\cdot\|_F$ denotes the Frobenius norm.
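To make the binary-vector encoding concrete, the following minimal NumPy sketch shows how a batch of cascade steps, stored as binary matrices, reduces the per-step likelihood computation to matrix products. The logistic activation model and all function and variable names here are illustrative assumptions of ours; the actual objective is the one derived in Section~\ref{sec:bayes}, and the experiments used Theano's symbolic tensors rather than NumPy.
\begin{verbatim}
import numpy as np

def cascade_step_log_likelihood(theta, X_inf, X_susc, X_next):
    """Batched log-likelihood of cascade steps (illustrative sketch).

    theta  : (n, n) matrix of edge weights to be learned.
    X_inf  : (B, n) binary matrix; row b marks the nodes infected at step b.
    X_susc : (B, n) binary matrix; row b marks the nodes still susceptible.
    X_next : (B, n) binary matrix; row b marks the newly infected nodes.

    Hypothetical logistic model: a susceptible node j becomes infected
    with probability sigmoid(sum_i X_inf[b, i] * theta[i, j]).
    """
    act = X_inf @ theta                 # one (B, n) @ (n, n) product
    p = 1.0 / (1.0 + np.exp(-act))      # infection probabilities
    eps = 1e-12                         # numerical safety for log(0)
    # Newly infected nodes contribute log p; susceptible nodes that
    # stayed uninfected contribute log(1 - p).
    ll = X_next * np.log(p + eps) + (X_susc - X_next) * np.log(1.0 - p + eps)
    return ll.sum()

# Toy usage on random data.
rng = np.random.default_rng(0)
n, B = 1000, 128
theta = rng.normal(scale=0.1, size=(n, n))
X_inf = (rng.random((B, n)) < 0.01).astype(float)
X_susc = 1.0 - X_inf
X_next = X_susc * (rng.random((B, n)) < 0.005)
print(cascade_step_log_likelihood(theta, X_inf, X_susc, X_next))
\end{verbatim}
Because the whole batch is processed through a single dense product, the same expression written in Theano compiles to a handful of GPU kernels, which is what makes the $100$M-step regime tractable.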
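For concreteness, a second sketch (helper names are ours) constructs the adjacency matrix of $\mathcal{G}_A$ shown above and the reported error metric:
\begin{verbatim}
import numpy as np

def star_cycle_adjacency(n):
    """Adjacency matrix of G_A: node 0 (the centre) points to every
    other node, and node i (1 <= i < n) points to its clockwise
    neighbour, with node n-1 wrapping around to node 1."""
    A = np.zeros((n, n), dtype=int)
    A[0, 1:] = 1                    # centre -> every leaf
    idx = np.arange(1, n - 1)
    A[idx, idx + 1] = 1             # leaf i -> leaf i+1
    A[n - 1, 1] = 1                 # last leaf closes the cycle
    return A

def rmse(theta_hat, theta):
    """RMSE of the estimate: Frobenius norm of the error, divided by n."""
    n = theta.shape[0]
    return np.linalg.norm(theta_hat - theta) / n
\end{verbatim}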