In this section, we apply the framework from Sections~\ref{sec:bayes}
and~\ref{sec:active} to synthetic graphs and cascades in order to validate the
Bayesian approach as well as the effectiveness of the active learning
heuristics.
We started with the PyMC library~\cite{pymc}, which implements MCMC algorithms
to sample directly from the posterior distribution. We found this method to
scale poorly with the number of nodes in the graph: graphs of size $\geq 100$
could not reasonably be learned in the time available. In the appendix, we show
the progressive convergence of the posterior around the true edge weights for
a graph of size $4$.
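To make the setup concrete, the following minimal sketch samples such a
posterior on a toy $4$-node graph using a PyMC3-style interface. It is purely
illustrative (it is not the code we released): it assumes an
independent-cascade step likelihood, in which each infected node $i$ infects a
susceptible node $j$ independently with probability $\Theta_{ij}$, and uses a
Beta prior in place of the Gaussian product-prior of
Eq.~\ref{eq:gaussianprior}.
\begin{verbatim}
# Hypothetical sketch: MCMC over the edge weights of a 4-node graph
# (PyMC3-style interface; the Beta prior and all names are our own choices).
import numpy as np
import pymc3 as pm
import theano.tensor as tt

rng = np.random.RandomState(0)
n = 4
theta_true = rng.uniform(0.2, 0.8, size=(n, n))
np.fill_diagonal(theta_true, 0.0)

def simulate_step(x):
    """One cascade step: infected mask x -> (susceptible mask, newly infected)."""
    p = 1.0 - np.prod(1.0 - x[:, None] * theta_true, axis=0)
    s = (x == 0).astype(int)
    y = ((rng.rand(n) < p) & (s == 1)).astype(int)
    return s, y

steps = []
for _ in range(50):                      # 50 single-step observations
    x = np.zeros(n, dtype=int)
    x[rng.randint(n)] = 1                # uniformly random source
    steps.append((x,) + simulate_step(x))

with pm.Model():
    theta = pm.Beta("theta", alpha=1.0, beta=1.0, shape=(n, n))
    for t, (x, s, y) in enumerate(steps):
        # P(j infected at this step) = 1 - prod_i (1 - x_i * theta_ij)
        p = 1.0 - tt.prod(1.0 - x[:, None] * theta, axis=0)
        idx = np.nonzero(s)[0]           # likelihood only over susceptible nodes
        pm.Bernoulli("y_%d" % t, p=p[idx], observed=y[idx])
    trace = pm.sample(2000)              # posterior concentrates around theta_true
\end{verbatim}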
\paragraph{VI implementation.} In order to show the effect of the active
learning policies, we needed to scale the experiments to graphs of size $\geq
1000$. A graph of size $1000$ has $1$M parameters to learn ($2$M in the
product-prior of Eq.~\ref{eq:gaussianprior}). The maximum-likelihood estimator
converges to an $\ell_\infty$-error of $0.05$ for most graphs only after
observing at least $100$M distinct cascade steps. Given the number of
parameters to learn and the number of cascades to process, we had to resort to
variational inference.
We therefore used the Blocks~\cite{blocks} framework, which is written on top
of Theano and thus benefits from automatic differentiation and highly efficient
SGD routines with minimal memory overhead. Encoding a cascade step as a tuple
of two binary vectors, one for the infected nodes and one for the susceptible
nodes, lets us write the variational inference objective as a simple
computation graph on matrices, which Theano compiles to machine code optimized
for GPU execution.
The code for our variational inference and MCMC implementations can be
found at \url{http://thibaut.horel.org/cs281_project.zip}.
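As an illustration of this encoding, the following hypothetical sketch builds
one stochastic gradient step on a single-sample reparameterised estimate of the
expected log-likelihood of a cascade step (the KL term against the prior of
Eq.~\ref{eq:gaussianprior} is omitted). The parameter names \texttt{mu} and
\texttt{log\_sigma}, the logistic link and the constant step size are our own
choices; this is not the Blocks implementation linked above.
\begin{verbatim}
# Hypothetical Theano sketch (not the released Blocks code): one SGD step on
# a single-sample reparameterised objective; the KL/prior term is omitted.
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

n = 1000
srng = RandomStreams(seed=0)

# variational parameters: two per edge (mean and log-std of the edge logit)
mu = theano.shared(np.zeros((n, n)), name="mu")
log_sigma = theano.shared(np.full((n, n), -2.0), name="log_sigma")

# one cascade step encoded as two binary vectors, plus the observed outcome
x = T.dvector("x")   # infected nodes
s = T.dvector("s")   # susceptible nodes
y = T.dvector("y")   # susceptible nodes infected at this step

# reparameterised sample of the edge probabilities
eps = srng.normal((n, n))
theta = T.nnet.sigmoid(mu + T.exp(log_sigma) * eps)

# P(j infected) = 1 - prod_i (1 - x_i * theta_ij), likelihood over susceptibles
p = 1.0 - T.prod(1.0 - x.dimshuffle(0, "x") * theta, axis=0)
loglik = T.sum(s * (y * T.log(p + 1e-8) + (1.0 - y) * T.log(1.0 - p + 1e-8)))

loss = -loglik
updates = [(v, v - 0.01 * T.grad(loss, v)) for v in (mu, log_sigma)]
sgd_step = theano.function([x, s, y], loss, updates=updates)
\end{verbatim}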
\paragraph{Synthetic graph.} Intuitively, if the nodes of the graph are
exchangeable (\emph{i.e.}, their topological properties such as degree are
identical), the active learning policy will make little difference compared to
the uniform-source policy. We therefore decided to test our algorithms on an
unbalanced graph $\mathcal{G}_A$ whose adjacency matrix $A$ is as follows:
\begin{equation*}
A = \left( \begin{array}{cccccc}
0 & 1 & 1 & 1 & \dots & 1 \\
0 & 0 & 1 & 0 & \dots & 0 \\
0 & 0 & 0 & 1 & \dots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 1 & 0 & 0 & \dots & 0
\end{array}
\right)
\end{equation*}
In other words, graph $\mathcal{G}_A$ is a wheel graph in which the center node
points to every other node and every other node points to its (clockwise)
neighbor on the outer cycle. Even though the wheel graph might not seem
realistic, social networks (as well as graphs found in many other real-life
applications) are known to exhibit a power-law degree distribution, which is
similarly unbalanced.
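For reference, the adjacency matrix above can be constructed as follows (a
small NumPy sketch; the function name is ours):
\begin{verbatim}
import numpy as np

def wheel_adjacency(n):
    """Adjacency matrix of G_A: node 0 (the hub) points to every other node,
    and node i >= 1 points to its clockwise neighbour on the outer cycle."""
    A = np.zeros((n, n), dtype=int)
    A[0, 1:] = 1
    for i in range(1, n):
        A[i, i + 1 if i + 1 < n else 1] = 1
    return A
\end{verbatim}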
\paragraph{Baselines.} Our experiments compare the active learning heuristic
described in Section~\ref{sec:active} to two baselines. For easier comparison,
both baselines are also online learning methods: cascades are generated on the
fly and used only once by the SGD algorithm to update the approximate posterior
parameters. The two baselines, both sketched in code after the list, are:
\begin{itemize}
  \item \emph{uniform source:} for each cascade, the source node is chosen
    uniformly at random among all vertices. Intuitively, this method should
    behave similarly to offline learning.
  \item \emph{out-degree:} the source node is selected with
    probability proportional to its estimated out-degree (\emph{i.e.}, the
    sum of the estimated means on its outgoing edges).
\end{itemize}
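The following is a minimal sketch of the two selection rules
(\texttt{mu\_hat} is our name for the matrix of current posterior means):
\begin{verbatim}
import numpy as np

rng = np.random.RandomState(0)

def uniform_source(mu_hat):
    """Uniform-source baseline: pick the cascade source uniformly at random."""
    return rng.randint(mu_hat.shape[0])

def out_degree_source(mu_hat):
    """Out-degree baseline: pick the source with probability proportional to
    the sum of the estimated means on its outgoing edges."""
    w = mu_hat.sum(axis=1)
    return rng.choice(len(w), p=w / w.sum())
\end{verbatim}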
Our active learning method, henceforth called \emph{out-variance}, is then
compared to the two baselines by plotting the RMSE of the estimated graph,
\emph{i.e.} $\text{RMSE}^2 = \frac{1}{n^2} \|\hat{\mathbf{\Theta}}
- \mathbf{\Theta}\|^2_F$, as a function of the number of observed cascades. The
results are presented in \cref{fig:results}. As we can see, both
\emph{out-degree} and \emph{out-variance}, which are active in nature, vastly
outperform \emph{uniform source}, which is passive. Surprisingly,
\emph{out-degree}, which is seemingly very crude since it does not take the
current uncertainty into account, performs comparably to \emph{out-variance}.
We believe this is due to the specific nature of the wheel graph:
\emph{out-degree} repeatedly selects the central node, which is incident
to 50\% of the graph's edges.
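For completeness, below is a similarly hypothetical sketch of the
\emph{out-variance} rule, assuming, by analogy with \emph{out-degree}, that the
source is drawn with probability proportional to the total posterior variance
on its outgoing edges (see Section~\ref{sec:active} for the exact heuristic),
together with the RMSE computation; \texttt{var\_hat} is our name for the
matrix of current posterior variances.
\begin{verbatim}
import numpy as np

rng = np.random.RandomState(0)

def out_variance_source(var_hat):
    """Active-policy sketch: draw the source with probability proportional to
    the summed posterior variance on its outgoing edges."""
    w = var_hat.sum(axis=1)
    return rng.choice(len(w), p=w / w.sum())

def rmse(theta_hat, theta_true):
    """RMSE of the estimated graph: sqrt(||theta_hat - theta||_F^2 / n^2)."""
    n = theta_true.shape[0]
    return np.sqrt(np.sum((theta_hat - theta_true) ** 2) / n ** 2)
\end{verbatim}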
\begin{figure}
\centering
\includegraphics[scale=.4]{finale.png}
\caption{RMSE of the proposed active learning heuristic compared to the
baselines. The $x$-axis shows the number of batches; each batch contains 100
time steps of the cascade process.}
\label{fig:results}
\end{figure}